Updated on 2025.06.28

Website

You can learn directly from this page

Tracking

Publish Date	Title	Authors	PDF	Code
2025-06-23	Lightweight RGB-T Tracking with Mobile Vision Transformers	Mahdi Falaki et.al.	2506.19154	null
2025-06-18	SOT Enabled 3D Magnetic Field Sensor with Low Offset and High Sensitivity	Sebastian Zeilinger et.al.	2506.15320	null
2025-06-17	Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios	Aswin Shanmugam Subramanian et.al.	2506.14204	null
2025-06-15	Learning Unpaired Image Dehazing with Physics-based Rehazy Generation	Haoyou Deng et.al.	2506.12824	null
2025-06-15	SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition	Yuta Hirano et.al.	2506.12672	null
2025-06-12	Joint ASR and Speaker Role Tagging with Serialized Output Training	Anfeng Xu et.al.	2506.10349	null
2025-06-09	Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition	Asahi Sakuma et.al.	2506.07515	null
2025-06-06	Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models	Yuke Lin et.al.	2506.05796	null
2025-06-03	MVTD: A Benchmark Dataset for Maritime Visual Object Tracking	Ahsan Baidar Bakht et.al.	2506.02866	null
2025-05-28	Nanoscale quantum imaging of field-free deterministic switching of a chiral antiferromagnet	Jingcheng Zhou et.al.	2505.22856	null
2025-05-27	Fully Spiking Neural Networks for Unified Frame-Event Object Tracking	Jingjun Yang et.al.	2505.20834	null
2025-05-28	Progressive Scaling Visual Object Tracking	Jack Hong et.al.	2505.19990	null
2025-05-26	Systems of Twinned Systems: A Systematic Literature Review	Feyi Adesanya et.al.	2505.19916	link
2025-05-26	Comparison of Polar Magnetic Fields Derived from MILOS and MERLIN Inversions with Hinode/SOT-SP Data	Masahito Kubo et.al.	2505.19468	null
2025-05-23	Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking	Cheng-Yen Yang et.al.	2505.18111	null
2025-05-19	Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach	Shiao Wang et.al.	2505.12903	link
2025-05-30	Effect of crystallinity on spin-orbit torque in 5$d$ iridium oxide IrO$_2$	Tetsuro Morimoto et.al.	2505.10907	null
2025-05-14	Recent progress on electron- and magnon-mediated torques	Jia-Min Lai et.al.	2505.09257	null
2025-05-14	Enhanced Spin Pumping and Magnetization dynamics in Ni$_{80}$Fe$_{20}$/MoS$_2$ stack via interface modification	Mahammad Tahir et.al.	2505.09248	null
2025-05-11	Nonlinear Model Predictive Control for Leaderless UAV Formation Flying with Collision Avoidance under Directed Graphs	Yiming Wang et.al.	2505.06895	null
2025-05-11	Streaming Sliced Optimal Transport	Khai Nguyen et.al.	2505.06835	link
2025-05-10	Nonlinearity Modulation of Auto-oscillations in Three-terminal Magnetic Tunnel Junctions	Zixi Wang et.al.	2505.06547	null
2025-05-06	Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation	Gabriele Rosi et.al.	2505.06280	link
2025-05-09	CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking	Weihong Li et.al.	2505.05936	link
2025-05-08	A Simple Detector with Frame Dynamics is a Strong Tracker	Chenxu Peng et.al.	2505.04917	link
2025-05-06	Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking	Shenglan Li et.al.	2505.03507	link
2025-05-02	Current-induced Dynamics of Bloch Domain-wall Bimerons	Jiwen Chen et.al.	2505.00959	null
2025-05-01	A High-resolution, Inversion-Based Synoptic Study of Solar Granulation	James Crowley et.al.	2505.00826	null
2025-05-01	DARTer: Dynamic Adaptive Representation Tracker for Nighttime UAV Tracking	Xuzhao Li et.al.	2505.00752	null
2025-04-24	RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network	Boyue Xu et.al.	2504.17595	null
2025-04-22	SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking	Yunfeng Li et.al.	2504.15609	link
2025-04-19	Adversarial Attack for RGB-Event based Visual Object Tracking	Qiang Chen et.al.	2504.14423	link
2025-04-28	HyDra: SOT-CAM Based Vector Symbolic Macro for Hyperdimensional Computing	Md Mizanur Rahaman Nayan et.al.	2504.14020	null
2025-04-18	FocusTrack: A Self-Adaptive Local Sampling Algorithm for Efficient Anti-UAV Tracking	Ying Wang et.al.	2504.13604	link
2025-04-17	TAXI: Traveling Salesman Problem Accelerator with X-bar-based Ising Macros Powered by SOT-MRAMs and Hierarchical Clustering	Sangmin Yoo et.al.	2504.13294	null
2025-04-16	Efficient spin-orbit torque driven magnetization switching of GdFe using phosphorus-implanted platinum layers	Kazuki Shintaku et.al.	2504.11796	null
2025-04-15	Chiral Domain Walls Induced by Radially Magnetized Nanotube Geometry	Nobuyuki Umetsu et.al.	2504.11005	null
2025-04-16	Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution	Chenghao Li et.al.	2504.09566	link
2025-04-13	Sub-nanosecond in-plane magnetization switching induced by field-like spin-orbit torques from ferromagnets	Hanying Zhang et.al.	2504.09431	null
2025-04-12	Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking	You Wu et.al.	2504.09228	link
2025-04-11	Bayesian Reasoning Enabled by Spin-Orbit Torque Magnetic Tunnel Junctions	Yingqian Xu et.al.	2504.08257	null
2025-04-08	Magnetic Memory Driven by Orbital Current	Jingkai Xu et.al.	2504.05780	null
2025-04-07	Dimensionality Enhanced Out-of-Plane Spin Currents in NbIrTe$_4$ for Efficient Field-Free Switching of Perpendicular Magnetization	Wei Yang et.al.	2504.05280	null
2025-04-02	Shape Anisotropy Enabled Field Free Switching of Perpendicular Nanomagnets	Akanksha Chouhan et.al.	2504.01634	null
2025-03-31	Symmetry Enhanced Unconventional Spin Current Anisotropy in a Collinear Antiferromagnet	Pankhuri Gupta et.al.	2503.20545	null
2025-03-26	Intrinsic back-switching phenomenon in SOT-MRAM devices	Kuldeep Ray et.al.	2503.19840	null
2025-03-22	MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking	Haolin Qin et.al.	2503.17699	link
2025-04-07	Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID	Yu-Hsi Chen et.al.	2503.17237	link
2025-03-21	Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks	Haijin Zeng et.al.	2503.16930	null
2025-03-21	Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking	Meng Zhou et.al.	2503.16768	null
2025-03-17	UncTrack: Reliable Visual Object Tracking with Uncertainty-Aware Prototype Memory Network	Siyuan Yao et.al.	2503.12888	link
2025-03-16	Equivalent-Circuit Thermal Model for Batteries with One-Shot Parameter Identification	Myisha A. Chowdhury et.al.	2503.12616	null
2025-03-13	Target-aware Bidirectional Fusion Transformer for Aerial Object Tracking	Xinglong Sun et.al.	2503.09951	null
2025-03-09	Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking	Chaocan Xue et.al.	2503.06625	link
2025-03-09	Dynamic Updates for Language Adaptation in Visual-Language Tracking	Xiaohai Li et.al.	2503.06621	link
2025-03-06	High resolution spectra of the [6297-6303] and [6361-6367] Angström domains (including forbidden OI lines) of the Sun and brightest stars	Jean-Marie Malherbe et.al.	2503.05832	null
2025-03-07	Separating the bulk and interface contribution of spin-orbit torque in ferromagnet-Heavy metal bilayers tuned by variation of resistivity of heavy metal	Abu Bakkar Miah et.al.	2503.05341	null
2025-03-07	Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching	Simon A. Aytes et.al.	2503.05179	link
2025-03-02	Inefficiency of the orbit Hall effect on spin torque in transition metal/ferromagnet bilayers	Yizhuo Song et.al.	2503.00910	null
2025-02-27	MITracker: Multi-View Integration for Visual Object Tracking	Mengjie Xu et.al.	2502.20111	null
2025-03-08	Dynamic Degradation Decomposition Network for All-in-One Image Restoration	Huiqiang Wang et.al.	2502.19068	null
2025-02-25	UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking	He Wang et.al.	2502.18220	null
2025-02-24	Symmetry-breaking effects on spin-orbit torque switching in ferromagnetic semiconductors with perpendicular magnetic anisotropy	Apu Kumar Jana et.al.	2502.16788	null
2025-02-17	Effects of antiferromagnetic coupling and pinning on domain wall dynamics in synthetic ferrimagnets	Sougata Mallick et.al.	2502.11621	null
2025-02-13	Modelling spin-orbitronics effects at interfaces and chiral molecules	Poonam Kumari et.al.	2502.09239	null
2025-02-12	Highly efficient field-free switching by orbital Hall torque in a MoS2-based device operating at room temperature	Antonio Bianco et.al.	2502.08483	null
2025-02-08	Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark	Shiao Wang et.al.	2502.05574	link
2025-02-06	Visualizing Field-free Deterministic Magnetic Switching of all-van der Waals Spin-Orbit Torque System Using Spin Ensembles in Hexagonal Boron Nitride	Xi Zhang et.al.	2502.04561	null
2025-01-27	Investigation of Sub-configurations Reveals Stable Spin-Orbit Torque Switching Polarity in Polycrystalline Mn3Sn	Boyu Zhao et.al.	2501.15815	null
2025-01-25	Thermal Stability and Depinning Currents of Domain Wall-Based Artificial Synapses	Guntas Kaur et.al.	2501.15102	null
2025-02-16	Enhancing Unconventional Spin-Orbit Torque Efficiency: Numerical Study on the Influence of Crystallographic Texture and Polycrystalline Effects on Low-Symmetry Materials	Yifei Yang et.al.	2501.14200	null
2025-01-22	Enhanced Field-Free Perpendicular Magnetization Switching via spin splitting torque in Altermagnetic RuO2-based Heterostructures	Badsha Sekh et.al.	2501.12593	null
2025-01-18	Multilayered MXenes for future two-dimensional nonvolatile magnetic memories	P. Kumar et.al.	2501.10678	null
2025-01-13	Robust Single Object Tracking in LiDAR Point Clouds under Adverse Weather Conditions	Xiantong Zhao et.al.	2501.07133	null
2025-01-11	ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation	Xuanle Zhao et.al.	2501.06598	link
2025-01-18	BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination	Zhongxuan Zhang et.al.	2501.03616	null
2025-01-05	DeTrack: In-model Latent Denoising Learning for Visual Object Tracking	Xinyu Zhou et.al.	2501.02467	null
2024-12-31	Alternative harmonic detection approach for quantitative determination of spin and orbital torques	Y. Xu et.al.	2501.00403	null
2024-12-30	An Experimental Study of Passive UAV Tracking with Digital Arrays and Cellular Downlink Signals	Yifei Sun et.al.	2412.20788	null
2024-12-30	Spin-orbit torque in a three-fold-symmetric bilayer and its effect on magnetization dynamics	Wuzhang Fang et.al.	2412.20746	null
2024-12-28	Learning Adaptive and View-Invariant Vision Transformer with Multi-Teacher Knowledge Distillation for Real-Time UAV Tracking	You Wu et.al.	2412.20002	link
2024-12-27	Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues	X. Feng et.al.	2412.19648	link
2024-12-26	Semistrong edge colorings of planar graphs	Yuquan Lin et.al.	2412.19230	null
2024-12-26	SUTrack: Towards Simple and Unified Single Object Tracking	Xin Chen et.al.	2412.19138	link
2024-12-24	Linear Enhancement of Spin-Orbit Torques and Absence of Bulk Rashba-Type Spin Splitting in Perpendicularly Magnetized [Pt/Co/W]n Superlattices	Zhihao Yan et.al.	2412.18481	null
2024-12-24	Field-free current-induced magnetization switching of a room temperature van der Waals magnet for neuromorphic computing	Chenxi Zhou et.al.	2412.18429	null
2024-12-24	All-electric mimicking synaptic plasticity based on the noncollinear antiferromagnetic device	Cuimei Cao et.al.	2412.18418	null
2025-01-01	Unsupervised UAV 3D Trajectories Estimation with Sparse Point Clouds	Hanfang Liang et.al.	2412.12716	link
2024-12-15	Exploring Enhanced Contextual Information for Video-Level Object Tracking	Ben Kang et.al.	2412.11023	link
2024-12-13	Visual Object Tracking across Diverse Data Modalities: A Review	Mengmeng Wang et.al.	2412.09991	null
2024-12-09	Magnetic Switching in Monolayer 2D Diluted Magnetic Semiconductors via Spin-to-Spin Conversion	Siwei Chen et.al.	2412.06650	null
2024-12-09	Energy Efficient Stochastic Signal Manipulation in Superparamagnetic Tunnel Junctions via Voltage-Controlled Exchange Coupling	Qi Jia et.al.	2412.06256	null
2024-12-03	GSOT3D: Towards Generic 3D Single Object Tracking in the Wild	Yifan Jiao et.al.	2412.02129	link
2024-12-01	MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning	You Wu et.al.	2412.00626	link
2024-11-29	Current-driven motion of magnetic domain-wall skyrmions	Haoyang Nie et.al.	2411.19566	null
2024-11-28	Unveiling the anisotropy of linear and nonlinear charge-spin conversion in Weyl semimetal TaIrTe4	Tao Tang et.al.	2411.19062	null
2024-12-04	A Distractor-Aware Memory for Visual Object Tracking with SAM2	Jovana Videnovic et.al.	2411.17576	link
2024-11-24	MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking	Chunhui Zhang et.al.	2411.15761	link
2024-11-23	How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking	Xuchen Li et.al.	2411.15600	null
2024-11-23	MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking	Xinqi Liu et.al.	2411.15459	null
2024-11-24	ClickTrack: Towards Real-time Interactive Single Object Tracking	Kuiran Wang et.al.	2411.13183	null
2024-11-30	SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory	Cheng-Yen Yang et.al.	2411.11922	link
2024-11-14	Compression Method for Solar Polarization Spectra Collected from Hinode SOT/SP Observations	Jargalmaa Batmunkh et.al.	2411.09311	null
2024-11-10	Orthogonal Spin-Orbit Torque-Induced Deterministic Switching in NiO	Yixiao Qiao et.al.	2411.06379	null
2024-11-08	Giant spin Hall effect with multi-directional spin components in Ni4W	Yifei Yang et.al.	2411.05682	null
2024-11-04	Single-layer spin-orbit-torque magnetization switching due to spin Berry curvature generated by minute spontaneous atomic displacement in a Weyl oxide	Hiroto Horiuchi et.al.	2411.01806	null
2024-11-04	ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model	Yiming Sun et.al.	2411.01756	null
2024-11-03	Capping layer dependent anti-correlation between magnetic damping and spin-orbital to charge conversion	Antarjami Sahoo et.al.	2411.01662	null
2024-11-01	Spin orbit torque-driven motion of quasi-Bloch domain wall in perpendicularly magnetized W/CoFeB/MgO structures	Nobuyuki Umetsu et.al.	2411.00516	null
2024-10-31	Origin of line broadening in fading granule: influence of small-scale turbulence	Ryohtaroh T. Ishikawa et.al.	2410.23654	null
2024-10-27	NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking	Yu Liu et.al.	2410.20421	link
2024-10-25	Can Stories Help LLMs Reason? Curating Information Space Through Narrative	Vahid Sadiri Javadi et.al.	2410.19221	null
2024-10-19	The Solution for Single Object Tracking Task of Perception Test Challenge 2024	Zhiqiang Zhong et.al.	2410.16329	null
2024-10-14	A stronger form of Yamamoto’s theorem II – Spectral operators	Soumyashant Nayak et.al.	2410.16318	null
2024-10-03	Leveraging Event Streams with Deep Reinforcement Learning for End-to-End UAV Tracking	Ala Souissi et.al.	2410.14685	null
2024-10-16	DaDiff: Domain-aware Diffusion Model for Nighttime UAV Tracking	Haobo Zuo et.al.	2410.12270	link
2024-10-14	SMART-TRACK: A Novel Kalman Filter-Guided Sensor Fusion For Robust UAV Object Tracking in Dynamic Environments	Khaled Gabr et.al.	2410.10409	link
2024-10-09	DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM	Xuchen Li et.al.	2410.02492	null
2024-10-01	Energy-efficient picosecond spin-orbit torque magnetization switching in ferro- and ferrimagnetic films	Eva Díaz et.al.	2410.00474	null
2024-09-27	Improving Visual Object Tracking through Visual Prompting	Shih-Fang Chen et.al.	2409.18901	link
2024-09-27	Prompt-Driven Temporal Domain Adaptation for Nighttime UAV Tracking	Changhong Fu et.al.	2409.18533	link
2024-09-26	A 5T-2MTJ STT-assisted Spin Orbit Torque based Ternary Content Addressable Memory for Hardware Accelerators	Siri Narla et.al.	2409.17863	null
2024-09-26	General Compression Framework for Efficient Transformer Object Tracking	Lingyi Hong et.al.	2409.17564	null
2024-09-26	Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking	Pengcheng Shao et.al.	2409.17560	null
2024-09-25	Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2	Chunhui Zhang et.al.	2409.16902	link
2024-09-25	Conditional Generative Denoiser for Nighttime UAV Tracking	Yucheng Wang et.al.	2409.16834	link
2024-09-25	Progressive Representation Learning for Real-Time UAV Tracking	Changhong Fu et.al.	2409.16652	link
2024-09-25	Enhancing Nighttime UAV Tracking with Light Distribution Suppression	Liangliang Yao et.al.	2409.16631	link
2024-09-24	Pulse Shaping Strategies for Efficient Switching of Magnetic Tunnel Junctions by Spin-Orbit Torque	Marco Hoffmann et.al.	2409.16454	null
2024-09-24	CloudTrack: Scalable UAV Tracking with Cloud Semantics	Yannik Blei et.al.	2409.16111	link
2024-09-20	A survey of sulfur-bearing molecular lines toward the dense cores in eleven massive protoclusters	Mengyao Tang et.al.	2409.13231	null
2024-09-19	Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC	Jiawen Kang et.al.	2409.12388	link
2024-09-11	Topological Spin-Orbit Torque in Ferrimagnetic Weyl Semimetal	Tomonari Meguro et.al.	2409.07106	null
2024-09-09	Effects of Interfacial Oxygen Diffusion on the Magnetic Properties and Thermal Stability of Pd/CoFeB/Pd/Ta Heterostructure	Saravanan Lakshmanan et.al.	2409.05783	null
2024-09-11	Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition	Hao Shi et.al.	2409.00815	null
2024-08-30	Advancing Multi-talker ASR Performance with Large Language Models	Mohan Shi et.al.	2408.17431	null
2024-08-30	Cross Fusion RGB-T Tracking with Bi-directional Adapter	Zhirong Zeng et.al.	2408.16979	null
2024-08-23	Energy-efficient field-free unconventional spin-orbit torque magnetization switching dynamics in van der Waals heterostructures	Lalit Pandey et.al.	2408.13095	null
2024-08-21	Low-Light Object Tracking: A Benchmark	Pengzhi Zhong et.al.	2408.11463	link
2024-08-20	MambaEVT: Event Stream based Visual Object Tracking using State Space Model	Xiao Wang et.al.	2408.10487	link
2024-08-19	Reconfigurable Spin Logics and High-density Multistate Memory in a Single Spin-orbit Torque Device	Raghvendra Posti et.al.	2408.09866	null
2024-08-16	Initialization-Free Multistate Memristor: Synergy of Spin-Orbit Torque and Magnetic Fields	Raghvendra Posti et.al.	2408.08641	null
2024-08-15	MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking	Simiao Lai et.al.	2408.07889	null
2024-08-12	Latent Disentanglement for Low Light Image Enhancement	Zhihao Zheng et.al.	2408.06245	null
2024-08-11	Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays - Part 2: Design Knobs and DNN Accuracy Trends	Jeffry Victor et.al.	2408.05857	null
2024-08-05	VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking	Yuxuan Lu et.al.	2408.02263	null
2024-08-04	3D Single-object Tracking in Point Clouds with High Temporal Variation	Qiao Wu et.al.	2408.02049	null
2024-07-30	Strained topological insulator spin-orbit torque random access memory (STI-SOTRAM) bit cell for energy-efficient Processing in Memory	Md Golam Morshed et.al.	2407.20925	null
2024-07-19	HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation	Zezeng Li et.al.	2407.14419	null
2024-07-17	Strawberry detection and counting based on YOLOv7 pruning and information based tracking algorithm	Shiyu Liu et.al.	2407.12614	null
2024-07-15	Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss	Mufeng Yao et.al.	2407.10485	link
2024-07-16	Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking	Lorenzo Vaquero et.al.	2407.10151	link
2024-07-12	DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects	Peng Wang et.al.	2407.09051	null
2024-07-11	Manipulating a Tetris-Inspired 3D Video Representation	Mihir Godbole et.al.	2407.08885	null
2024-07-11	Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets	Linh Van Ma et.al.	2407.08872	link
2024-07-11	CommRad: Context-Aware Sensing-Driven Millimeter-Wave Networks	Ish Kumar Jain et.al.	2407.08817	null
2024-07-10	Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors	Lei Cheng et.al.	2407.08049	null
2024-07-10	Large spin-orbit torque in a-plane $α$-Fe$_{2}$O$_{3}$/Pt bilayers	Igor Lyalin et.al.	2407.07731	null
2024-07-10	Spin Splitting in Altermagnetic RuO$_2$ Enables Field-free Spin-Orbit Torque Switching via Dominant Out-of-Plane Spin Polarization	Zhuoyi Li et.al.	2407.07447	null
2024-07-09	Unconventional Spin-Orbit Torques from Sputtered MoTe2 Films	Shuchen Li et.al.	2407.06487	null
2024-07-07	Addressing single object tracking in satellite imagery through prompt-engineered solutions	Athena Psalta et.al.	2407.05518	null
2024-07-07	Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking	You Wu et.al.	2407.05383	null
2024-07-09	P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds	Jiahao Nie et.al.	2407.05238	link
2024-07-05	Median Mishaps between Chirality and Spin-Orbit Torques via Asymmetric Hysteresis	Minhwan Kim et.al.	2407.04624	null
2024-07-04	Serialized Output Training by Learned Dominance	Ying Shi et.al.	2407.03966	null
2024-07-04	TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers	Fatemeh Nourilenjan Nokabadi et.al.	2407.03946	link
2024-07-04	Out-of-Plane Polarization from Spin Reflection Induces Field-Free Spin-Orbit Torque Switching in Structures with Canted NiO Interfacial Moments	Zhe Zhang et.al.	2407.03676	null

HDR

Publish Date	Title	Authors	PDF	Code
2025-06-19	Seven-Probe Fiber Detector for Time-Resolved Source Tracking in HDR-Brachytherapy: Experimental Evaluation	Mathieu Gonod et.al.	2506.16124	null
2025-06-14	Fine-Grained HDR Image Quality Assessment From Noticeably Distorted to Very High Fidelity	Mohsen Jenadeleh et.al.	2506.12505	null
2025-06-13	Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning	Mohammadamin Moradi et.al.	2506.11957	null
2025-06-11	Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy	Tonghe Wang et.al.	2506.09805	null
2025-06-11	TRAPs, Generalisations of MZVs, Locality and Resurgence for Quantum Field Theories	Pierre J. Clavier et.al.	2506.09493	null
2025-06-04	Photoreal Scene Reconstruction from an Egocentric Device	Zhaoyang Lv et.al.	2506.04444	link
2025-06-04	GRAVITY+ adaptive optics (GPAO) tests in Europe	Florentin Millour et.al.	2506.03721	null
2025-06-03	IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation	Yuanze Lin et.al.	2506.03150	null
2025-05-29	LCB-CV-UNet: Enhanced Detector for High Dynamic Range Radar Signals	Yanbin Wang et.al.	2505.23454	null
2025-05-29	iHDR: Iterative HDR Imaging with Arbitrary Number of Exposures	Yu Yuan et.al.	2505.22971	null
2025-05-27	HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation	Bowen Chen et.al.	2505.21831	null
2025-05-26	Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting	Yizhou Zhao et.al.	2505.20582	null
2025-05-28	EventEgoHands: Event-based Egocentric 3D Hand Mesh Reconstruction	Ryosei Hara et.al.	2505.19169	null
2025-05-23	Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras	Masataka Kobayashi et.al.	2505.17582	null
2025-05-22	V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation	Hanyue Lou et.al.	2505.16797	link
2025-05-21	Evaluation of Mobile Environment for Vehicular Visible Light Communication Using Multiple LEDs and Event Cameras	Ryota Soga et.al.	2505.15412	null
2025-05-17	NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results	Sangmin Lee et.al.	2505.12089	null
2025-05-22	Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition	Runduo Han et.al.	2505.12007	link
2025-05-16	Towards Navigation-Grade and Deployable Optomechanical Accelerometery	Chang Ge et.al.	2505.11751	null
2025-05-16	Planar Velocity Estimation for Fast-Moving Mobile Robots Using Event-Based Optical Flow	Liam Boyle et.al.	2505.11116	null
2025-05-14	Efficient Modelling of Lyman-α opacity fluctuations during late EoR	Barun Maity et.al.	2505.09369	null
2025-05-13	A Survey of 3D Reconstruction with Event Cameras: From Event-based Geometry to Neural 3D Rendering	Chuanzhi Xu et.al.	2505.08438	null
2025-05-12	Asynchronous Multi-Object Tracking with an Event Camera	Angus Apps et.al.	2505.08126	link
2025-05-12	Towards a physically realistic computationally efficient DVS pixel model	Rui Graca et.al.	2505.07386	null
2025-05-12	RealRep: Generalized SDR-to-HDR Conversion with Style Disentangled Representation Learning	Gang He et.al.	2505.07322	null
2025-04-30	From Events to Enhancement: A Survey on Event-Based Imaging Technologies	Yunfan Lu et.al.	2505.05488	null
2025-05-08	EDmamba: A Simple yet Effective Event Denoising Method with State Space Model	Ciyu Ruan et.al.	2505.05391	null
2025-05-07	EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events	Shuoyan Wei et.al.	2505.04657	link
2025-05-06	Benchmark-based Study of CPU/GPU Power-Related Features through JAX and TensorFlow	Roblex Nana Tchakoute et.al.	2505.03398	null
2025-05-02	High Dynamic Range Novel View Synthesis with Single Exposure	Kaixuan Zhang et.al.	2505.01212	link
2025-04-29	A Survey on Event-based Optical Marker Systems	Nafiseh Jabbari Tofighi et.al.	2504.20736	null
2025-05-12	Spike Imaging Velocimetry: Dense Motion Estimation of Fluids Using Spike Cameras	Yunzhong Zhang et.al.	2504.18864	null
2025-04-25	Boxi: Design Decisions in the Context of Algorithmic Performance for Robotics	Jonas Frey et.al.	2504.18500	null
2025-04-25	BiasBench: A reproducible benchmark for tuning the biases of event cameras	Andreas Ziegler et.al.	2504.18235	null
2025-04-25	Post-Transfer Learning Statistical Inference in High-Dimensional Regression	Nguyen Vu Khai Tam et.al.	2504.18212	null
2025-04-24	CasualHDRSplat: Robust High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos	Shucheng Gong et.al.	2504.17728	link
2025-04-27	EHGCN: Hierarchical Euclidean-Hyperbolic Fusion via Motion-Aware GCN for Hybrid Event Stream Perception	Haosheng Chen et.al.	2504.16616	null
2025-04-23	SaENeRF: Suppressing Artifacts in Event-based Neural Radiance Fields	Yuanjian Wang et.al.	2504.16389	link
2025-04-20	Approaches to High Dynamic Range Imaging - Application to the ngVLA	T. K. Sridharan et.al.	2504.14449	null
2025-04-17	CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework	Wentao Wu et.al.	2504.12576	link
2025-04-21	Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space	Kaustav Chanda et.al.	2504.12515	null
2025-04-16	Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging	Tristan S. W. Stevens et.al.	2504.12154	null
2025-04-11	High Dynamic Range Modulo Imaging for Robust Object Detection in Autonomous Driving	Kebin Contreras et.al.	2504.11472	null
2025-04-17	GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR	Christophe Bolduc et.al.	2504.10809	null
2025-04-14	Minimal Sensing for Orienting a Solar Panel	Jeremy Klotz et.al.	2504.10765	null
2025-04-13	Low-Light Image Enhancement using Event-Based Illumination Estimation	Lei Sun et.al.	2504.09379	null
2025-04-10	S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion	Yujin Wang et.al.	2504.07667	null
2025-04-08	Orthogonal Matching Pursuit based Reconstruction for Modulo Hysteresis Operators	Matthias Beckmann et.al.	2504.05895	null
2025-04-08	Inter-event Interval Microscopy for Event Cameras	Changqing Su et.al.	2504.04924	null
2025-04-06	eKalibr-Stereo: Continuous-Time Spatiotemporal Calibration for Event-Based Stereo Visual Systems	Shuolong Chen et.al.	2504.04451	link
2025-04-05	Autoregressive High-Order Finite Difference Modulo Imaging: High-Dynamic Range for Computer Vision Applications	Brayan Monroy et.al.	2504.04228	null
2025-04-03	Brightness Perceiving for Recursive Low-Light Image Enhancement	Haodian Wang et.al.	2504.02362	link
2025-04-02	Anomaly Detection for Hybrid Butterfly Subspecies via Probability Filtering	Bo-Kai Ruan et.al.	2504.01671	link
2025-03-31	DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting	Seungjun Lee et.al.	2503.24210	null
2025-03-29	SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry	Peiyu Chen et.al.	2503.22963	link
2025-03-28	Enhancing Celestial Imaging: High Dynamic Range with Neuromorphic Cameras	Satyapreet Singh Yadav et.al.	2503.22814	null
2025-03-26	SpikeDerain: Unveiling Clear Videos from Rainy Sequences Using Color Spike Streams	Hanwen Liang et.al.	2503.20315	null
2025-03-26	A Survey on Event-driven 3D Reconstruction: Development under Different Categories	Chuanzhi Xu et.al.	2503.19753	null
2025-03-25	Maximum Likelihood Estimation Based Complex-Valued Robust Chinese Remainder Theorem and Its Fast Algorithm	Xiaoping Li et.al.	2503.18625	null
2025-03-21	Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras	Shuang Guo et.al.	2503.17262	link
2025-03-20	Neuromorphic Cameras in Astronomy: Unveiling the Future of Celestial Imaging Beyond Conventional Limits	Satyapreet Singh Yadav et.al.	2503.15883	null
2025-03-19	Boosting HDR Image Reconstruction via Semantic Knowledge Transfer	Qingsen Yan et.al.	2503.15361	null
2025-03-20	VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention	Mingzhe Zheng et.al.	2503.15138	null
2025-03-18	Weakly Supervised Spatial Implicit Neural Representation Learning for 3D MRI-Ultrasound Deformable Image Registration in HDR Prostate Brachytherapy	Jing Wang et.al.	2503.14395	null
2025-03-17	UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks	Yuanbin Qian et.al.	2503.12905	link
2025-03-17	Stereo Event-based, 6-DOF Pose Tracking for Uncooperative Spacecraft	Zibin Liu et.al.	2503.12732	link
2025-03-16	EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera	Luming Wang et.al.	2503.12419	link
2025-03-14	Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP	Trevor D. Canham et.al.	2503.11883	null
2025-03-13	GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping	Jinfeng Liu et.al.	2503.10143	null
2025-03-10	Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion	Haowen Bai et.al.	2503.07235	null
2025-03-08	Optimization models for needle placement in 3D-printed masks for high dose rate brachytherapy	Nasim Mirzavand Boroujeni et.al.	2503.06000	null
2025-03-16	DeepGrav: Anomalous Gravitational-Wave Detection Through Deep Latent Features	Jianqi Yan et.al.	2503.03799	link
2025-03-05	BAT: Learning Event-based Optical Flow with Bidirectional Adaptive Temporal Correlation	Gangwei Xu et.al.	2503.03256	null
2025-03-04	ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement	Xuejian Guo et.al.	2503.02484	link
2025-03-03	S-R2D2: a spherical extension of the R2D2 deep neural network series paradigm for wide-field radio-interferometric imaging	A. Tajja et.al.	2503.01462	null
2025-03-03	Adaptive cold-atom magnetometry mitigating the trade-off between sensitivity and dynamic range	Zhu Ma et.al.	2503.01211	null
2025-03-01	High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and A Learned Bit-depth Scalable Compression Algorithm	Zhaoyi Tian et.al.	2503.00410	link
2025-03-01	Adversarial Attacks on Event-Based Pedestrian Detectors: A Physical Approach	Guixu Lin et.al.	2503.00377	null
2025-02-28	EVLoc: Event-based Visual Localization in LiDAR Maps via Event-Depth Registration	Kuangyi Chen et.al.	2503.00167	link
2025-02-28	SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events	Yunfan Lu et.al.	2502.21120	null
2025-02-18	Fast Antibiotic resistance-Based gene editing of mammalian cells with CRISPR-Cas9 (FAB-CRISPR)	Petia Adarska et.al.	2502.12675	null
2025-02-14	Quantifying Phase Magnitudes of Open-Source Focused-Probe 4D-STEM Ptychography Reconstructions	Toma Susi et.al.	2502.09938	link
2025-02-10	Indoor Light and Heat Estimation from a Single Panorama	Guanzhou Ji et.al.	2502.06973	null
2025-02-09	Compressed sensing enabled high-bandwidth and large dynamic range magnetic sensing	Galya Haim et.al.	2502.06070	null
2025-02-09	Energy-Efficient Autonomous Aerial Navigation with Dynamic Vision Sensors: A Physics-Guided Neuromorphic Approach	Sourav Sanyal et.al.	2502.05938	null
2025-02-07	Differentiable Mobile Display Photometric Stereo	Gawoon Ban et.al.	2502.05055	null
2025-02-05	Deep Learning-based Event Data Coding: A Joint Spatiotemporal and Polarity Solution	Abdelrahman Seleem et.al.	2502.03285	null
2025-02-04	Event-aided Semantic Scene Completion	Shangwei Guo et.al.	2502.02334	link
2025-01-23	HP2 Survey V. Ophiuchus: Filament formation in a dispersing cloud complex	João Alves et.al.	2501.13931	null
2025-01-22	DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning	Wenhao Gu et.al.	2501.12898	null
2025-01-20	UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion	Zixuan Chen et.al.	2501.11515	null
2025-01-10	eKalibr: Dynamic Intrinsic Calibration for Event Cameras From First Principles of Events	Shuolong Chen et.al.	2501.05688	link
2025-01-07	AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene	Chaoran Feng et.al.	2501.02807	null
2024-12-26	Learning Monocular Depth from Events via Egomotion Compensation	Haitao Meng et.al.	2412.19067	null
2024-12-25	HAND: Hierarchical Attention Network for Multi-Scale Handwritten Document Recognition and Layout Analysis	Mohammed Hamdan et.al.	2412.18981	null
2024-12-20	High-Dynamic Range Broadband Terahertz Time-Domain Spectrometer Based on Organic Crystal MNA	Samira Mansourzadeh et.al.	2412.15718	null
2024-12-19	Event-assisted 12-stop HDR Imaging of Dynamic Scene	Shi Guo et.al.	2412.14705	null
2025-01-06	LEDiff: Latent Exposure Diffusion for HDR Generation	Chao Wang et.al.	2412.14456	null
2024-12-18	Development of a High-Resolution, High-Dynamic-Range Charge Detector for Ion Beam Monitoring	O. Adriani et.al.	2412.13934	null
2024-12-18	Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode	Xin Su et.al.	2412.13749	link
2024-12-17	Transforming Single Photon Camera Images to Color High Dynamic Range Images	Sumit Sharma et.al.	2412.12942	null
2024-12-17	Efficient Event-based Semantic Segmentation with Spike-driven Lightweight Transformer-based Networks	Xiaxin Zhu et.al.	2412.12843	null
2024-12-17	Compressed Sensing Based Residual Recovery Algorithms and Hardware for Modulo Sampling	Shaik Basheeruddin Shah et.al.	2412.12724	null
2024-12-16	Towards Physically-Based Sky-Modeling	Ian J. Maquignaz et.al.	2412.11883	null
2024-12-16	High dynamic-range quantum sensing of magnons and their dynamics using a superconducting qubit	Sonia Rani et.al.	2412.11859	null
2024-12-16	Predicting the Original Appearance of Damaged Historical Documents	Zhenhua Yang et.al.	2412.11634	link
2024-12-16	Event-based Detectors for Laser Guide Star Tip-Tilt Sensing	Monique Cockram et.al.	2412.11436	null
2024-12-12	Continuous Gaussian Process Pre-Optimization for Asynchronous Event-Inertial Odometry	Zhixiang Wang et.al.	2412.08909	null
2024-12-10	EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering	Toshiya Yura et.al.	2412.07293	null
2024-12-09	Fitting Spherical Gaussians to Dynamic HDRI Sequences	Pascal Clausen et.al.	2412.06511	null
2024-12-09	Event fields: Capturing light fields at high speed, resolution, and dynamic range	Ziyuan Qu et.al.	2412.06191	null
2024-12-07	On an Analytical Inversion Formula for the Modulo Radon Transform	Matthias Beckmann et.al.	2412.05711	null
2024-12-05	DHOST theories as disformal gravity: From black holes to radiative spacetimes	Jibril Ben Achour et.al.	2412.04135	null
2024-12-05	High-power single-cycle THz emission from large-area photoconductive emitters at 400 kHz	Mohsen Khalili et.al.	2412.04004	null
2024-12-05	Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization	Tianyu Chen et.al.	2412.03941	null
2024-12-04	Accelerating HI density predictions during the Epoch of Reionization using a GPR-based emulator on N-body simulations	Gaurav Pundir et.al.	2412.03485	null
2024-12-03	EvRT-DETR: The Surprising Effectiveness of DETR-based Detection for Event Cameras	Dmitrii Torbunov et.al.	2412.02890	link
2024-12-02	Learning Differential Pyramid Representation for Tone Mapping	Qirui Yang et.al.	2412.01463	null
2024-11-28	Event-based Tracking of Any Point with Motion-Robust Correlation Features	Friedhelm Hamann et.al.	2412.00133	link
2024-11-25	CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain	Jingchao Peng et.al.	2411.16327	null
2024-11-22	High-dynamic-range atomic clocks with dual Heisenberg-limited precision scaling	Jungeng Zhou et.al.	2411.14944	null
2024-11-20	Demonstrating the Suitability of Neuromorphic, Event-Based, Dynamic Vision Sensors for In Process Monitoring of Metallic Additive Manufacturing and Welding	David Mascareñas et.al.	2411.13108	null
2024-11-18	Noise Filtering Benchmark for Neuromorphic Satellites Observations	Sami Arja et.al.	2411.11233	link
2024-11-16	Beyond Feature Mapping GAP: Integrating Real HDRTV Priors for Superior SDRTV-to-HDRTV Conversion	Kepeng Xu et.al.	2411.10775	null
2024-11-15	CaLES: A GPU-accelerated solver for large-eddy simulation of wall-bounded flows	Maochao Xiao et.al.	2411.09364	link
2024-11-11	Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models	NVIDIA et.al.	2411.07126	null
2024-11-25	Increasing the scalability of graph convolution for FPGA-implemented event-based vision	Piotr Wzorek et.al.	2411.04269	null
2024-11-13	DEIO: Deep Event Inertial Odometry	Weipeng Guan et.al.	2411.03928	link
2024-11-05	Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor	Anish Bhattacharya et.al.	2411.03303	null
2024-11-05	Learning-based Lossless Event Data Compression	Ahmadreza Sezavar et.al.	2411.03010	null
2024-10-30	Automatic programming via large language models with population self-evolution for dynamic job shop scheduling problem	Jin Huang et.al.	2410.22657	null
2024-10-29	EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data	Zhonghua Yi et.al.	2410.21743	link
2024-10-28	NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments	Taiyi Pan et.al.	2410.21615	link
2024-10-27	BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events	Yijin Li et.al.	2410.20451	null
2024-10-26	Unleashing Dynamic Range and Resolution in Unlimited Sensing Framework via Novel Hardware	Yuliang Zhu et.al.	2410.20193	null
2024-10-21	Scene-Segmentation-Based Exposure Compensation for Tone Mapping of High Dynamic Range Scenes	Yuma Kinoshita et.al.	2410.19839	null
2024-10-24	Environment Maps Editing using Inverse Rendering and Adversarial Implicit Functions	Antonio D’Orazio et.al.	2410.18622	null
2024-10-23	Frequency-dependent amplitude correction to free-precession scalar magnetometers	M. E. Limes et.al.	2410.18224	null
2024-10-22	SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition	Jiaqi Chen et.al.	2410.16746	link
2024-10-19	A Cycle Ride to HDR: Semantics Aware Self-Supervised Framework for Unpaired LDR-to-HDR Image Translation	Hrishav Bakul Barua et.al.	2410.15068	link
2024-10-17	360U-Former: HDR Illumination Estimation with Panoramic Adapted Vision Transformers	Jack Hilliard et.al.	2410.13566	null
2024-10-17	On Quantum Programming Languages	Benoît Valiron et.al.	2410.13337	null
2024-10-16	An O(m+n)-Space Spatiotemporal Denoising Filter with Cache-Like Memories for Dynamic Vision Sensors	Qinghang Zhao et.al.	2410.12423	null
2024-10-10	DifFRelight: Diffusion-Based Facial Performance Relighting	Mingming He et.al.	2410.08188	null
2024-10-18	IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera	Jian Huang et.al.	2410.08107	link
2024-10-09	Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras	Friedhelm Hamann et.al.	2410.06698	null
2024-10-03	Spiking Neural Network as Adaptive Event Stream Slicer	Jiahang Cao et.al.	2410.02249	link
2024-10-03	Capturing complex hand movements and object interactions using machine learning-powered stretchable smart textile gloves	Arvin Tashakori et.al.	2410.02221	link
2024-10-01	Signatures of Black Hole Spin and Plasma Acceleration in Jet Polarimetry	Zachary Gelles et.al.	2410.00954	null
2024-10-04	VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models	Jiapeng Wang et.al.	2410.00741	null
2024-09-26	Photon Inhibition for Energy-Efficient Single-Photon Imaging	Lucas J. Koerner et.al.	2409.18337	null
2024-09-26	Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions	Weng Fei Low et.al.	2409.17988	null
2024-09-26	Unsupervised Learning Based Multi-Scale Exposure Fusion	Chaobing Zheng et.al.	2409.17830	null
2024-09-26	Event-based Stereo Depth Estimation: A Survey	Suman Ghosh et.al.	2409.17680	null
2024-09-26	Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking	Pengcheng Shao et.al.	2409.17560	null
2024-09-25	EventHDR: from Event to High-Speed HDR Videos and Beyond	Yunhao Zou et.al.	2409.17029	null
2024-09-25	Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training	Kun Song et.al.	2409.16767	null
2024-09-24	Sub-Nyquist USF Spectral Estimation: $K$ Frequencies with $6K + 4$ Modulo Samples	Ruiming Guo et.al.	2409.16472	null
2024-09-24	Neuromorphic Drone Detection: an Event-RGB Multimodal Approach	Gabriele Magrini et.al.	2409.16099	link
2024-09-24	Deep chroma compression of tone-mapped images	Xenios Milidonis et.al.	2409.16032	link
2024-09-23	Mixing Data-driven and Geometric Models for Satellite Docking Port State Estimation using an RGB or Event Camera	Cedric Le Gentil et.al.	2409.15581	null
2024-09-23	SpikeGS: Learning 3D Gaussian Fields from Continuous Spike Stream	Jinze Yu et.al.	2409.15176	link
2024-09-21	Monocular Event-Inertial Odometry with Adaptive decay-based Time Surface and Polarity-aware Tracking	Kai Tang et.al.	2409.13971	null
2024-09-20	Intrinsic Single-Image HDR Reconstruction	Sebastian Dille et.al.	2409.13803	link
2024-09-20	Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors	Zixin Zhang et.al.	2409.13392	null
2024-09-18	EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning	Yukun Tian et.al.	2409.11813	null
2024-09-18	Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network	Jiale Wang et.al.	2409.11677	null
2024-09-16	Programmable multifunctional integrated microwave photonic circuit on thin-film lithium niobate	Chuangchuang Wei et.al.	2409.10227	null
2024-09-15	SciDVS: A Scientific Event Camera with 1.7% Temporal Contrast Sensitivity at 0.7 lux	Rui Graca et.al.	2409.09648	null
2024-09-13	Integration of high-performance compact interferometric sensors in a suspended interferometer	Alexandra Mitchell et.al.	2409.08843	null
2024-09-13	Adaptive Robust High-Precision Atomic Gravimetry	Jinye Wei et.al.	2409.08550	null
2024-09-07	Neural Augmentation Based Panoramic High Dynamic Range Stitching	Chaobing Zheng et.al.	2409.04679	null
2024-09-05	MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice	Friedhelm Hamann et.al.	2409.03358	link
2024-09-03	Gradient events: improved acquisition of visual information in event cameras	Eero Lehtonen et.al.	2409.01764	null
2024-09-02	SoK: Security of the Image Processing Pipeline in Autonomous Vehicles	Michael Kühr et.al.	2409.01234	link
2024-08-30	Synthetic Lunar Terrain: A Multimodal Open Dataset for Training and Evaluating Neuromorphic Vision Algorithms	Marcus Märtens et.al.	2408.16971	null
2024-08-29	EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More	Kanghao Chen et.al.	2408.16254	null
2024-08-28	ES-PTAM: Event-based Stereo Parallel Tracking and Mapping	Suman Ghosh et.al.	2408.15605	link
2024-08-27	Towards Real-world Event-guided Low-light Video Enhancement and Deblurring	Taewoo Kim et.al.	2408.14916	link
2024-08-27	Recent Event Camera Innovations: A Survey	Bharatesh Chakravarthi et.al.	2408.13627	link
2024-08-24	Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation	Yuxuan Zhou et.al.	2408.13586	link
2024-08-22	ISETHDR: A Physics-based Synthetic Radiance Dataset for High Dynamic Range Driving Scenes	Zhenyi Liu et.al.	2408.12048	link
2024-08-20	Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm	Xiao Wang et.al.	2408.10488	link
2024-08-20	MambaEVT: Event Stream based Visual Object Tracking using State Space Model	Xiao Wang et.al.	2408.10487	link
2024-08-19	Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms	Xiao Wang et.al.	2408.09764	link
2024-08-19	Phase-Separated Charge Order and Twinning Across Length Scales in CsV$_3$Sb$_5$	Jayden Plumb et.al.	2408.08842	null
2024-08-16	CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving	Shihan Peng et.al.	2408.08500	null
2024-08-13	MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation	JunYong Choi et.al.	2408.06707	null
2024-08-13	HDRGS: High Dynamic Range Gaussian Splatting	Jiahao Wu et.al.	2408.06543	link
2024-08-12	Rethinking Video with a Universal Event-Based Representation	Andrew Freeman et.al.	2408.06248	null
2024-08-10	EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency	Junjie Jiang et.al.	2408.05452	null
2024-08-06	Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera	Zibin Liu et.al.	2408.03225	link
2024-07-31	Exploiting Change Blindness for Video Coding: Perspectives from a Less Promising User Study	Mitra Amiri et.al.	2408.00052	null
2024-07-23	HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images	Shreyas Singh et.al.	2407.16503	link
2024-07-23	SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging	Lingtong Kong et.al.	2407.16308	link
2024-07-24	SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams	Liangyan Jiang et.al.	2407.15708	link
2024-08-04	Exposure Completing for Temporally Consistent Neural High Dynamic Range Video Rendering	Jiahao Cui et.al.	2407.13309	link
2024-07-18	Learned HDR Image Compression for Perceptually Optimal Storage and Display	Peibei Cao et.al.	2407.13179	null
2024-07-17	Nonlinear tomographic reconstruction via nonsmooth optimization	Vasileios Charisopoulos et.al.	2407.12984	null
2024-07-16	VideoClusterNet: Self-Supervised and Adaptive Clustering For Videos	Devesh Walawalkar et.al.	2407.12214	null
2024-07-16	I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM	Gwangtak Bae et.al.	2407.11347	null
2024-07-15	Temporal Event Stereo via Joint Learning with Stereoscopic Flow	Hoonhee Cho et.al.	2407.10831	link
2024-07-15	Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation	Yuhwan Jeong et.al.	2407.10703	link
2024-07-15	Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction	Lin Zhu et.al.	2407.10636	null
2024-07-18	Efficient hybrid technique for generating sub-grid haloes in reionization simulations	Ankur Barsode et.al.	2407.10585	null
2024-07-12	Radiance Fields from Photons	Sacha Jungerman et.al.	2407.09386	null
2024-07-11	Event-based vision on FPGAs – a survey	Tomasz Kryjak et.al.	2407.08356	null
2024-07-12	Dynamic phase transition into a mixed-CDW state in 1$T$-TaS$_2$ via a thermal quench	A. de la Torre et.al.	2407.07953	null
2024-07-08	PanDORA: Casual HDR Radiance Acquisition for Indoor Scenes	Mohammad Reza Karimi Dastjerdi et.al.	2407.06150	null
2024-07-08	Neuromorphic Imaging with Super-Resolution	Pei Zhang et.al.	2407.05764	null

Low-Level

Publish Date	Title	Authors	PDF	Code
2025-06-26	Wild refitting for black box prediction	Martin J. Wainwright et.al.	2506.21460	null
2025-06-26	Learning to See in the Extremely Dark	Hai Jiang et.al.	2506.21132	null
2025-06-25	On the Burstiness of Faces in Set	Jiong Wang et.al.	2506.20312	null
2025-06-25	TDiR: Transformer based Diffusion for Image Restoration Tasks	Abbas Anwar et.al.	2506.20302	null
2025-06-24	A Comparative Study of NAFNet Baselines for Image Restoration	Vladislav Esaulov et.al.	2506.19845	null
2025-06-24	NAADA: A Noise-Aware Attention Denoising Autoencoder for Dental Panoramic Radiographs	Khuram Naveed et.al.	2506.19387	null
2025-06-24	jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval	Michael Günther et.al.	2506.18902	null
2025-06-23	Enhancing Image Restoration Transformer via Adaptive Translation Equivariance	JiaKui Hu et.al.	2506.18520	null
2025-06-23	BSMamba: Brightness and Semantic Modeling for Long-Range Interaction in Low-Light Image Enhancement	Tongshun Zhang et.al.	2506.18346	null
2025-06-23	A Multi-Scale Spatial Attention-Based Zero-Shot Learning Framework for Low-Light Image Enhancement	Muhammad Azeem Aslam et.al.	2506.18323	null
2025-06-23	Attention-Based Ensemble Learning for Crop Classification Using Landsat 8-9 Fusion	Zeeshan Ramzan et.al.	2506.18321	null
2025-06-26	Referring Expression Instance Retrieval and A Strong End-to-End Baseline	Xiangzhao Hao et.al.	2506.18246	null
2025-06-22	CmFNet: Cross-modal Fusion Network for Weakly-supervised Segmentation of Medical Images	Dongdong Meng et.al.	2506.18042	null
2025-06-20	Reversing Flow for Image Restoration	Haina Qin et.al.	2506.16961	null
2025-06-20	Visual-Instructed Degradation Diffusion for All-in-One Image Restoration	Wenyang Luo et.al.	2506.16960	link
2025-06-20	Temperature calibration of surface emissivities with an improved thermal image enhancement network	Ning Chu et.al.	2506.16803	null
2025-06-23	RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought	Junbo Qiao et.al.	2506.16796	null
2025-06-20	TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration	Xiaoyu Shi et.al.	2506.16784	null
2025-06-20	Infrared and Visible Image Fusion Based on Implicit Neural Representations	Shuchen Sun et.al.	2506.16773	null
2025-06-20	Class Agnostic Instance-level Descriptor for Visual Instance Search	Qi-Ying Sun et.al.	2506.16745	null
2025-06-20	TeSG: Textual Semantic Guidance for Infrared and Visible Image Fusion	Mingrui Zhu et.al.	2506.16730	null
2025-06-19	MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval	Chao He et.al.	2506.16353	null
2025-06-19	Fine-grained Image Retrieval via Dual-Vision Adaptation	Xin Jiang et.al.	2506.16273	null
2025-06-18	DM-FNet: Unified multimodal medical image fusion via diffusion process-trained encoder-decoder	Dan He et.al.	2506.15218	link
2025-06-18	ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections	Ziling Huang et.al.	2506.15180	null
2025-06-17	HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search	Qian Xu et.al.	2506.14707	null
2025-06-17	Optimization-Based Image Restoration under Implementation Constraints in Optical Analog Circuits	Taisei Kato et.al.	2506.14624	null
2025-06-17	Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching	Giacomo Meanti et.al.	2506.14605	link
2025-06-17	Exploring Diffusion with Test-Time Training on Efficient Image Restoration	Rongchang Lu et.al.	2506.14541	null
2025-06-17	GrFormer: A Novel Transformer on Grassmann Manifold for Infrared and Visible Image Fusion	Huan Kang et.al.	2506.14384	null
2025-06-18	DREAM: On hallucinations in AI-generated content for nuclear medicine imaging	Menghua Xia et.al.	2506.13995	null
2025-06-16	Robust Recursive Fusion of Multiresolution Multispectral Images with Location-Aware Neural Networks	Haoqing Li et.al.	2506.13733	null
2025-06-16	Exploiting the Exact Denoising Posterior Score in Training-Free Guidance of Diffusion Models	Gregory Bellchambers et.al.	2506.13614	null
2025-06-16	A Semantically-Aware Relevance Measure for Content-Based Medical Image Retrieval Evaluation	Xiaoyang Wei et.al.	2506.13509	null
2025-06-17	Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval	Kshitij Kavimandan et.al.	2506.13496	null
2025-06-16	EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition	Bingxi Liu et.al.	2506.13133	null
2025-06-15	Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution	Hang Xu et.al.	2506.12738	null
2025-06-14	An Iterative PDE Based Illumination Restoration Scheme for Image Enhancement	Dragos-Patru Covei et.al.	2506.12560	null
2025-06-14	UniDet-D: A Unified Dynamic Spectral Attention Model for Object Detection under Adverse Weathers	Yuantao Wang et.al.	2506.12324	null
2025-06-11	Towards a general-purpose foundation model for fMRI analysis	Cheng Wang et.al.	2506.11167	null
2025-06-10	Adaptive Object Detection with ESRGAN-Enhanced Resolution & Faster R-CNN	Divya Swetha K et.al.	2506.11122	null
2025-06-12	FSATFusion: Frequency-Spatial Attention Transformer for Infrared and Visible Image Fusion	Tianpei Zhang et.al.	2506.10366	link
2025-06-11	Improving Personalized Search with Regularized Low-Rank Parameter Updates	Fiona Ryan et.al.	2506.10182	link
2025-06-10	Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment	Tianyu Chen et.al.	2506.10030	link
2025-06-11	Text-Aware Image Restoration with Diffusion Models	Jaewon Min et.al.	2506.09993	null
2025-06-11	Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints	Xiangkai Zhang et.al.	2506.09748	null
2025-06-11	Beyond Calibration: Physically Informed Learning for Raw-to-Raw Mapping	Peter Grönquist et.al.	2506.08650	null
2025-06-09	PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement	Teng Hu et.al.	2506.07848	null
2025-06-09	M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration	Yongzhen Wang et.al.	2506.07814	null
2025-06-09	Design and Evaluation of Deep Learning-Based Dual-Spectrum Image Fusion Methods	Beining Xu et.al.	2506.07779	null
2025-06-08	Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI	Aditya Chakravarty et.al.	2506.07286	null
2025-06-08	A PDE-Based Image Restoration Method: Mathematical Analysis and Implementation	Dragos-Patru Covei et.al.	2506.07132	null
2025-06-07	Zero Shot Composed Image Retrieval	Santhosh Kakarla et.al.	2506.06602	null
2025-06-06	A Deep Learning Approach for Facial Attribute Manipulation and Reconstruction in Surveillance and Reconnaissance	Anees Nashath Shaik et.al.	2506.06578	null
2025-06-06	GenIR: Generative Visual Feedback for Mental Image Retrieval	Diji Yang et.al.	2506.06220	null
2025-06-06	Bidirectional Image-Event Guided Low-Light Image Enhancement	Zhanwen Liu et.al.	2506.06120	null
2025-06-06	NTIRE 2025 Challenge on HR Depth from Images of Specular and Transparent Surfaces	Pierluigi Zama Ramirez et.al.	2506.05815	null
2025-06-05	UniRes: Universal Image Restoration for Complex Degradations	Mo Zhou et.al.	2506.05599	null
2025-06-05	OpenRR-5k: A Large-Scale Benchmark for Reflection Removal in the Wild	Jie Cai et.al.	2506.05482	null
2025-06-05	Degradation-Aware Image Enhancement via Vision-Language Classification	Jie Cai et.al.	2506.05450	null
2025-06-05	SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training	Jianyi Wang et.al.	2506.05301	null
2025-06-05	Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement	Niki Martinel et.al.	2506.04753	null
2025-06-04	A Poisson-Guided Decomposition Network for Extreme Low-Light Image Enhancement	Isha Rao et.al.	2506.04470	null
2025-06-04	WIFE-Fusion: Wavelet-aware Intra-inter Frequency Enhancement for Multi-model Image Fusion	Tianpei Zhang et.al.	2506.03555	null
2025-06-03	NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results	Xiaohong Liu et.al.	2506.02875	null
2025-06-03	ControlMambaIR: Conditional Controls with State-Space Model for Image Restoration	Cheng Yang et.al.	2506.02633	null
2025-06-02	Entity Image and Mixed-Modal Image Retrieval Datasets	Cristian-Ioan Blaga et.al.	2506.02291	null
2025-06-04	NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution	Marcos V. Conde et.al.	2506.02197	null
2025-06-02	RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report	Marcos V. Conde et.al.	2506.01947	null
2025-06-02	NTIRE 2025 the 2nd Restore Any Image Model (RAIM) in the Wild Challenge	Jie Liang et.al.	2506.01394	null
2025-06-01	Quantization-based Bounds on the Wasserstein Metric	Jonathan Bobrutsky et.al.	2506.00976	null
2025-05-31	Image Restoration Learning via Noisy Supervision in the Fourier Domain	Haosen Liu et.al.	2506.00564	null
2025-05-30	RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement	Raman Jha et.al.	2505.24705	link
2025-05-30	Model-Guided Network with Cluster-Based Operators for Spatio-Spectral Super-Resolution	Ivan Pereira-Sánchez et.al.	2505.24605	link
2025-05-30	SORCE: Small Object Retrieval in Complex Environments	Chunxu Liu et.al.	2505.24441	link
2025-05-30	IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models	Hanting Wang et.al.	2505.24406	link
2025-05-30	Boosting All-in-One Image Restoration via Self-Improved Privilege Learning	Gang Wu et.al.	2505.24207	link
2025-05-29	Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch	Aneeshan Sain et.al.	2505.23763	null
2025-05-29	Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging	Ping Wang et.al.	2505.23180	link
2025-05-29	CURVE: CLIP-Utilized Reinforcement Learning for Visual Image Enhancement via Simple Image Processing	Yuka Ogino et.al.	2505.23102	null
2025-05-29	URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration	Rui Xu et.al.	2505.23068	link
2025-05-29	Vision-Based Assistive Technologies for People with Cerebral Visual Impairment: A Review and Focus Study	Bhanuka Gamage et.al.	2505.22983	null
2025-05-29	EquiReg: Equivariance Regularized Diffusion for Inverse Problems	Bahareh Tolooshams et.al.	2505.22973	null
2025-05-28	From Controlled Scenarios to Real-World: Cross-Domain Degradation Pattern Matching for All-in-One Image Restoration	Junyu Fan et.al.	2505.22284	null
2025-05-28	UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images	Junhuan Liu et.al.	2505.22098	null
2025-05-28	Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule	San Jiang et.al.	2505.22089	null
2025-05-28	GL-PGENet: A Parameterized Generation Framework for Robust Document Image Enhancement	Zhihong Tang et.al.	2505.22021	null
2025-05-28	Reference-Guided Identity Preserving Face Restoration	Mo Zhou et.al.	2505.21905	null
2025-05-28	Broadening Our View: Assistive Technology for Cerebral Visual Impairment	Bhanuka Gamage et.al.	2505.21875	null
2025-05-27	QuARI: Query Adaptive Retrieval Improvement	Eric Xing et.al.	2505.21647	null
2025-05-27	BaryIR: Learning Multi-Source Unified Representation in Continuous Barycenter Space for Generalizable All-in-One Image Restoration	Xiaole Tang et.al.	2505.21637	null
2025-05-27	Causality-Driven Infrared and Visible Image Fusion	Linli Ma et.al.	2505.20830	null
2025-05-27	ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval	Eric Xing et.al.	2505.20764	link
2025-05-28	See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction	Yuan Wu et.al.	2505.20641	link
2025-05-28	PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy	Shuhao Guan et.al.	2505.20429	null
2025-05-26	Visualized Text-to-Image Retrieval	Di Wu et.al.	2505.20291	link
2025-05-26	Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval	Rong-Cheng Tu et.al.	2505.19952	null
2025-05-26	Can Visual Encoder Learn to See Arrows?	Naoyuki Terashita et.al.	2505.19944	null
2025-05-26	Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement	Afrah Shaahid et.al.	2505.19895	null
2025-05-26	A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking	Zixiang Zhao et.al.	2505.19858	null
2025-05-26	A Regularization-Guided Equivariant Approach for Image Restoration	Yulu Bai et.al.	2505.19799	link
2025-05-26	MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval	Rong-Cheng Tu et.al.	2505.19707	null
2025-05-25	Improving Novel view synthesis of 360$^\circ$ Scenes in Extremely Sparse Views by Jointly Training Hemisphere Sampled Synthetic Images	Guangan Chen et.al.	2505.19264	link
2025-05-25	Benchmarking Laparoscopic Surgical Image Restoration and Beyond	Jialun Pei et.al.	2505.19161	link
2025-05-25	Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition	Xiaoyang Liu et.al.	2505.19120	link
2025-05-24	Manifold-aware Representation Learning for Degradation-agnostic Image Restoration	Bin Ren et.al.	2505.18679	null
2025-05-23	RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration	Sudarshan Rajagopalan et.al.	2505.18047	null
2025-05-23	DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval	Yuxin Yang et.al.	2505.17796	null
2025-05-23	MODEM: A Morton-Order Degradation Estimation Mechanism for Adverse Weather Image Recovery	Hainuo Wang et.al.	2505.17581	link
2025-05-23	Dual Ascent Diffusion for Inverse Problems	Minseo Kim et.al.	2505.17353	null
2025-05-22	Forward-only Diffusion Probabilistic Models	Ziwei Luo et.al.	2505.16733	link
2025-05-22	Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration	Yuetong Liu et.al.	2505.16479	null
2025-05-22	NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment	Shuhao Han et.al.	2505.16314	null
2025-05-22	Deep Learning-Driven Ultra-High-Definition Image Restoration: A Survey	Liyan Wang et.al.	2505.16161	link
2025-05-22	Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention	Yuang Ai et.al.	2505.16157	null
2025-05-21	Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval	Siting Li et.al.	2505.15877	null
2025-05-21	SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval	Nikolaos Chaidos et.al.	2505.15867	link
2025-05-22	Continuous Representation Methods, Theories, and Applications: An Overview and Perspectives	Yisi Luo et.al.	2505.15222	link
2025-05-20	UHD Image Dehazing via anDehazeFormer with Atmospheric-aware KV Cache	Pu Wang et.al.	2505.14010	null
2025-05-20	Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models	Kiarash Naghavi Khanghah et.al.	2505.13828	null
2025-05-19	Adaptive Image Restoration for Video Surveillance: A Real-Time Approach	Muhammad Awais Amin et.al.	2505.13130	null
2025-05-19	LatentINDIGO: An INN-Guided Latent Diffusion Algorithm for Image Restoration	Di You et.al.	2505.12935	null
2025-05-19	Towards a Universal Image Degradation Model via Content-Degradation Disentanglement	Wenbo Yang et.al.	2505.12860	null
2025-05-19	Degradation-Aware Feature Perturbation for All-in-One Image Restoration	Xiangpeng Tian et.al.	2505.12630	link
2025-05-18	Trustworthy Image Super-Resolution via Generative Pseudoinverse	Andreas Floros et.al.	2505.12375	link
2025-05-18	SMFusion: Semantic-Preserving Fusion of Multimodal Medical Images for Enhanced Clinical Diagnosis	Haozhe Xiang et.al.	2505.12251	null
2025-05-17	Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model	Jian Zhu et.al.	2505.11800	link
2025-05-16	Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization	Aaron Wilhelm et.al.	2505.11620	null
2025-05-16	Diff-Unfolding: A Model-Based Score Learning Framework for Inverse Problems	Yuanhao Wang et.al.	2505.11393	null
2025-05-16	Entropy-Driven Genetic Optimization for Deep-Feature-Guided Low-Light Image Enhancement	Nirjhor Datta et.al.	2505.11246	link
2025-05-16	Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing	Mathis Jürgen Adler et.al.	2505.11121	null
2025-05-15	torchmfbd: a flexible multi-object multi-frame blind deconvolution code	A. Asensio Ramos et.al.	2505.10639	link
2025-05-19	Super-Resolution Generative Adversarial Networks based Video Enhancement	Kağan ÇETİN et.al.	2505.10589	null
2025-05-14	PDE: Gene Effect Inspired Parameter Dynamic Evolution for Low-light Image Enhancement	Tong Li et.al.	2505.09196	null
2025-05-13	Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations	Petrus H. Zwart et.al.	2505.08176	null
2025-05-12	Image Restoration via Integration of Optimal Control Techniques and the Hamilton-Jacobi-Bellman Equation	Dragos-Patru Covei et.al.	2505.07699	null
2025-05-12	Generalizable Pancreas Segmentation via a Dual Self-Supervised Learning Framework	Jun Li et.al.	2505.07165	null
2025-05-11	Bi-directional Self-Registration for Misaligned Infrared-Visible Image Fusion	Timing Li et.al.	2505.06920	null
2025-05-10	UnfoldIR: Rethinking Deep Unfolding Network in Illumination Degradation Image Restoration	Chunming He et.al.	2505.06683	null
2025-05-10	MultiTaskVIF: Segmentation-oriented visible and infrared image fusion via multi-task learning	Zixian Zhao et.al.	2505.06665	null
2025-05-09	A review of advancements in low-light image enhancement using deep learning	Fangxue Liu et.al.	2505.05759	null
2025-05-08	Semantic Style Transfer for Enhancing Animal Facial Landmark Detection	Anadil Hussein et.al.	2505.05640	null
2025-05-08	A Preliminary Study for GPT-4o on Image Restoration	Hao Yang et.al.	2505.05621	link
2025-05-07	Image Restoration via Multi-domain Learning	Xingyu Jiang et.al.	2505.05504	link
2025-05-08	SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation	Yonwoo Choi et.al.	2505.05475	link
2025-05-08	EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution	Haizhen Xie et.al.	2505.05209	null
2025-05-08	ADNP-15: An Open-Source Histopathological Dataset for Neuritic Plaque Segmentation in Human Brain Whole Slide Images with Frequency Domain Image Enhancement for Stain Normalization	Chenxi Zhao et.al.	2505.05041	null
2025-05-07	DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once	Qi Zhou et.al.	2505.04526	link
2025-05-08	HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation	Teng Hu et.al.	2505.04512	null
2025-05-07	TS-Diff: Two-Stage Diffusion Model for Low-Light RAW Image Enhancement	Yi Li et.al.	2505.04281	link
2025-05-07	Regional chemical potential analysis for material surfaces	Masahiro Fukuda et.al.	2505.04053	null
2025-05-04	OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery	Chongsheng Zhang et.al.	2505.03836	link
2025-05-06	DDaTR: Dynamic Difference-aware Temporal Residual Network for Longitudinal Radiology Report Generation	Shanshan Song et.al.	2505.03401	link
2025-05-06	Seeing the Abstract: Translating the Abstract Language for Vision Language Models	Davide Talon et.al.	2505.03242	link
2025-05-05	MSFNet-CPD: Multi-Scale Cross-Modal Fusion Network for Crop Pest Detection	Jiaqi Zhang et.al.	2505.02441	link
2025-05-05	Quaternion Multi-focus Color Image Fusion	Weihua Yang et.al.	2505.02365	null
2025-05-05	Quaternion Infrared Visible Image Fusion	Weihua Yang et.al.	2505.02364	null
2025-05-04	HiLLIE: Human-in-the-Loop Training for Low-Light Image Enhancement	Xiaorui Zhao et.al.	2505.02134	null
2025-05-03	ImageR: Enhancing Bug Report Clarity by Screenshots	Xuchen Tan et.al.	2505.01925	null
2025-05-03	Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement	Haofan Wu et.al.	2505.01831	null
2025-05-02	Deblurring fission fragment mass distributions	Pierre Nzabahimana et.al.	2505.01294	null
2025-05-02	RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement	Kui Jiang et.al.	2505.01224	link
2025-05-01	GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution	Aditya Arora et.al.	2505.00687	null
2025-04-30	DGSolver: Diffusion Generalist Solver with Universal Posterior Sampling for Image Restoration	Hebaixu Wang et.al.	2504.21487	link
2025-04-30	VR-FuseNet: A Fusion of Heterogeneous Fundus Data and Explainable Deep Network for Diabetic Retinopathy Classification	Shamim Rahim Refat et.al.	2504.21464	null
2025-04-29	Spatial-enhanced Reflective Coded Aperture Snapshot Spectral Imaging	Jiayu Di et.al.	2504.20516	null
2025-04-29	TTTFusion: A Test-Time Training-Based Strategy for Multimodal Medical Image Fusion in Surgical Robots	Qinhua Xie et.al.	2504.20362	null
2025-04-27	FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement	Kangbiao Shi et.al.	2504.19295	null
2025-04-27	Marine Snow Removal Using Internally Generated Pseudo Ground Truth	Alexandra Malyugina et.al.	2504.19289	null
2025-04-27	Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting	Xiaofeng Jin et.al.	2504.19261	null
2025-04-27	Adaptive Dual-domain Learning for Underwater Image Enhancement	Lingtao Peng et.al.	2504.19198	link
2025-04-27	DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning	Jialang Lu et.al.	2504.19127	null
2025-04-25	From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval	Yabing Wang et.al.	2504.17990	null
2025-04-24	Dual Prompting Image Restoration with Diffusion Transformers	Dehong Kong et.al.	2504.17825	null
2025-04-24	DPMambaIR: All-in-One Image Restoration via Degradation-Aware Prompt State Space Model	Zhanwen Liu et.al.	2504.17732	null
2025-04-24	Inverse-Designed Metasurfaces for Wavefront Restoration in Under-Display Camera Systems	Jaegang Jo et.al.	2504.17368	null
2025-04-24	I-INR: Iterative Implicit Neural Representations	Ali Haider et.al.	2504.17364	null
2025-04-23	Rethinking Vision Transformer for Large-Scale Fine-Grained Image Retrieval	Xin Jiang et.al.	2504.16691	null
2025-04-23	RouteWinFormer: A Route-Window Transformer for Middle-range Attention in Image Restoration	Qifan Li et.al.	2504.16637	null
2025-04-23	Cross Paradigm Representation and Alignment Transformer for Image Deraining	Shun Zou et.al.	2504.16455	null
2025-04-22	Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs	Merve Cerit et.al.	2504.16323	link
2025-04-22	AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization	Jinda Lu et.al.	2504.15619	null
2025-04-22	SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking	Yunfeng Li et.al.	2504.15609	link
2025-04-22	InstaRevive: One-Step Image Enhancement via Dynamic Score Matching	Yixuan Zhu et.al.	2504.15513	null
2025-04-21	Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration	Junyuan Deng et.al.	2504.15159	null
2025-04-21	Structure-guided Diffusion Transformer for Low-Light Image Enhancement	Xiangchen Yin et.al.	2504.15054	null
2025-04-21	Distribution-aware Dataset Distillation for Efficient Image Restoration	Zhuoran Zheng et.al.	2504.14826	null
2025-04-19	A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling	Kyle Buettner et.al.	2504.14359	null
2025-04-19	Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation	Bin Ren et.al.	2504.14249	null
2025-04-18	Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design	Wei Dong et.al.	2504.14075	link
2025-04-18	Zebrafish Counting Using Event Stream Data	Qianghua Chen et.al.	2504.13692	null
2025-04-21	Circular Image Deturbulence using Quasi-conformal Geometry	Chu Chen et.al.	2504.13432	null
2025-04-17	SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Haoxuan Li et.al.	2504.13172	null
2025-04-17	Saliency-Aware Diffusion Reconstruction for Effective Invisible Watermark Removal	Inzamamul Alam et.al.	2504.12809	link
2025-04-17	AdaQual-Diff: Diffusion-Based Image Restoration via Adaptive Quality Prompting	Xin Su et.al.	2504.12605	null
2025-04-16	Towards Realistic Low-Light Image Enhancement via ISP Driven Data Modeling	Zhihua Wang et.al.	2504.12204	link
2025-04-16	Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging	Tristan S. W. Stevens et.al.	2504.12154	null
2025-04-16	Generalized Visual Relation Detection with Diffusion Models	Kaifeng Gao et.al.	2504.12100	null
2025-04-16	R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors	Haoyang Wang et.al.	2504.11946	null
2025-04-16	Learning Physics-Informed Color-Aware Transforms for Low-Light Image Enhancement	Xingxing Yang et.al.	2504.11896	null
2025-04-16	HyperKING: Quantum-Classical Generative Adversarial Networks for Hyperspectral Image Restoration	Chia-Hsiang Lin et.al.	2504.11782	null
2025-04-15	Efficient Medical Image Restoration via Reliability Guided Learning in Frequency Domain	Pengcheng Zheng et.al.	2504.11286	null
2025-04-15	Enhanced Small Target Detection via Multi-Modal Fusion and Attention Mechanisms: A YOLOv5 Approach	Xiaoxiao Ma et.al.	2504.11262	null
2025-04-15	Visual Re-Ranking with Non-Visual Side Information	Gustav Hanning et.al.	2504.11134	link
2025-04-15	UKDM: Underwater keypoint detection and matching using underwater image enhancement techniques	Pedro Diaz-Garcia et.al.	2504.11063	null
2025-04-15	TMCIR: Token Merge Benefits Composed Image Retrieval	Chaoyang Wang et.al.	2504.10995	null
2025-04-15	AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent	Pu Wang et.al.	2504.10978	null
2025-04-15	An Efficient and Mixed Heterogeneous Model for Image Restoration	Yubin Gu et.al.	2504.10967	link
2025-04-15	DAAF: Degradation-Aware Adaptive Fusion Framework for Robust Infrared and Visible Images Fusion	Tianpei Zhang et.al.	2504.10871	null
2025-04-14	PG-DPIR: An efficient plug-and-play method for high-count Poisson-Gaussian inverse problems	Maud Biquard et.al.	2504.10375	null
2025-04-14	Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis	Kaiwen Zheng et.al.	2504.10351	null
2025-04-14	VibrantLeaves: A principled parametric image generator for training deep restoration models	Raphael Achddou et.al.	2504.10201	link
2025-04-14	Learning to Harmonize Cross-vendor X-ray Images by Non-linear Image Dynamics Correction	Yucheng Lu et.al.	2504.10080	null
2025-04-14	Progressive Transfer Learning for Multi-Pass Fundus Image Restoration	Uyen Phan et.al.	2504.10025	null
2025-04-14	Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration	Gang Wu et.al.	2504.09973	link
2025-04-14	Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition	Changwei Wang et.al.	2504.09881	link
2025-04-13	Computationally iterative methods for salt-and-pepper denoising	Jianwei Ke et.al.	2504.09408	null
2025-04-13	Low-Light Image Enhancement using Event-Based Illumination Estimation	Lei Sun et.al.	2504.09379	null
2025-04-12	Beyond Degradation Conditions: All-in-One Image Restoration via HOG Transformers	Jiawei Wu et.al.	2504.09377	link
2025-04-11	Hypergraph Vision Transformers: Images are More than Nodes, More than Edges	Joshua Fixelle et.al.	2504.08710	null
2025-04-11	ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration	Yongsheng Yu et.al.	2504.08591	null
2025-04-11	FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations	Cheng-Yu Hsieh et.al.	2504.08368	null
2025-04-11	DreamFuse: Adaptive Image Fusion with Diffusion Transformer	Junjia Huang et.al.	2504.08291	null
2025-04-11	VL-UR: Vision-Language-guided Universal Restoration of Images Degraded by Adverse Weather Conditions	Ziyan Liu et.al.	2504.08219	null
2025-04-10	Nonlocal Retinex-Based Variational Model and its Deep Unfolding Twin for Low-Light Image Enhancement	Daniel Torres et.al.	2504.07810	null
2025-04-10	Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval	Zehong Ma et.al.	2504.07718	null
2025-04-10	Multi-Modal Data Fusion for Moisture Content Prediction in Apple Drying	Shichen Li et.al.	2504.07465	null
2025-04-10	Synthetic CT Generation from Time-of-Flight Non-Attenuation-Corrected PET for Whole-Body PET Attenuation Correction	Weijie Chen et.al.	2504.07450	null
2025-04-09	Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model	Yingjie Zhou et.al.	2504.07148	null
2025-04-09	Distilling Textual Priors from LLM to Efficient Image Fusion	Ran Zhang et.al.	2504.07029	link
2025-04-09	Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception	Ruotian Peng et.al.	2504.06666	null
2025-04-09	Rethinking LayerNorm in Image Restoration Transformers	MinKyu Lee et.al.	2504.06629	null
2025-04-08	AstroClearNet: Deep image prior for multi-frame astronomical image restoration	Yashil Sukurdeep et.al.	2504.06463	null
2025-04-09	Robust Fusion Controller: Degradation-aware Image Fusion with Fine-grained Language Instructions	Hao Zhang et.al.	2504.05795	null
2025-04-07	Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion	Xingyu Hu et.al.	2504.05164	null
2025-04-07	DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration	Jiamei Xiong et.al.	2504.05135	null
2025-04-08	Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision	Yuandong Pu et.al.	2504.04903	null
2025-04-07	Content-Aware Transformer for All-in-one Image Restoration	Gang Wu et.al.	2504.04869	link
2025-04-07	Inland Waterway Object Detection in Multi-environment: Dataset and Approach	Shanshan Wang et.al.	2504.04835	null
2025-04-06	NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval	Peng Gao et.al.	2504.04339	null
2025-04-05	JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration	Yunlong Lin et.al.	2504.04158	null
2025-04-04	Multimodal Diffusion Bridge with Attention-Based SAR Fusion for Satellite Image Cloud Removal	Yuyang Hu et.al.	2504.03607	null
2025-04-04	REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval	Shabnam Choudhury et.al.	2504.03169	null
2025-04-04	Finding the Reflection Point: Unpadding Images to Remove Data Augmentation Artifacts in Large Open Source Image Datasets for Machine Learning	Lucas Choi et.al.	2504.03168	null
2025-04-03	RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models	ZhongLi Fang et.al.	2504.02640	null
2025-04-03	Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement	Hesong Li et.al.	2504.02555	link
2025-04-03	HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement	Hantang Li et.al.	2504.02373	null
2025-04-03	Brightness Perceiving for Recursive Low-Light Image Enhancement	Haodian Wang et.al.	2504.02362	link
2025-04-03	SemiISP/SemiIE: Semi-Supervised Image Signal Processor and Image Enhancement Leveraging One-to-Many Mapping sRGB-to-RAW	Masakazu Yoshimura et.al.	2504.02345	null
2025-04-02	Bridge the Gap between SNN and ANN for Image Restoration	Xin Su et.al.	2504.01755	null
2025-04-02	Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval	Yuji Nozawa et.al.	2504.01348	null
2025-04-01	IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval	Bangwei Liu et.al.	2504.00954	null
2025-04-01	Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data	Yiqun Duan et.al.	2504.00812	null
2025-04-01	Deconver: A Deconvolutional Network for Medical Image Segmentation	Pooya Ashtari et.al.	2504.00302	link
2025-03-31	InstructRestore: Region-Customized Image Restoration with Human Instructions	Shuaizheng Liu et.al.	2503.24357	link
2025-03-31	CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization	Yingrui Ji et.al.	2503.24182	null
2025-03-31	3D Dental Model Segmentation with Geometrical Boundary Preserving	Shufan Xi et.al.	2503.23702	link
2025-03-30	Multiview Image-Based Localization	Cameron Fiore et.al.	2503.23577	null
2025-03-30	ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts	Linfeng Tang et.al.	2503.23356	null
2025-03-30	DSPFusion: Image Fusion via Degradation and Semantic Dual-Prior Guidance	Linfeng Tang et.al.	2503.23355	null
2025-03-29	A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery	Pengyu Chen et.al.	2503.23200	null
2025-03-29	indiSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy	Ashesh Ashesh et.al.	2503.22983	null
2025-03-28	RELD: Regularization by Latent Diffusion Models for Image Restoration	Pasquale Cascarano et.al.	2503.22563	null
2025-03-27	Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration	Yujie Chen et.al.	2503.21970	null
2025-03-27	LOCORE: Image Re-ranking with Long-Context Sequence Modeling	Zilin Xiao et.al.	2503.21772	link
2025-03-27	Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck	Adrian Bulat et.al.	2503.21757	null
2025-03-27	Invert2Restore: Zero-Shot Degradation-Blind Image Restoration	Hamadi Chihaoui et.al.	2503.21486	null
2025-03-27	Diffusion Image Prior	Hamadi Chihaoui et.al.	2503.21410	null
2025-03-27	FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval	Zixu Li et.al.	2503.21309	link
2025-03-27	Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing	Shuai Li et.al.	2503.21236	null
2025-03-26	Underwater Image Enhancement by Convolutional Spiking Neural Networks	Vidya Sudevan et.al.	2503.20485	link
2025-03-26	Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration	Shihao Zhou et.al.	2503.20174	null
2025-03-25	CoLLM: A Large Language Model for Composed Image Retrieval	Chuong Huynh et.al.	2503.19910	link
2025-03-25	LENVIZ: A High-Resolution Low-Exposure Night Vision Benchmark Dataset	Manjushree Aithal et.al.	2503.19804	null
2025-03-25	Scene-agnostic Pose Regression for Visual Localization	Junwei Zheng et.al.	2503.19543	null
2025-03-25	From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting	Zhiwei Huang et.al.	2503.19358	null
2025-03-25	Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval	Haoqiang Lin et.al.	2503.19296	link
2025-03-24	LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment	Haoran Wang et.al.	2503.18640	null
2025-03-24	OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning	Hui Li et.al.	2503.18635	null
2025-03-24	Dig2DIG: Dig into Diffusion Information Gains for Image Fusion	Bing Cao et.al.	2503.18627	null
2025-03-24	Exploring State Space Model in Wavelet Domain: An Infrared and Visible Image Fusion Network via Wavelet Transform and State Space Model	Tianpei Zhang et.al.	2503.18378	null
2025-03-23	LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space	Zhangyu Wang et.al.	2503.18142	null
2025-03-23	Deep Learning Assisted Denoising of Experimental Micrographs	Owais Ahmad et.al.	2503.17945	null
2025-03-23	Cross-Domain Underwater Image Enhancement Guided by No-Reference Image Quality Assessment: A Transfer Learning Approach	Zhi Zhang et.al.	2503.17937	null
2025-03-23	Cat-AIR: Content and Task-Aware All-in-One Image Restoration	Jiachen Jiang et.al.	2503.17915	null
2025-03-23	What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images	Dongheng Lin et.al.	2503.17899	null
2025-03-22	good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval	Pranavi Kolouju et.al.	2503.17871	null
2025-03-21	Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval	Yuanmin Tang et.al.	2503.17109	link
2025-03-21	Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks	Haijin Zeng et.al.	2503.16930	null
2025-03-20	Efficient Bayesian Computation Using Plug-and-Play Priors for Poisson Inverse Problems	Teresa Klatzer et.al.	2503.16222	null
2025-03-20	3-D Image-to-Image Fusion in Lightsheet Microscopy by Two-Step Adversarial Network: Contribution to the FuseMyCells Challenge	Marek Wodzinski et.al.	2503.16075	null
2025-03-20	PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Qiang Zou et.al.	2503.16064	link
2025-03-20	Automating 3D Dataset Generation with Neural Radiance Fields	P. Schulz et.al.	2503.15997	link
2025-03-20	DIPLI: Deep Image Prior Lucky Imaging for Blind Astronomical Image Restoration	Suraj Singh et.al.	2503.15984	null
2025-03-21	UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations	Debabrata Mandal et.al.	2503.15868	null
2025-03-19	Image Restoration Models with Optimal Transport and Total Variation Regularization	Weijia Huang et.al.	2503.14947	null
2025-03-19	MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance	Zihan Cao et.al.	2503.14944	null
2025-03-19	Degradation Alchemy: Self-Supervised Unknown-to-Known Transformation for Blind Hyperspectral Image Fusion	He Huang et.al.	2503.14892	null
2025-03-18	Revisiting Image Fusion for Multi-Illuminant White-Balance Correction	David Serrano-Lozano et.al.	2503.14774	null
2025-03-18	SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model	Yucheng Mao et.al.	2503.14463	null
2025-03-18	AI-Driven Diabetic Retinopathy Diagnosis Enhancement through Image Processing and Salp Swarm Algorithm-Optimized Ensemble Network	Saif Ur Rehman Khan et.al.	2503.14209	null
2025-03-18	Towards properties of adversarial image perturbations	Egor Kuznetsov et.al.	2503.14111	null
2025-03-18	Intra and Inter Parser-Prompted Transformers for Effective Image Restoration	Cong Wang et.al.	2503.14037	link
2025-03-17	Scale Efficient Training for Large Datasets	Qing Zhou et.al.	2503.13385	link
2025-03-17	From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective	Chen Zhao et.al.	2503.13165	null
2025-03-17	All You Need to Know About Training Image Retrieval Models	Gabriele Berton et.al.	2503.13045	link
2025-03-17	Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion	Yidi Liu et.al.	2503.12764	null
2025-03-16	DPF-Net: Physical Imaging Model Embedded Data-Driven Underwater Image Enhancement	Han Mei et.al.	2503.12470	link
2025-03-16	Pathology Image Restoration via Mixture of Prompts	Jiangdong Cai et.al.	2503.12399	link
2025-03-14	Advancements in Real-Time Oncology Diagnosis: Harnessing AI and Image Fusion Techniques	Leila Bagheriye et.al.	2503.11332	null
2025-03-14	Breaking Shallow Limits: Task-Driven Pixel Fusion for Gap-free RGBT Tracking	Andong Lu et.al.	2503.11247	null
2025-03-14	Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption	Du Chen et.al.	2503.11221	null
2025-03-14	InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences	Hongkai Zheng et.al.	2503.11043	null
2025-03-13	ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning	Pengfei Luo et.al.	2503.10166	link
2025-03-13	Hybrid Agents for Image Restoration	Bingchen Li et.al.	2503.10120	null
2025-03-13	Dream-IF: Dynamic Relative EnhAnceMent for Image Fusion	Xingxin Xu et.al.	2503.10109	null
2025-03-12	FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification	Shoaib Meraj Sami et.al.	2503.09873	null
2025-03-12	Multi-Agent Image Restoration	Xu Jiang et.al.	2503.09403	null
2025-03-12	Revisiting Medical Image Retrieval via Knowledge Consolidation	Yang Nan et.al.	2503.09370	null
2025-03-12	MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration	Zhehui Wu et.al.	2503.09131	link
2025-03-12	Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal	Rongxin Liao et.al.	2503.09013	link
2025-03-11	QUIET-SR: Quantum Image Enhancement Transformer for Single Image Super-Resolution	Siddhant Dutta et.al.	2503.08759	null
2025-03-11	Language-Depth Navigated Thermal and Visible Image Fusion	Jinchang Zhang et.al.	2503.08676	null
2025-03-11	PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net	Jun Yin et.al.	2503.08276	null
2025-03-11	TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement	Miao Zhang et.al.	2503.08168	null
2025-03-11	Few-Shot Class-Incremental Model Attribution Using Learnable Representation From CLIP-ViT Features	Hanbyul Lee et.al.	2503.08148	null
2025-03-11	Deep Perceptual Enhancement for Medical Image Analysis	S M A Sharif et.al.	2503.08027	link
2025-03-10	GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts	Minwen Liao et.al.	2503.07417	null
2025-03-10	Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion	Haowen Bai et.al.	2503.07235	null
2025-03-11	Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios	Chenglu Pan et.al.	2503.07232	null
2025-03-10	Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization	Michael Green et.al.	2503.07038	null
2025-03-10	Zero-Shot Hashing Based on Reconstruction With Part Alignment	Yan Jiang et.al.	2503.07037	null
2025-03-10	Learning a Unified Degradation-aware Representation Model for Multi-modal Image Fusion	Haolong Ma et.al.	2503.07033	null
2025-03-10	MERLION: Marine ExploRation with Language guIded Online iNformative Visual Sampling and Enhancement	Shrutika Vishal Thengane et.al.	2503.06953	link
2025-03-09	RoboDesign1M: A Large-scale Dataset for Robot Design Understanding	Tri Le et.al.	2503.06796	null
2025-03-09	StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition	Yanqing Shen et.al.	2503.06601	link
2025-03-07	Data-Efficient Generalization for Zero-shot Composed Image Retrieval	Zining Chen et.al.	2503.05204	null
2025-03-06	RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining	Tengfei Zhang et.al.	2503.04653	null
2025-03-06	Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior	Haitao Wu et.al.	2503.04207	link
2025-03-05	An Adaptive Underwater Image Enhancement Framework via Multi-Domain Fusion and Color Compensation	Yuezhe Tian et.al.	2503.03640	null
2025-03-05	Mineral segmentation using electron microscope images and spectral sampling through multimodal graph neural networks	Samuel Repka et.al.	2503.03507	null
2025-03-05	Two-Stream Thermal Imaging Fusion for Enhanced Time of Birth Detection in Neonatal Care	Jorge García-Torres et.al.	2503.03244	null
2025-03-03	Hyperspectral Image Restoration and Super-resolution with Physics-Aware Deep Learning for Biomedical Applications	Yuchen Xiang et.al.	2503.02908	null
2025-03-04	ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement	Xuejian Guo et.al.	2503.02484	link
2025-03-04	Semantic Prior Distillation with Vision Foundation Model for Enhanced Rapid Bone Scintigraphy Image Restoration	Pengchen Liang et.al.	2503.02321	null
2025-03-03	MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting	Mojtaba Safari et.al.	2503.01576	link
2025-03-03	Wavelet-Enhanced Desnowing: A Novel Single Image Restoration Approach for Traffic Surveillance under Adverse Weather Conditions	Zihan Shen et.al.	2503.01339	null
2025-03-03	Composed Multi-modal Retrieval: A Survey of Approaches and Applications	Kun Zhang et.al.	2503.01334	link
2025-03-03	Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual	Chong Wang et.al.	2503.01288	link
2025-03-03	Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond	Guanyao Wu et.al.	2503.01210	null
2025-03-02	Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion	Daiki Nishiyama et.al.	2503.00925	null
2025-03-01	Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image Enhancement	Aupendu Kar et.al.	2503.00642	link
2025-03-01	Class-Independent Increment: An Efficient Approach for Multi-label Class-Incremental Learning	Songlin Dong et.al.	2503.00515	null
2025-02-28	SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events	Yunfan Lu et.al.	2502.21120	null
2025-02-28	CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval	Zelong Sun et.al.	2502.20826	null
2025-02-28	Diffusion Restoration Adapter for Real-World Image Restoration	Hanbang Liang et.al.	2502.20679	null
2025-02-28	HVI: A New Color Space for Low-light Image Enhancement	Qingsen Yan et.al.	2502.20272	link
2025-02-27	Night-Voyager: Consistent and Efficient Nocturnal Vision-Aided State Estimation in Object Maps	Tianxiao Gao et.al.	2502.20054	null
2025-02-27	Striving for Faster and Better: A One-Layer Architecture with Auto Re-parameterization for Low-Light Image Enhancement	Nan An et.al.	2502.19867	null
2025-02-27	One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion	Chunyang Cheng et.al.	2502.19854	link
2025-02-26	ILACS-LGOT: A Multi-Layer Contrast Enhancement Approach for Palm-Vein Images	Kaveen Perera et.al.	2502.19456	null
2025-02-27	On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation	Ruben T. Lucassen et.al.	2502.19285	null
2025-02-26	Self-supervised conformal prediction for uncertainty quantification in Poisson imaging problems	Bernardin Tamo Amougou et.al.	2502.19194	null
2025-02-26	Multi-level Attention-guided Graph Neural Network for Image Restoration	Jiatao Jiang et.al.	2502.19181	null
2025-02-27	RetinaRegen: A Hybrid Model for Readability and Detail Restoration in Fundus Images	Yuhan Tang et.al.	2502.19153	null
2025-02-26	Dynamic Degradation Decomposition Network for All-in-One Image Restoration	Huiqiang Wang et.al.	2502.19068	null
2025-02-25	Spatial Analysis of Neuromuscular Junctions Activation in Three-Dimensional Histology-based Muscle Reconstructions	Alessandro Ascani Orsini et.al.	2502.18646	link
2025-02-24	Splitting Regularized Wasserstein Proximal Algorithms for Nonsmooth Sampling Problems	Fuqun Han et.al.	2502.16773	link
2025-02-23	Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries	Yin Wu et.al.	2502.16636	link
2025-02-21	Improved Partial Differential Equation and Fast Approximation Algorithm for Hazy/Underwater/Dust Storm Image Enhancement	Uche A. Nnolim et.al.	2502.15986	null
2025-02-21	ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval	Guanqi Zhan et.al.	2502.15682	null
2025-02-21	LUMINA-Net: Low-light Upgrade through Multi-stage Illumination and Noise Adaptation Network for Image Enhancement	Namrah Siddiqua et.al.	2502.15186	null
2025-02-21	Optimized Pap Smear Image Enhancement: Hybrid PMD Filter-CLAHE Using Spider Monkey Optimization	Ach Khozaimi et.al.	2502.15156	null
2025-02-20	Reinforcement Learning for Ultrasound Image Analysis: A Comprehensive Review of Advances and Applications	Maha Ezzelarab et.al.	2502.14995	null
2025-02-20	CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond	Yukai Shi et.al.	2502.14493	null
2025-02-20	EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement	Wenhui Zhu et.al.	2502.14260	null
2025-02-19	RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior	Ching-Hua Lee et.al.	2502.13574	null
2025-02-18	Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization	Shuo Xing et.al.	2502.13146	link
2025-02-18	Local Flaw Detection with Adaptive Pyramid Image Fusion Across Spatial Sampling Resolution for SWRs	Siyu You et.al.	2502.12512	null
2025-02-17	Descriminative-Generative Custom Tokens for Vision-Language Models	Pramuditha Perera et.al.	2502.12095	null
2025-02-17	ILIAS: Instance-Level Image retrieval At Scale	Giorgos Kordopatis-Zilos et.al.	2502.11748	null
2025-02-17	Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics	Francesco Croce et.al.	2502.11725	link
2025-02-17	Precise GPS-Denied UAV Self-Positioning via Context-Enhanced Cross-View Geo-Localization	Yuanze Xu et.al.	2502.11408	null
2025-02-12	E2LVLM: Evidence-Enhanced Large Vision-Language Model for Multimodal Out-of-Context Misinformation Detection	Junjie Wu et.al.	2502.10455	null
2025-02-19	Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal	Jinpei Guo et.al.	2502.09873	link
2025-02-13	Source function from two-particle correlation function through entropy-regularized Richardson-Lucy deblurring	C. K. Tam et.al.	2502.09478	null
2025-02-13	ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation	Rotem Shalev-Arkushin et.al.	2502.09411	null
2025-02-12	Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions	Prajwal Gatti et.al.	2502.08438	null
2025-02-13	MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers	Ao Li et.al.	2502.07856	null
2025-02-11	Captured by Captions: On Memorization and its Mitigation in CLIP Models	Wenhao Wang et.al.	2502.07830	null
2025-02-11	Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems	Ai Chen et.al.	2502.07351	link
2025-02-11	Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos	Haowen Gao et.al.	2502.07327	null
2025-02-11	PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval	Osman Tursun et.al.	2502.07215	null
2025-02-10	AstroLoc: Robust Space to Ground Image Localizer	Gabriele Berton et.al.	2502.07003	null
2025-02-10	UniDemoiré: Towards Universal Image Demoiréing with Data Generation and Synthesis	Zemin Yang et.al.	2502.06324	null
2025-02-09	A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement	Muhammad Turab et.al.	2502.05995	null
2025-02-09	Uni-Retrieval: A Multi-Style Retrieval Framework for STEM’s Education	Yanhao Jia et.al.	2502.05863	null
2025-02-11	UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control	Kaizhen Zhu et.al.	2502.05749	link
2025-02-07	Self-supervised Conformal Prediction for Uncertainty Quantification in Imaging Problems	Jasper M. Everink et.al.	2502.05127	null
2025-02-07	Performance Evaluation of Image Enhancement Techniques on Transfer Learning for Touchless Fingerprint Recognition	S Sreehari et.al.	2502.04680	null
2025-02-07	HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion	Mengting Ma et.al.	2502.04623	null
2025-02-06	Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion	Marco Mistretta et.al.	2502.04263	link
2025-02-05	All-in-One Image Compression and Restoration	Huimin Zeng et.al.	2502.03649	link
2025-02-05	Efficient Image Restoration via Latent Consistency Flow Matching	Elad Cohen et.al.	2502.03500	null
2025-02-05	Human-Aligned Image Models Improve Visual Decoding from the Brain	Nona Rajabi et.al.	2502.03081	null
2025-02-04	Blind Visible Watermark Removal with Morphological Dilation	Preston K. Robinette et.al.	2502.02676	null
2025-02-04	MATCNN: Infrared and Visible Image Fusion Method Based on Multi-scale CNN with Attention Transformer	Jingjing Liu et.al.	2502.01959	link
2025-02-03	Deep Unfolding Multi-modal Image Fusion Network via Attribution Analysis	Haowen Bai et.al.	2502.01467	null
2025-02-03	Human Body Restoration with One-Step Diffusion Model and A New Benchmark	Jue Gong et.al.	2502.01411	null
2025-02-03	ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies	Costin F. Ciusdel et.al.	2502.01335	null
2025-02-04	Compressed Image Generation with Denoising Diffusion Codebook Models	Guy Ohayon et.al.	2502.01189	null
2025-02-01	A framework for river connectivity classification using temporal image processing and attention based neural networks	Timothy James Becker et.al.	2502.00474	null
2025-02-01	Shape from Semantics: 3D Shape Generation from Multi-View Semantics	Liangchen Li et.al.	2502.00360	null
2025-01-31	Deep Ensembling with Multimodal Image Fusion for Efficient Classification of Lung Cancer	Surochita Pal et.al.	2502.00078	null
2025-01-30	Integrating Spatial and Frequency Information for Under-Display Camera Image Restoration	Kyusu Ahn et.al.	2501.18517	null
2025-01-31	MatIR: A Hybrid Mamba-Transformer Image Restoration Model	Juan Wen et.al.	2501.18401	link
2025-01-30	Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers	Malte Tölle et.al.	2501.18237	null
2025-01-29	Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment	Zixue Zeng et.al.	2501.17690	link
2025-01-28	Text-to-Image Generation for Vocabulary Learning Using the Keyword Method	Nuwan T. Attygalle et.al.	2501.17099	null
2025-01-27	Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration	Long Peng et.al.	2501.16583	null
2025-01-27	UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images	Tatiana Taís Schein et.al.	2501.16211	link
2025-01-27	Freestyle Sketch-in-the-Loop Image Segmentation	Subhadeep Koley et.al.	2501.16022	null
2025-01-27	CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference	Zhengyang Lu et.al.	2501.15852	link
2025-01-26	Universal Image Restoration Pre-training via Degradation Classification	JiaKui Hu et.al.	2501.15510	link
2025-01-26	Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations	Zijun Long et.al.	2501.15379	null
2025-01-24	Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders	Zaheer Ahmad et.al.	2501.14709	null
2025-01-24	Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement	Guoxi Huang et.al.	2501.14265	link
2025-01-24	CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image	Xiaojun Tang et.al.	2501.14264	null
2025-01-23	Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models	Jakob Krogh Petersen et.al.	2501.14051	link
2025-01-23	INDIGO+: A Unified INN-Guided Probabilistic Diffusion Algorithm for Blind and Non-Blind Image Restoration	Di You et.al.	2501.14014	null
2025-01-23	Binary Diffusion Probabilistic Model	Vitaliy Kinakh et.al.	2501.13915	null
2025-01-23	Where Do You Go? Pedestrian Trajectory Prediction using Scene Features	Mohammad Ali Rezaei et.al.	2501.13848	null
2025-01-22	UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior	I-Hsiang Chen et.al.	2501.13134	null
2025-01-22	Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects	Louis Aberdeen et.al.	2501.13009	null
2025-01-22	UniUIR: Considering Underwater Image Restoration as An All-in-One Learner	Xu Zhang et.al.	2501.12981	null
2025-01-22	FDG-Diff: Frequency-Domain-Guided Diffusion Framework for Compressed Hazy Image Restoration	Ruicheng Zhang et.al.	2501.12832	link
2025-01-21	Quality Enhancement of Radiographic X-ray Images by Interpretable Mapping	Hongxu Yang et.al.	2501.12245	null
2025-01-21	DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains	Junyu Xia et.al.	2501.12235	null
2025-01-21	Proxies for Distortion and Consistency with Applications for Real-World Image Restoration	Sean Man et.al.	2501.12102	null
2025-01-20	SILO: Solving Inverse Problems with Latent Operators	Ron Raphaeli et.al.	2501.11746	null
2025-01-19	Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection	Zhipeng Yu et.al.	2501.11063	link
2025-01-19	Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation	Zhengwen Shen et.al.	2501.10958	null
2025-01-18	Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption	Jinyuan Liu et.al.	2501.10761	link
2025-01-18	A Resource-Efficient Training Framework for Remote Sensing Text–Image Retrieval	Weihang Zhang et.al.	2501.10638	null
2025-01-17	DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration	Huiyun Cao et.al.	2501.10325	null
2025-01-16	FLOL: Fast Baselines for Real-World Low-Light Enhancement	Juan C. Benito et.al.	2501.09718	link
2025-01-16	Soft Knowledge Distillation with Multi-Dimensional Cross-Net Attention for Image Restoration Models Compression	Yongheng Zhang et.al.	2501.09321	null
2025-01-16	Knowledge Distillation for Image Restoration: Simultaneous Learning from Degraded and Clean Images	Yongheng Zhang et.al.	2501.09268	null
2025-01-15	Vision Foundation Models for Computed Tomography	Suraj Pai et.al.	2501.09001	link
2025-01-12	SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval	Bhavin Jawade et.al.	2501.08347	null
2025-01-14	AI Driven Water Segmentation with deep learning models for Enhanced Flood Monitoring	Sanjida Afrin Mou et.al.	2501.08266	link
2025-01-13	Depth and Image Fusion for Road Obstacle Detection Using Stereo Camera	Oleg Perezyabov et.al.	2501.07245	null
2025-01-12	Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation	Zhenyang Feng et.al.	2501.06749	null
2025-01-11	Natural Language Supervision for Low-light Image Enhancement	Jiahui Tang et.al.	2501.06546	null
2025-01-10	Underwater Image Enhancement using Generative Adversarial Networks: A Survey	Kancharagunta Kishan Babu et.al.	2501.06273	null
2025-01-09	HipyrNet: Hypernet-Guided Feature Pyramid network for mixed-exposure correction	Shaurya Singh Rathore et.al.	2501.05195	null
2025-01-09	ResPanDiff: Diffusion Model with Disentangled Modulations for Image Fusion	Shiqi Cao et.al.	2501.05091	null
2025-01-09	IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation	Qi Chen et.al.	2501.04995	link
2025-01-08	Color Correction Meets Cross-Spectral Refinement: A Distribution-Aware Diffusion for Underwater Image Restoration	Laibin Chang et.al.	2501.04740	null
2025-01-14	HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion	Chia-Ming Lee et.al.	2501.04665	null
2025-01-08	FrontierNet: Learning Visual Cues to Explore	Boyang Sun et.al.	2501.04597	link
2025-01-08	MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration	Zhi Jin et.al.	2501.04486	link
2025-01-08	Recognition-Oriented Low-Light Image Enhancement based on Global and Pixelwise Optimization	Seitaro Ono et.al.	2501.04210	null
2025-01-07	Fixed Points of Deep Neural Networks: Emergence, Stability, and Applications	L. Berlyand et.al.	2501.04182	null
2025-01-07	Convergent Primal-Dual Plug-and-Play Image Restoration: A General Algorithm and Applications	Yodai Suzuki et.al.	2501.03780	link
2025-01-06	ImageMM: Joint multi-frame image restoration and super-resolution	Yashil Sukurdeep et.al.	2501.03002	null
2025-01-06	Integrating Language-Image Prior into EEG Decoding for Cross-Task Zero-Calibration RSVP-BCI	Xujin Li et.al.	2501.02841	null
2025-01-06	Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis	Xiaojiao Guo et.al.	2501.02701	link
2025-01-03	iCBIR-Sli: Interpretable Content-Based Image Retrieval with 2D Slice Embeddings	Shuhei Tomoshige et.al.	2501.01642	null
2025-01-02	Domain-invariant feature learning in brain MR imaging for content-based image retrieval	Shuya Tobari et.al.	2501.01326	null
2025-01-03	Conditional Consistency Guided Image Translation and Enhancement	Amil Bhagat et.al.	2501.01223	link
2025-01-02	Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion	Dong Zhang et.al.	2501.01114	null
2024-12-30	Text-to-Image GAN with Pretrained Representations	Xiaozhou You et.al.	2501.00116	null
2024-12-30	Varformer: Adapting VAR’s Generative Prior for Image Restoration	Siyang Wang et.al.	2412.21063	link
2024-12-30	Low-Light Image Enhancement via Generative Perceptual Priors	Han Zhou et.al.	2412.20916	link
2024-12-29	Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)	Tomer Garber et.al.	2412.20596	link
2024-12-28	Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems	Wen-Dong Jiang et.al.	2412.20201	null
2024-12-28	UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity	Jingbo Lin et.al.	2412.20157	link
2024-12-28	MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration	Boyun Li et.al.	2412.20066	link
2024-12-28	An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models	Yuang Wang et.al.	2412.19992	null
2024-12-27	Generative Adversarial Network on Motion-Blur Image Restoration	Zhengdong Li et.al.	2412.19479	null
2024-12-25	FOR: Finetuning for Object Level Open Vocabulary Image Retrieval	Hila Levi et.al.	2412.18806	null
2024-12-24	Underwater Image Restoration via Polymorphic Large Kernel CNNs	Xiaojiao Guo et.al.	2412.18459	link
2024-12-24	UNet--: Memory-Efficient and Feature-Enhanced Network Architecture based on U-Net with Reduced Skip-Connections	Lingxiao Yin et.al.	2412.18276	null
2024-12-24	SDM-Car: A Dataset for Small and Dim Moving Vehicles Detection in Satellite Videos	Zhen Zhang et.al.	2412.18214	link
2024-12-24	ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval	Le Dong et.al.	2412.18136	link
2024-12-22	Where am I? Cross-View Geo-localization with Natural Language Descriptions	Junyan Ye et.al.	2412.17007	null
2024-12-21	Optoelectronic generative adversarial networks	Jumin Qiu et.al.	2412.16672	link
2024-12-21	Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising	Yuchen Wang et.al.	2412.16645	null
2024-12-24	Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling	Daichi Yashima et.al.	2412.16576	link
2024-12-21	Rethinking Model Redundancy for Low-light Image Enhancement	Tong Li et.al.	2412.16459	null
2024-12-20	SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild	Jannik Elsäßer et.al.	2412.16147	null
2024-12-20	NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images	Yue Guo et.al.	2412.15890	null
2024-12-20	Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation	Aiwen Jiang et.al.	2412.15845	link
2024-12-20	A New Method to Capturing Compositional Knowledge in Linguistic Space	Jiahe Wan et.al.	2412.15632	null
2024-12-20	Stabilizing Laplacian Inversion in Fokker-Planck Image Retrieval using the Transport-of-Intensity Equation	Samantha J Alloo et.al.	2412.15513	null
2024-12-19	Learning Visual Composition through Improved Semantic Guidance	Austin Stone et.al.	2412.15396	null
2024-12-19	Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model	Minglong Xue et.al.	2412.14630	link
2024-12-19	MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval	Junjie Zhou et.al.	2412.14475	null
2024-12-18	Personalized Generative Low-light Image Denoising and Enhancement	Xijun Wang et.al.	2412.14327	null
2024-12-18	Distilled Pooling Transformer Encoder for Efficient Realistic Image Dehazing	Le-Anh Tran et.al.	2412.14220	link
2024-12-18	Adversarial Hubness in Multi-Modal Retrieval	Tingwei Zhang et.al.	2412.14113	link
2024-12-18	Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval	Giacomo Pacini et.al.	2412.13834	null
2024-12-18	Fed-AugMix: Balancing Privacy and Utility via Data Augmentation	Haoyang Li et.al.	2412.13818	null
2024-12-18	Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode	Xin Su et.al.	2412.13749	link
2024-12-18	VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement	Chen Zhao et.al.	2412.13655	link
2024-12-18	DarkIR: Robust Low-Light Image Restoration	Daniel Feijoo et.al.	2412.13443	link
2024-12-18	Zero-Shot Low Light Image Enhancement with Diffusion Prior	Joshua Cho et.al.	2412.13401	link
2024-12-17	Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration	Xinlong Cheng et.al.	2412.12550	null
2024-12-17	Three Things to Know about Deep Metric Learning	Yash Patel et.al.	2412.12432	null
2024-12-16	Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD)	Ki-Hwan Oh et.al.	2412.12238	link
2024-12-16	Ultra-High-Definition Dynamic Multi-Exposure Image Fusion via Infinite Pixel Learning	Xingchi Chen et.al.	2412.11685	null
2024-12-16	CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution	Bingwen Hu et.al.	2412.11609	null
2024-12-15	Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval	Zelong Sun et.al.	2412.11087	null
2024-12-15	Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval	Yuanmin Tang et.al.	2412.11077	link
2024-12-15	Towards Context-aware Convolutional Network for Image Restoration	Fangwei Hao et.al.	2412.11008	null
2024-12-14	Boosting ViT-based MRI Reconstruction from the Perspectives of Frequency Modulation, Spatial Purification, and Scale Diversification	Yucong Meng et.al.	2412.10776	null
2024-12-16	Matrix Completion via Residual Spectral Matching	Ziyuan Chen et.al.	2412.10005	null
2024-12-13	$\textrm{A}^{\textrm{2}}$RNet: Adversarial Attack Resilient Network for Robust Infrared and Visible Image Fusion	Jiawei Li et.al.	2412.09954	link
2024-12-12	OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs	Yuanzhi Zhu et.al.	2412.09465	link
2024-12-13	Are Conditional Latent Diffusion Models Effective for Image Restoration?	Yunchen Yuan et.al.	2412.09324	null
2024-12-13	MVC-VPR: Mutual Learning of Viewpoint Classification and Visual Place Recognition	Qiwen Gu et.al.	2412.09199	null
2024-12-12	ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring	Zhongbao Yang et.al.	2412.09193	null
2024-12-12	Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration	Yunshuai Zhou et.al.	2412.08939	link
2024-12-12	A Flexible Plug-and-Play Module for Generating Variable-Length	Liyang He et.al.	2412.08922	link
2024-12-11	Image Retrieval Methods in the Dissimilarity Space	Madhu Kiran et.al.	2412.08618	null
2024-12-11	Convergence Analysis of a Proximal Stochastic Denoising Regularization Algorithm	Marien Renaud et.al.	2412.08262	null
2024-12-11	Visible and Infrared Image Fusion Using Encoder-Decoder Network	Ferhat Can Ataman et.al.	2412.08073	link
2024-12-11	BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion	Huafeng Li et.al.	2412.08050	link
2024-12-10	Image Retrieval with Intra-Sweep Representation Learning for Neck Ultrasound Scanning Guidance	Wanwen Chen et.al.	2412.07741	null
2024-12-10	Leveraging Content and Context Cues for Low-Light Image Enhancement	Igor Morawski et.al.	2412.07693	link
2024-12-10	Analytical-Heuristic Modeling and Optimization for Low-Light Image Enhancement	Axel Martinez et.al.	2412.07659	null
2024-12-10	Deep Joint Unrolling for Deblurring and Low-Light Image Enhancement (JUDE)	Tu Vo et.al.	2412.07527	null
2024-12-10	Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring	Yuzhi Zhao et.al.	2412.07256	link
2024-12-10	EchoIR: Advancing Image Restoration with Echo Upsampling and Bi-Level Optimization	Yuhan He et.al.	2412.07225	null
2024-12-10	A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing	Yujie Feng et.al.	2412.07195	null
2024-12-09	InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention	Howard Zhang et.al.	2412.06753	null
2024-12-09	EchoSim4D: A Proof-of-Concept Gamified XR Echocardiography Training Simulator for Neonates using 4D Ultrasound Volume	Deepthy Rose Jose et.al.	2412.06271	null
2024-12-08	A Review on Multisensor Data Fusion for Wearable Health Monitoring	Arlene John et.al.	2412.05895	null
2024-12-07	Compositional Image Retrieval via Instruction-Aware Contrastive Learning	Wenliang Zhong et.al.	2412.05756	link
2024-12-07	Enhancing Sample Generation of Diffusion Models using Noise Level Correction	Abulikemu Abuduweili et.al.	2412.05488	null
2024-12-06	Equivariant Denoisers for Image Restoration	Marien Renaud et.al.	2412.05343	null
2024-12-06	ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration	Chi-Wei Hsiao et.al.	2412.05043	null
2024-12-06	DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection	Yishuo Chen et.al.	2412.04931	link
2024-12-06	DAug: Diffusion-based Channel Augmentation for Radiology Image Retrieval and Classification	Ying Jin et.al.	2412.04828	null
2024-12-06	Modality Decoupling is All You Need: A Simple Solution for Unsupervised Hyperspectral Image Fusion	Songcheng Du et.al.	2412.04802	link
2024-12-05	Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise	Brayan Monroy et.al.	2412.04648	link
2024-12-05	MetaFormer: High-fidelity Metalens Imaging via Aberration Correcting Transformers	Byeonghyeon Lee et.al.	2412.04591	null
2024-12-05	Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image	Shuang Xu et.al.	2412.04201	null
2024-12-05	Deep priors for satellite image restoration with accurate uncertainties	Biquard Maud et.al.	2412.04130	null
2024-12-05	Blind Underwater Image Restoration using Co-Operational Regressor Networks	Ozer Can Devecioglu et.al.	2412.03995	null
2024-12-05	LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model	Yuan Xue et.al.	2412.03841	null
2024-12-05	Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration	Yuzhen Du et.al.	2412.03814	null
2024-12-04	Composed Image Retrieval for Training-Free Domain Conversion	Nikos Efthymiadis et.al.	2412.03297	link
2024-12-04	Task-driven Image Fusion with Learnable Fusion Loss	Haowen Bai et.al.	2412.03240	null
2024-12-04	Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution	Jiahua Xiao et.al.	2412.02960	null
2024-12-03	Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval	Leah Bar et.al.	2412.02310	link
2024-12-03	Relaxed and Inertial Nonlinear Forward-Backward with Momentum	Fernando Roldán et.al.	2412.02045	link
2024-12-02	Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features	MD Shaikh Rahman et.al.	2412.01555	null
2024-12-02	Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond	MD Raqib Khan et.al.	2412.01456	link
2024-12-02	FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration	Hao Li et.al.	2412.01427	null
2024-12-02	Neuron Abandoning Attention Flow: Visual Explanation of Dynamics inside CNN Models	Yi Liao et.al.	2412.01202	null
2024-12-01	Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration	Haoze Sun et.al.	2412.00878	null
2024-12-01	DMFourLLIE: Dual-Stage and Multi-Branch Fourier Network for Low-Light Image Enhancement	Tongshun Zhang et.al.	2412.00683	link
2024-12-01	MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning	You Wu et.al.	2412.00626	link
2024-11-30	Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion	Michail Dontas et.al.	2412.00557	null
2024-11-29	Self-Supervised Denoiser Framework	Emilien Valat et.al.	2411.19593	null
2024-11-27	Optimizing Image Retrieval with an Extended b-Metric Space	Abdelkader Belhenniche et.al.	2411.18800	null
2024-11-27	Hierarchical Information Flow for Generalized Efficient Image Restoration	Yawei Li et.al.	2411.18588	null
2024-11-27	Complexity Experts are Task-Discriminative Learners for Any Image Restoration	Eduard Zamfir et.al.	2411.18466	null
2024-11-27	Adaptive Blind All-in-One Image Restoration	David Serrano-Lozano et.al.	2411.18412	link
2024-11-29	HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning	Zengxi Zhang et.al.	2411.18296	link
2024-11-27	TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution	Linwei Dong et.al.	2411.18263	link
2024-12-02	Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision	Jinnyeong Kim et.al.	2411.18025	null
2024-11-26	Low-rank Adaptation-based All-Weather Removal for Autonomous Navigation	Sudarshan Rajagopalan et.al.	2411.17814	null
2024-11-26	GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration	Sudarshan Rajagopalan et.al.	2411.17687	null
2024-11-26	Learning Visual Hierarchies with Hyperbolic Embeddings	Ziwei Wang et.al.	2411.17490	null
2024-11-26	Puzzle Similarity: A Perceptually-guided No-Reference Metric for Artifact Detection in 3D Scene Reconstructions	Nicolai Hermann et.al.	2411.17489	null
2024-11-26	MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers	Ruoxi Zhu et.al.	2411.17226	link
2024-11-25	Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding	Yubin Gu et.al.	2411.16217	null
2024-11-25	U2NeRF: Unsupervised Underwater Image Restoration and Neural Radiance Fields	Vinayak Gupta et.al.	2411.16172	null
2024-11-25	Image Generation Diversity Issues and How to Tame Them	Mischa Dombrowski et.al.	2411.16171	link
2024-11-24	PromptHSI: Universal Hyperspectral Image Restoration Framework for Composite Degradation	Chia-Ming Lee et.al.	2411.15922	link
2024-11-24	MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking	Chunhui Zhang et.al.	2411.15761	link
2024-11-24	LTCF-Net: A Transformer-Enhanced Dual-Channel Fourier Framework for Low-Light Image Restoration	Gaojing Zhang et.al.	2411.15740	null
2024-11-22	Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration	Darshan Thaker et.al.	2411.15295	null
2024-11-22	MambaIRv2: Attentive State Space Restoration	Hang Guo et.al.	2411.15269	link
2024-11-22	Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval	Zengbao Sun et.al.	2411.14704	link
2024-11-21	Unveiling the Hidden: A Comprehensive Evaluation of Underwater Image Enhancement and Its Impact on Object Detection	Ali Awad et.al.	2411.14626	link
2024-11-21	Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion	Jinhong He et.al.	2411.13961	link
2024-11-20	Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms	Matthieu Kowalski et.al.	2411.13276	null
2024-11-20	Globally Correlation-Aware Hard Negative Generation	Wenjie Peng et.al.	2411.13145	link
2024-11-19	Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution	Yang Zou et.al.	2411.12530	link
2024-11-19	Frequency-Aware Guidance for Blind Image Restoration via Diffusion Models	Jun Xiao et.al.	2411.12450	null
2024-11-19	Versatile Cataract Fundus Image Restoration Model Utilizing Unpaired Cataract and High-quality Images	Zheng Gong et.al.	2411.12278	null
2024-11-16	GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding	Yue Zhou et.al.	2411.11904	link
2024-11-18	Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion	Meng Zhou et.al.	2411.11799	link
2024-11-18	Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment	Zhendong Liu et.al.	2411.11543	null
2024-11-17	Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method	Yan Zheng et.al.	2411.11135	null
2024-11-19	TSFormer: A Robust Framework for Efficient UHD Image Restoration	Xin Su et.al.	2411.10951	null
2024-11-16	AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations	Jiawei Mao et.al.	2411.10708	null
2024-11-16	Underwater Image Enhancement with Cascaded Contrastive Learning	Yi Liu et.al.	2411.10682	link
2024-11-16	SPDFusion: An Infrared and Visible Image Fusion Network Based on a Non-Euclidean Representation of Riemannian Manifolds	Huan Kang et.al.	2411.10679	null
2024-11-15	Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence	Guodong Sun et.al.	2411.10321	null
2024-11-15	Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting	Ziqi Xie et.al.	2411.10309	link
2024-11-15	Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion	Dan He et.al.	2411.10036	null
2024-11-14	Instruction-Driven Fusion of Infrared-Visible Images: Tailoring for Diverse Downstream Tasks	Zengyi Yang et.al.	2411.09387	null
2024-11-13	Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval	Saul Santos et.al.	2411.08590	link
2024-11-13	Saliency Map-based Image Retrieval using Invariant Krawtchouk Moments	Ashkan Nejad et.al.	2411.08567	link
2024-11-12	CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising	Linxuan Li et.al.	2411.07930	link
2024-11-12	Joint multi-dimensional dynamic attention and transformer for general image restoration	Huan Zhang et.al.	2411.07893	link
2024-11-12	All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model	Yuanbo Wen et.al.	2411.07445	null
2024-11-11	Multi-scale Frequency Enhancement Network for Blind Image Deblurring	Yawen Xiang et.al.	2411.06893	null
2024-11-10	Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration	Chen Wu et.al.	2411.06456	null
2024-11-08	A Modular Conditional Diffusion Framework for Image Reconstruction	Magauiya Zhussip et.al.	2411.05993	null
2024-11-05	From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing	Xintian Sun et.al.	2411.05826	null
2024-11-07	Dynamic Brightness Adaptation for Robust Multi-modal Image Fusion	Yiming Sun et.al.	2411.04697	link
2024-11-07	l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion	Gargi Panda et.al.	2411.04519	null
2024-11-05	Test-Time Dynamic Image Fusion	Bing Cao et.al.	2411.02840	link
2024-11-05	ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing	Yuka Ogino et.al.	2411.02799	null
2024-11-04	TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives	Maitreya Patel et.al.	2411.02545	null
2024-11-11	INQUIRE: A Natural World Text-to-Image Retrieval Benchmark	Edward Vendrow et.al.	2411.02537	link
2024-11-04	Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models	Sharat Agarwal et.al.	2411.01925	null
2024-11-03	Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration	Xiaole Tang et.al.	2411.01656	link
2024-11-03	Conditional Controllable Image Fusion	Bing Cao et.al.	2411.01573	link
2024-11-03	Efficient Medical Image Retrieval Using DenseNet and FAISS for BIRADS Classification	MD Shaikh Rahman et.al.	2411.01473	null
2024-11-03	TPOT: Topology Preserving Optimal Transport in Retinal Fundus Image Enhancement	Xuanzhao Dong et.al.	2411.01403	link
2024-11-02	Medical X-Ray Image Enhancement Using Global Contrast-Limited Adaptive Histogram Equalization	Sohrab Namazi Nia et.al.	2411.01373	null
2024-11-01	Identifying Implicit Social Biases in Vision-Language Models	Kimia Hamidieh et.al.	2411.00997	null
2024-10-31	Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes	Shaohua Liu et.al.	2411.00239	null
2024-10-31	Chasing Better Deep Image Priors between Over- and Under-parameterization	Qiming Wu et.al.	2410.24187	link
2024-10-31	Nearest Neighbor Normalization Improves Multimodal Retrieval	Neil Chowdhury et.al.	2410.24114	link
2024-10-31	Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation	Yihang Zhou et.al.	2410.23962	null
2024-10-31	Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model	Hao Zhang et.al.	2410.23905	link
2024-10-31	MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval	Haiwen Li et.al.	2410.23736	null
2024-10-31	Cycle-Constrained Adversarial Denoising Convolutional Network for PET Image Denoising: Multi-Dimensional Validation on Large Datasets with Reader Study and Real Low-Dose Data	Yucun Hou et.al.	2410.23628	null
2024-10-31	MS-Glance: Non-semantic context vectors and the applications in supervising image reconstruction	Ziqi Gao et.al.	2410.23577	link
2024-10-30	Decoupling Semantic Similarity from Spatial Alignment for Neural Networks	Tassilo Wald et.al.	2410.23107	link
2024-10-30	EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models	Shangquan Sun et.al.	2410.22959	link
2024-10-30	SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion	Kun Hu et.al.	2410.22837	link
2024-10-30	Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement	Sahil Ali Akbar et.al.	2410.21946	link
2024-10-29	Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications	Monica Riedler et.al.	2410.21943	link
2024-10-28	Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework	Vladimir Arkhipkin et.al.	2410.21061	link
2024-10-27	Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement	Junhao Tan et.al.	2410.20314	link
2024-10-27	Deep Learning, Machine Learning – Digital Signal and Image Processing: From Theory to Application	Weiche Hsieh et.al.	2410.20304	null
2024-10-24	HUE Dataset: High-Resolution Event and Frame Sequences for Low-Light Vision	Burak Ercan et.al.	2410.19164	null
2024-10-24	ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval	Zijia Zhao et.al.	2410.18715	link
2024-10-29	DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation	Yuang Ai et.al.	2410.18666	link
2024-10-23	DREB-Net: Dual-stream Restoration Embedding Blur-feature Fusion Network for High-mobility UAV Object Detection	Qingpeng Li et.al.	2410.17822	link
2024-10-23	An Intelligent Agentic System for Complex Image Restoration Problems	Kaiwen Zhu et.al.	2410.17809	link
2024-10-23	A variational approach to nonlocal image restoration flows	Harsh Prasad et.al.	2410.17649	null
2024-10-23	Diffusion Priors for Variational Likelihood Estimation and Image Denoising	Jun Cheng et.al.	2410.17521	link
2024-10-22	Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval	Yuanmin Tang et.al.	2410.17393	null
2024-10-20	LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration	Yuang Ai et.al.	2410.15385	link
2024-10-20	GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning	Haiwen Diao et.al.	2410.15266	link
2024-10-19	A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends	Junjun Jiang et.al.	2410.15067	link
2024-10-19	Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway’s Digitised Book Collection	Marie Roald et.al.	2410.14969	link
2024-10-16	Development of Image Collection Method Using YOLO and Siamese Network	Chan Young Shin et.al.	2410.12561	null
2024-10-16	Towards Flexible and Efficient Diffusion Low Light Enhancer	Guanzhou Lan et.al.	2410.12346	null
2024-10-16	Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond	Pengwei Liang et.al.	2410.12274	null
2024-10-15	Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos	Zhouxia Wang et.al.	2410.11828	null
2024-10-15	LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images	Yuzhou Cheng et.al.	2410.11505	null
2024-10-13	Fusion Based Hand Geometry Recognition Using Dempster-Shafer Theory	Asish Bera et.al.	2410.09842	null
2024-10-13	LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond	Md Tanvir Islam et.al.	2410.09831	link
2024-10-14	LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection	Mingjia Li et.al.	2410.08810	link
2024-10-11	Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers	Jin Cao et.al.	2410.08688	link
2024-10-16	Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP	Eunji Kim et.al.	2410.08469	null
2024-10-11	A Unified Deep Semantic Expansion Framework for Domain-Generalized Person Re-identification	Eugene P. W. Ang et.al.	2410.08456	null
2024-10-10	TANet: Triplet Attention Network for All-In-One Adverse Weather Image Restoration	Hsing-Hua Wang et.al.	2410.08177	link
2024-10-10	A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks	Hoin Jung et.al.	2410.07593	link
2024-10-09	Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval	Mohammad Omama et.al.	2410.07022	null
2024-10-09	Rethinking the Evaluation of Visible and Infrared Image Fusion	Dayan Guan et.al.	2410.06811	link
2024-10-09	InstantIR: Blind Image Restoration with Instant Generative Reference	Jen-Yuan Huang et.al.	2410.06551	null
2024-10-09	MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging	Noel C. F. Codella et.al.	2410.06542	null
2024-10-08	Temporal Image Caption Retrieval Competition – Description and Results	Jakub Pokrywka et.al.	2410.06314	null
2024-10-08	GSLoc: Visual Localization with 3D Gaussian Splatting	Kazii Botashev et.al.	2410.06165	null
2024-10-08	Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning	Ayush Singh et.al.	2410.05928	null
2024-10-08	ReFIR: Grounding Large Restoration Models with Retrieval Augmentation	Hang Guo et.al.	2410.05601	link
2024-10-09	LoTLIP: Improving Language-Image Pre-training for Long Text Understanding	Wei Wu et.al.	2410.05249	null
2024-10-07	Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration	Zhiyu Zhu et.al.	2410.04811	link
2024-10-06	Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli	Valentyn Piskovskyi et.al.	2410.04497	null
2024-10-06	SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems	Ismail Alkhouri et.al.	2410.04479	link
2024-10-05	Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model	Keda Tao et.al.	2410.04161	null
2024-10-04	Diffusion State-Guided Projected Gradient for Inverse Problems	Rayhan Zirvi et.al.	2410.03463	link
2024-10-03	PnP-Flow: Plug-and-Play Image Restoration with Flow Matching	Ségolène Martin et.al.	2410.02423	link
2024-10-03	Can Capacitive Touch Images Enhance Mobile Keyboard Decoding?	Piyawat Lertvittayakumjorn et.al.	2410.02264	link
2024-10-02	Posterior sampling via Langevin dynamics based on generative priors	Vishal Purohit et.al.	2410.02078	null
2024-10-03	EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections	Francesc Net et.al.	2410.01536	link
2024-10-04	CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment	Safouane El Ghazouali et.al.	2410.01411	link
2024-10-01	Three-Operator Splitting Method with Two-Step Inertial Extrapolation	Olaniyi S. Iyiola et.al.	2410.01099	null
2024-10-01	GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer	Youngho Yoon et.al.	2410.00672	link
2024-10-01	Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration	Guy Ohayon et.al.	2410.00418	link
2024-10-01	GLMHA: A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction	Zaid Ilyas et.al.	2410.00380	null
2024-09-30	Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation	Aleyna Kütük et.al.	2410.00266	null
2024-09-30	A Survey on Diffusion Models for Inverse Problems	Giannis Daras et.al.	2410.00083	null
2024-09-30	UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation	Cheng Zhang et.al.	2409.20197	link
2024-09-29	Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation	Xiaofeng Cong et.al.	2409.19685	link
2024-09-28	Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration	Chu-Jie Qin et.al.	2409.19403	link
2024-09-28	VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition	Ahmad Khaliq et.al.	2409.19293	link
2024-09-28	PDCFNet: Enhancing Underwater Images through Pixel Difference Convolution	Song Zhang et.al.	2409.19269	link
2024-09-28	Extending Depth of Field for Varifocal Multiview Images	Zhilong Li et.al.	2409.19220	null
2024-09-27	MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion	Bardienus Duisterhof et.al.	2409.19152	null
2024-09-27	Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors	Yunlong Lin et.al.	2409.18899	null
2024-09-26	Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval	Mankeerat Sidhu et.al.	2409.18733	null
2024-09-27	Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification	Salma Hassan et.al.	2409.18715	null
2024-09-27	Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models	Nguyen Gia Bach et.al.	2409.18476	link
2024-09-27	SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement	Yunkui Pang et.al.	2409.18355	link
2024-09-26	Toward Efficient Deep Blind RAW Image Restoration	Marcos V. Conde et.al.	2409.18204	link
2024-09-26	Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs	Qinpeng Cui et.al.	2409.17778	link
2024-09-25	Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement	Yihao Zhou et.al.	2409.16661	null
2024-09-25	Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement	Guanlin Li et.al.	2409.16604	link
2024-09-24	Proactive Schemes: A Survey of Adversarial Attacks for Social Good	Vishal Asnani et.al.	2409.16491	null
2024-09-24	Liger at W.M. Keck Observatory: imager structural analysis, fabrication, and characterization plan	James Wiley et.al.	2409.16263	null
2024-09-23	PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions	Weifeng Lin et.al.	2409.15278	link
2024-09-23	FusionRF: High-Fidelity Satellite Neural Radiance Fields from Multispectral and Panchromatic Acquisitions	Michael Sprintson et.al.	2409.15132	null
2024-09-22	Low-Light Enhancement Effect on Classification and Detection: An Empirical Study	Xu Wu et.al.	2409.14461	null
2024-09-22	Quantitative and Qualitative Evaluation of NLM and Wavelet Methods in Image Enhancement	Cameron Khanpour et.al.	2409.14334	null
2024-09-20	Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval	Morris Florek et.al.	2409.13513	link
2024-09-19	Deep Learning-Based Detection of Referable Diabetic Retinopathy and Macular Edema Using Ultra-Widefield Fundus Imaging	Philippe Zhang et.al.	2409.12854	null
2024-09-19	Fundus image enhancement through direct diffusion bridges	Sehui Kim et.al.	2409.12377	link
2024-09-18	Denoising diffusion models for high-resolution microscopy image restoration	Pamela Osuna-Vargas et.al.	2409.12078	null
2024-09-18	DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion	Jian Xu et.al.	2409.11642	link
2024-09-17	Ultrasound Image Enhancement with the Variance of Diffusion Models	Yuxin Zhang et.al.	2409.11380	link
2024-09-17	Improving the Efficiency of Visually Augmented Language Models	Paula Ontalvilla et.al.	2409.11148	link
2024-09-17	CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement	Xuanzhao Dong et.al.	2409.10966	link
2024-09-16	Taming Diffusion Models for Image Restoration: A Review	Ziwei Luo et.al.	2409.10353	null
2024-09-17	Fuse4Seg: Image-Level Fusion Based Multi-Modality Medical Image Segmentation	Yuchen Guo et.al.	2409.10328	null
2024-09-16	Garment Attribute Manipulation with Multi-level Attention	Vittorio Casula et.al.	2409.10206	null
2024-09-16	DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion	Yuchen Guo et.al.	2409.10080	null
2024-09-15	Underwater Image Enhancement via Dehazing and Color Restoration	Chengqin Wu et.al.	2409.09779	null
2024-09-15	Unsupervised Hyperspectral and Multispectral Image Blind Fusion Based on Deep Tucker Decomposition Network with Spatial-Spectral Manifold Learning	He Wang et.al.	2409.09670	link
2024-09-14	Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval	Amirreza Mahbod et.al.	2409.09430	link
2024-09-14	Infrared and Visible Image Fusion with Hierarchical Human Perception	Guang Yang et.al.	2409.09291	null
2024-09-12	Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement	Vamsi Krishna Vasa et.al.	2409.07862	null
2024-09-12	Quaternion Nuclear Norm minus Frobenius Norm Minimization for color image reconstruction	Yu Guo et.al.	2409.07797	null
2024-09-11	FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process	Yang Luo et.al.	2409.07451	null
2024-09-11	Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement	Xianmin Chen et.al.	2409.07040	link
2024-09-11	PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening	RuoCheng Wu et.al.	2409.06980	null
2024-09-10	Modeling Image Tone Dichotomy with the Power Function	Axel Martinez et.al.	2409.06764	null
2024-09-10	Lightweight Multiscale Feature Fusion Super-Resolution Network Based on Two-branch Convolution and Transformer	Li Ke et.al.	2409.06590	null
2024-09-10	Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models	Siyu Zhai et.al.	2409.06420	null
2024-09-10	A Cross-Font Image Retrieval Network for Recognizing Undeciphered Oracle Bone Inscriptions	Zhicong Wu et.al.	2409.06381	null
2024-09-10	Multi-Weather Image Restoration via Histogram-Based Transformer Feature Enhancement	Yang Wen et.al.	2409.06334	null
2024-09-10	AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration	Hongyi Cai et.al.	2409.06206	null
2024-09-09	Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding	Bram Willemsen et.al.	2409.05721	link
2024-09-09	Open-World Dynamic Prompt and Continual Visual Representation Learning	Youngeun Kim et.al.	2409.05312	null
2024-09-09	Rethinking the Atmospheric Scattering-driven Attention via Channel and Gamma Correction Priors for Low-Light Image Enhancement	Shyang-En Weng et.al.	2409.05274	link
2024-09-07	Training-free ZS-CIR via Weighted Modality Fusion and Similarity	Ren-Di Wu et.al.	2409.04918	link
2024-09-07	Power Line Aerial Image Restoration under Adverse Weather: Datasets and Baselines	Sai Yang et.al.	2409.04812	link
2024-09-06	Zero-Shot Whole Slide Image Retrieval in Histopathology Using Embeddings of Foundation Models	Saghir Alfasly et.al.	2409.04631	null
2024-09-06	Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior	Charlesquin Kemajou Mbakam et.al.	2409.04384	null
2024-09-06	RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement	Hao Luo et.al.	2409.04363	link
2024-09-06	Secure Traffic Sign Recognition: An Attention-Enabled Universal Image Inpainting Mechanism against Light Patch Attacks	Hangcheng Cao et.al.	2409.04133	null
2024-09-05	Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration	Pei Wang et.al.	2409.03455	null
2024-09-05	KAN See In the Dark	Aoxiang Ning et.al.	2409.03404	link
2024-09-05	Multiple weather images restoration using the task transformer and adaptive mixup strategy	Yang Wen et.al.	2409.03249	null
2024-09-05	Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion	Chenguang Zhu et.al.	2409.03223	null
2024-09-05	Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem	Qiwen Zhu et.al.	2409.03179	link
2024-09-04	Design and Evaluation of Camera-Centric Mobile Crowdsourcing Applications	Abby Stylianou et.al.	2409.03012	null
2024-09-04	Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening	Ivan Pereira-Sánchez et.al.	2409.02675	link
2024-09-04	NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval	Sepanta Zeighami et.al.	2409.02343	link
2024-09-03	Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models	Jiaqi Xu et.al.	2409.02101	link
2024-09-03	F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring	Subhajit Paul et.al.	2409.02056	null
2024-09-03	AllWeatherNet: Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions	Chenghao Qian et.al.	2409.02045	link
2024-09-03	Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment	Konstantin Schall et.al.	2409.01936	link
2024-09-03	Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion	Ke Cao et.al.	2409.01728	null
2024-09-03	Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement	Kun Zhou et.al.	2409.01641	link
2024-09-03	GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting	Zixuan Guo et.al.	2409.01581	null
2024-09-02	A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches	Kim Jinwoo et.al.	2409.01219	null
2024-08-30	Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method	Yuji Lin et.al.	2408.17339	link
2024-09-02	RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance	Avideep Mukherjee et.al.	2408.17095	null
2024-08-30	Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL	Haiyang Zhao et.al.	2408.17060	null
2024-08-29	GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content	Lebin Zhou et.al.	2408.16866	null
2024-09-02	A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising	Shuaiyu Yuan et.al.	2408.16481	null
2024-08-29	Enhanced Control for Diffusion Bridge in Image Restoration	Conghan Yue et.al.	2408.16303	link
2024-08-29	Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models	Kengo Nakata et.al.	2408.16296	null
2024-08-29	LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement	Ye Yu et.al.	2408.16235	link
2024-08-28	Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration	Xu Zhang et.al.	2408.15994	null
2024-08-28	MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion	Yanglin Deng et.al.	2408.15641	link
2024-08-28	Temporal Attention for Cross-View Sequential Image Localization	Dong Yuan et.al.	2408.15569	link
2024-08-27	A Preliminary Exploration Towards General Image Restoration	Xiangtao Kong et.al.	2408.15143	null
2024-08-27	Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild	Tianqi Wei et.al.	2408.14723	null
2024-08-26	FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation	Daixun Li et.al.	2408.13980	null
2024-08-25	LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task	Ali Asgarov et.al.	2408.13909	link
2024-08-23	O-Mamba: O-shape State-Space Model for Underwater Image Enhancement	Chenyu Dong et.al.	2408.12816	link
2024-08-22	CODE: Confident Ordinary Differential Editing	Bastien van Delft et.al.	2408.12418	link
2024-08-22	Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement	Lingyu Zhu et.al.	2408.12316	link
2024-08-21	Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations	Lintong Zhang et.al.	2408.11966	null
2024-08-21	OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal	Qiao Mo et.al.	2408.11480	link
2024-08-21	UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation	Xiangyu Zhao et.al.	2408.11305	link
2024-08-21	Taming Generative Diffusion for Universal Blind Image Restoration	Siwei Tu et.al.	2408.11287	null
2024-08-20	Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement	Satoshi Kosugi et.al.	2408.11055	link
2024-08-20	SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement	Linlin Hu et.al.	2408.10934	null
2024-08-20	UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement	Yingtie Lei et.al.	2408.10653	link
2024-08-19	BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval	Zhenyu Lu et.al.	2408.10383	null
2024-08-19	Multi-Scale Representation Learning for Image Restoration with State-Space Model	Yuhong He et.al.	2408.10145	null
2024-08-19	Harnessing Multi-resolution and Multi-scale Attention for Underwater Image Restoration	Alik Pramanick et.al.	2408.09912	link
2024-08-19	Fashion Image-to-Image Translation for Complementary Item Retrieval	Matteo Attimonelli et.al.	2408.09847	link
2024-08-19	ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement	Eashan Adhikarla et.al.	2408.09650	link
2024-08-17	Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration	Xin Lin et.al.	2408.09241	link
2024-08-16	DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer’s Case Studies	Mohammad Hossein Najafi et.al.	2408.08489	null
2024-08-15	Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks	Jiawei Wu et.al.	2408.08149	link
2024-08-15	HAIR: Hypernetworks-based All-in-One Image Restoration	Jin Cao et.al.	2408.08091	link
2024-08-15	DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions	Ryosuke Korekata et.al.	2408.07910	null
2024-08-13	Review Learning: Advancing All-in-One Ultra-High-Definition Image Restoration Training Method	Xin Su et.al.	2408.06709	null
2024-08-12	Wavelet based inpainting detection	Barglazan Adrian-Alin et.al.	2408.06429	null
2024-08-12	Latent Disentanglement for Low Light Image Enhancement	Zhihao Zheng et.al.	2408.06245	null
2024-08-10	Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network	Junyan Ye et.al.	2408.05475	link
2024-08-10	Greedy randomized block Kaczmarz method for matrix equation AXB=C and its applications in color image restoration	Wenli Wang et.al.	2408.05444	null
2024-08-08	Physical prior guided cooperative learning framework for joint turbulence degradation estimation and infrared video restoration	Ziran Zhang et.al.	2408.04227	null
2024-08-08	MultiColor: Image Colorization by Learning from Multiple Color Spaces	Xiangcheng Du et.al.	2408.04172	null
2024-08-06	AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval	Pavel Suma et.al.	2408.03282	link
2024-08-05	Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models	Tongtong Feng et.al.	2408.02408	null
2024-08-02	On Validation of Search & Retrieval of Tissue Images in Digital Pathology	H. R. Tizhoosh et.al.	2408.01570	null
2024-08-02	Underwater Object Detection Enhancement via Channel Stabilization	Muhammad Ali et.al.	2408.01293	link
2024-08-02	Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement	Wenbin Zou et.al.	2408.01276	link
2024-08-02	Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration	Donwon Park et.al.	2408.01099	null
2024-08-02	FCDFusion: a Fast, Low Color Deviation Method for Fusing Visible and Infrared Image Pairs	Hesong Li et.al.	2408.01080	null
2024-08-01	A Prior Embedding-Driven Architecture for Long Distance Blind Iris Recognition	Qi Xiong et.al.	2408.00210	null
2024-07-30	UniProcessor: A Text-induced Unified Low-level Image Processor	Huiyu Duan et.al.	2407.20928	link
2024-07-27	Inverse Problems with Diffusion Models: A MAP Estimation Perspective	Sai bharath chandra Gutha et.al.	2407.20784	link
2024-07-29	ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement	Ezequiel Perez-Zarate et.al.	2407.19708	link
2024-07-31	Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint	Song Zhang et.al.	2407.19248	null
2024-07-27	Multi-Expert Adaptive Selection: Task-Balancing for All-in-One Image Restoration	Xiaoyan Yu et.al.	2407.19139	link
2024-07-26	Dilated Strip Attention Network for Image Restoration	Fangwei Hao et.al.	2407.18613	null
2024-07-25	RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models	Haoyu Chen et.al.	2407.18035	null
2024-07-25	Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography	Kailai Zhou et.al.	2407.17996	link
2024-07-23	S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks	Neha A S et.al.	2407.17587	null
2024-07-24	Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation	Yongqi Li et.al.	2407.17274	null
2024-07-23	CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction	Liang Zhao et.al.	2407.16204	null
2024-07-23	Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems	Sojin Lee et.al.	2407.16125	link
2024-07-20	Deep Learning CT Image Restoration using System Blur and Noise Models	Yijie Yuan et.al.	2407.14983	null
2024-07-23	AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement	Yunlong Lin et.al.	2407.14900	null
2024-07-20	Dual High-Order Total Variation Model for Underwater Image Restoration	Yuemei Li et.al.	2407.14868	link
2024-07-19	Adaptive Frequency Enhancement Network for Single Image Deraining	Fei Yan et.al.	2407.14292	null
2024-07-19	Double-Shot 3D Shape Measurement with a Dual-Branch Network	Mingyang Lei et.al.	2407.14198	null
2024-07-19	TaGAT: Topology-Aware Graph Attention Network For Multi-modal Retinal Image Fusion	Xin Tian et.al.	2407.14188	link
2024-07-18	Visual Haystacks: Answering Harder Questions About Sets of Images	Tsung-Han Wu et.al.	2407.13766	link
2024-07-18	Any Image Restoration with Efficient Automatic Degradation Adaptation	Bin Ren et.al.	2407.13372	link
2024-07-18	Training-Free Large Model Priors for Multiple-in-One Image Restoration	Xuanhua He et.al.	2407.13181	null
2024-07-18	Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement	Eashan Adhikarla et.al.	2407.13170	null
2024-07-21	HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration	Shuchang Zhang et.al.	2407.13120	link
2024-07-17	Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations	Tomáš Chobola et.al.	2407.12511	link
2024-07-17	GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval	Han Zhou et.al.	2407.12431	link
2024-07-17	Towards Revisiting Visual Place Recognition for Joining Submaps in Multimap SLAM	Markus Weißflog et.al.	2407.12408	null
2024-07-17	GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity	Shuo Cao et.al.	2407.12273	null
2024-07-16	Haze-Aware Attention Network for Single-Image Dehazing	Lihan Tong et.al.	2407.11505	null
2024-07-16	EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis	Ruijie Yang et.al.	2407.11401	null
2024-07-15	No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations	Walter Simoncini et.al.	2407.10964	link
2024-07-15	In-Loop Filtering via Trained Look-Up Tables	Zhuoyuan Li et.al.	2407.10926	null
2024-07-15	MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration	Yulin Ren et.al.	2407.10833	null
2024-07-15	DINO Pre-training for Vision-based End-to-end Autonomous Driving	Shubham Juneja et.al.	2407.10803	null
2024-07-15	Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval	Youngsun Lim et.al.	2407.10683	null
2024-07-15	An experimental evaluation of Siamese Neural Networks for robot localization using omnidirectional imaging in indoor environments	J. J. Cabrera et.al.	2407.10536	null

Image Matching

Publish Date	Title	Authors	PDF	Code
2025-06-25	Fast entropy-regularized SDP relaxations for permutation synchronization	Michael Lindsey et.al.	2506.20191	null
2025-06-18	ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections	Ziling Huang et.al.	2506.15180	null
2025-06-16	EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition	Bingxi Liu et.al.	2506.13133	null
2025-06-12	RealKeyMorph: Keypoints in Real-world Coordinates for Resolution-agnostic Image Registration	Mina C. Moghadam et.al.	2506.10344	null
2025-06-11	Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints	Xiangkai Zhang et.al.	2506.09748	null
2025-06-11	ScaleLSD: Scalable Deep Line Segment Detection Streamlined	Zeran Ke et.al.	2506.09369	link
2025-05-21	Anti-interrupted sampling repeater jamming via linear canonical Wigner distribution lightweight LFM detection	Jia-Mian Li et.al.	2506.06302	null
2025-06-05	Vanishing arcs for isolated plane curve singularities	Hanwool Bae et.al.	2506.04917	null
2025-06-05	Deep Learning Reforms Image Matching: A Survey and Outlook	Shihua Zhang et.al.	2506.04619	null
2025-06-20	SR3D: Unleashing Single-view 3D Reconstruction for Transparent and Specular Object Grasping	Mingxu Zhang et.al.	2505.24305	null
2025-06-05	Universal Domain Adaptation for Semantic Segmentation	Seun-An Choe et.al.	2505.22458	null
2025-05-23	To Glue or Not to Glue? Classical vs Learned Image Matching for Mobile Mapping Cameras to Textured Semantic 3D Building Models	Simone Gaisbauer et.al.	2505.17973	link
2025-05-16	Multi-view dense image matching with similarity learning and geometry priors	Mohamed Ali Chebbi et.al.	2505.11264	null
2025-05-12	Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection	Yuqi Cheng et.al.	2505.07375	link
2025-05-04	OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery	Chongsheng Zhang et.al.	2505.03836	link
2025-05-06	LiftFeat: 3D Geometry-Aware Local Feature Matching	Yepeng Liu et.al.	2505.03422	link
2025-05-04	Focus What Matters: Matchability-Based Reweighting for Local Feature Matching	Dongyue Li et.al.	2505.02161	null
2025-05-15	Mitigating Modality Bias in Multi-modal Entity Alignment from a Causal Perspective	Taoyu Su et.al.	2504.19458	link
2025-04-28	Dynamic Arthroscopic Navigation System for Anterior Cruciate Ligament Reconstruction Based on Multi-level Memory Architecture	Shuo Wang et.al.	2504.19398	null
2025-04-23	Road Similarity-Based BEV-Satellite Image Matching for UGV Localization	Zhenping Sun et.al.	2504.16346	null
2025-04-18	Outlier-Robust Multi-Model Fitting on Quantum Annealers	Saurabh Pandey et.al.	2504.13836	null
2025-04-11	Geometric Consistency Refinement for Single Image Novel View Synthesis via Test-Time Adaptation of Diffusion Models	Josef Bengtson et.al.	2504.08348	null
2025-04-10	Image registration of 2D optical thin sections in a 3D porous medium: Application to a Berea sandstone digital rock image	Jaehong Chung et.al.	2504.06604	link
2025-04-22	To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition	Davide Sferrazza et.al.	2504.06116	link
2025-04-10	Learning Affine Correspondences by Integrating Geometric Constraints	Pengju Sun et.al.	2504.04834	link
2025-04-01	Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data	Yiqun Duan et.al.	2504.00812	null
2025-03-31	CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching	Zizhuo Li et.al.	2503.23925	null
2025-03-28	Pairwise Matching of Intermediate Representations for Fine-grained Explainability	Lauren Shrack et.al.	2503.22881	link
2025-03-26	Multimodal Image Matching based on Frequency-domain Information of Local Energy Response	Meng Yang et.al.	2503.20827	null
2025-03-22	Normalized Matching Transformer	Abtin Pourhadi et.al.	2503.17715	link
2025-03-20	Loop Closure from Two Views: Revisiting PGO for Scalable Trajectory Estimation through Monocular Priors	Tian Yi Lim et.al.	2503.16275	null
2025-03-20	MapGlue: Multimodal Remote Sensing Image Matching	Peihao Wu et.al.	2503.16185	link
2025-03-19	PAPI-Reg: Patch-to-Pixel Solution for Efficient Cross-Modal Registration between LiDAR Point Cloud and Camera Image	Yuanchao Yue et.al.	2503.15285	null
2025-04-07	Less Biased Noise Scale Estimation for Threshold-Robust RANSAC	Johan Edstedt et.al.	2503.13433	null
2025-03-17	SatDepth: A Novel Dataset for Satellite Image Matching	Rahul Deshmukh et.al.	2503.12706	link
2025-03-14	Refining Image Edge Detection via Linear Canonical Riesz Transforms	Shuhui Yang et.al.	2503.11148	null
2025-03-13	Speedy MASt3R	Jingxing Li et.al.	2503.10017	null
2025-03-11	Keypoint Detection and Description for Raw Bayer Images	Jiakai Lin et.al.	2503.08673	null
2025-03-06	Learning 3D Medical Image Models From Brain Functional Connectivity Network Supervision For Mental Disorder Diagnosis	Xingcan Hu et.al.	2503.04205	null
2025-03-07	Diff-Reg v2: Diffusion-Based Matching Matrix Estimation for Image Matching and 3D Registration	Qianliang Wu et.al.	2503.04127	null
2025-03-05	JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba	Xiaoyong Lu et.al.	2503.03437	null
2025-02-28	CNSv2: Probabilistic Correspondence Encoded Neural Image Servo	Anzhe Chen et.al.	2503.00132	null
2025-02-27	A2-GNN: Angle-Annular GNN for Visual Descriptor-free Camera Relocalization	Yejun Zhang et.al.	2502.20036	link
2025-02-27	RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges	Thibaut Loiseau et.al.	2502.19955	null
2025-02-26	BEV-LIO(LC): BEV Image Assisted LiDAR-Inertial Odometry with Loop Closure	Haoxin Cai et.al.	2502.19242	link
2025-02-25	PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching	Han Nie et.al.	2502.18104	link
2025-02-25	Improving Transformer Based Line Segment Detection with Matched Predicting and Re-ranking	Xin Tong et.al.	2502.17766	null
2025-03-04	Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model	Yaxuan Huang et.al.	2502.16779	null
2025-02-16	FeaKM: Robust Collaborative Perception under Noisy Pose Conditions	Jiuwu Hao et.al.	2502.11003	link
2025-02-24	Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation	Emanuele Mule et.al.	2502.06288	link
2025-02-04	Muographic Image Upsampling with Machine Learning for Built Infrastructure Applications	William O’Donnell et.al.	2502.02624	null
2025-02-01	MambaGlue: Fast and Robust Local Feature Matching With Mamba	Kihwan Ryoo et.al.	2502.00462	link
2025-01-24	Dense-SfM: Structure from Motion with Dense Consistent Matching	JongMin Lee et.al.	2501.14277	null
2025-01-20	MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching	Yepeng Liu et.al.	2501.11299	null
2025-01-13	MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training	Xingyi He et.al.	2501.07556	null
2025-01-13	Matching Free Depth Recovery from Structured Light	Zhuohang Yu et.al.	2501.07113	null
2025-01-02	Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views	Yulun Wu et.al.	2501.01196	null
2024-12-31	Towards Real-Time 2D Mapping: Harnessing Drones, AI, and Computer Vision for Advanced Insights	Bharath Kumar Agnur et.al.	2412.20210	null
2024-12-27	MINIMA: Modality Invariant Image Matching	Xingyu Jiang et.al.	2412.19412	link
2024-12-24	GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network	Xianfeng Song et.al.	2412.18221	link
2024-12-17	Bringing Multimodality to Amazon Visual Search System	Xinliang Zhu et.al.	2412.13364	null
2024-12-04	Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis	Siyoon Jin et.al.	2412.03150	null
2024-11-20	DT-LSD: Deformable Transformer-based Line Segment Detection	Sebastian Janampa et.al.	2411.13005	link
2024-11-15	Image Matching Filtering and Refinement by Planes and Beyond	Fabio Bellavia et.al.	2411.09484	link
2024-11-11	XPoint: A Self-Supervised Visual-State-Space based Architecture for Multispectral Image Registration	Ismail Can Yagmur et.al.	2411.07430	link
2024-11-07	The Impact of Semi-Supervised Learning on Line Segment Detection	Johanna Engman et.al.	2411.04596	link
2024-11-04	Silver medal Solution for Image Matching Challenge 2024	Yian Wang et.al.	2411.01851	null
2024-10-30	Variable Resolution Sampling and Deep Learning Image Recovery for Accelerated Multi-Spectral MRI Near Metal Implants	Azadeh Sharafi et.al.	2410.23329	null
2024-11-05	RelationBooth: Towards Relation-Aware Customized Object Generation	Qingyu Shi et.al.	2410.23280	null
2024-10-31	ETO:Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses	Junjie Ni et.al.	2410.22733	null
2024-10-30	LoFLAT: Local Feature Matching using Focused Linear Attention Transformer	Naijian Cao et.al.	2410.22710	null
2024-10-26	Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification	Yue Su et.al.	2410.20097	null
2024-10-01	A Robust Multisource Remote Sensing Image Matching Method Utilizing Attention and Feature Enhancement Against Noise Interference	Yuan Li et.al.	2410.11848	null
2024-10-15	LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images	Yuzhou Cheng et.al.	2410.11505	null
2024-10-12	Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence	Felipe Cadar et.al.	2410.09533	link
2024-09-27	Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras	Yipeng Lu et.al.	2409.18673	null
2024-09-25	Game4Loc: A UAV Geo-Localization Benchmark from Game Data	Yuxiang Ji et.al.	2409.16925	link
2024-09-24	Automatic Registration of SHG and H&E Images with Feature-based Initial Alignment and Intensity-based Instance Optimization: Contribution to the COMULIS Challenge	Marek Wodzinski et.al.	2409.15931	null
2024-09-10	Weakly-supervised Camera Localization by Ground-to-satellite Image Registration	Yujiao Shi et.al.	2409.06471	link
2024-09-05	Enabling Practical and Privacy-Preserving Image Processing	Chao Wang et.al.	2409.03568	null
2024-09-20	A General Albedo Recovery Approach for Aerial Photogrammetric Images through Inverse Rendering	Shuang Song et.al.	2409.03032	link
2024-08-29	Super-Resolution works for coastal simulations	Zhi-Song Liu et.al.	2408.16553	null
2024-09-15	Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks	Sierra Bonilla et.al.	2408.16445	link
2024-08-26	Affine steerers for structured keypoint description	Georg Bökman et.al.	2408.14186	link
2024-08-25	TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers	Chuanrui Zhang et.al.	2408.13770	null
2024-09-11	Coarse-to-fine Alignment Makes Better Speech-image Retrieval	Lifeng Zhou et.al.	2408.13119	null
2024-08-19	BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval	Zhenyu Lu et.al.	2408.10383	null
2024-08-14	RSD-DOG : A New Image Descriptor based on Second Order Derivatives	Darshan Venkatrayappa et.al.	2408.07687	null
2024-08-09	One Shot is Enough for Sequential Infrared Small Target Segmentation	Bingbing Dan et.al.	2408.04823	link
2024-08-07	PRISM: PRogressive dependency maxImization for Scale-invariant image Matching	Xudong Cai et.al.	2408.03598	null
2024-08-05	ConDL: Detector-Free Dense Image Matching	Monika Kwiatkowski et.al.	2408.02766	null
2024-08-04	Improving Neural Surface Reconstruction with Feature Priors from Multi-View Image	Xinlin Ren et.al.	2408.02079	link
2024-07-29	Image-text matching for large-scale book collections	Artemis Llabrés et.al.	2407.19812	link
2024-07-26	PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis	Sohyeong Kim et.al.	2407.18695	null
2024-07-22	RADA: Robust and Accurate Feature Learning with Domain Adaptation	Jingtai He et.al.	2407.15791	null
2024-07-17	GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection	Jingwen Yu et.al.	2407.11736	link
2024-07-16	REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching	Han Nie et.al.	2407.11637	link
2024-07-16	A Self-Correcting Strategy of the Digital Volume Correlation Displacement Field Based on Image Matching: Application to Poor Speckles Quality and Complex-Large Deformation	Chengsheng Li et.al.	2407.11287	null
2024-07-14	Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching	Xiaoyong Lu et.al.	2407.07789	null
2024-07-10	Mutual Information calculation on different appearances	Jiecheng Liao et.al.	2407.07410	null
2024-07-15	SfM on-the-fly: Get better 3D from What You Capture	Zongqian Zhan et.al.	2407.03939	null
2024-07-03	IMC 2024 Methods & Solutions Review	Shyam Gupta et.al.	2407.03172	null
2024-06-21	High Resolution Surface Reconstruction of Cultural Heritage Objects Using Shape from Polarization Method	F. S. Mortazavi et.al.	2406.15121	null
2024-06-16	Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models	Yikai Zhang et.al.	2406.10902	link
2024-06-14	Grounding Image Matching in 3D with MASt3R	Vincent Leroy et.al.	2406.09756	link

Multimodal

Publish Date	Title	Authors	PDF	Code
2025-06-26	Exploring the Design Space of 3D MLLMs for CT Report Generation	Mohammed Baharoon et.al.	2506.21535	null
2025-06-26	TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding	Junwen Zhang et.al.	2506.21393	null
2025-06-26	SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning	Melanie Rieff et.al.	2506.21355	null
2025-06-26	FairyGen: Storied Cartoon Video from a Single Child-Drawn Character	Jiayi Zheng et.al.	2506.21272	null
2025-06-26	Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents	Tianyi Men et.al.	2506.21252	null
2025-06-26	Task-Aware KV Compression For Cost-Effective Long Video Understanding	Minghao Qin et.al.	2506.21184	null
2025-06-26	OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography	Caoshuo Li et.al.	2506.21101	null
2025-06-26	V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling	Junwei You et.al.	2506.21041	null
2025-06-26	Evidence-based diagnostic reasoning with multi-agent copilot for human pathology	Chengkuan Chen et.al.	2506.20964	null
2025-06-26	E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs	Van-Hoang Phan et.al.	2506.20944	null
2025-06-25	UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation	Yanzhe Chen et.al.	2506.20214	null
2025-06-25	BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos	Jiahao Lin et.al.	2506.20103	null
2025-06-24	MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection	Zhengxiang Huang et.al.	2506.19884	null
2025-06-24	Multimodal large language models and physics visual tasks: comparative analysis of performance and costs	Giulia Polverini et.al.	2506.19662	null
2025-06-24	Surgery-R1: Advancing Surgical-VQLA with Reasoning Multimodal Large Language Model via Reinforcement Learning	Pengfei Hao et.al.	2506.19469	null
2025-06-24	Mem4Nav: Boosting Vision-and-Language Navigation in Urban Environments with a Hierarchical Spatial-Cognition Long-Short Memory System	Lixuan He et.al.	2506.19433	null
2025-06-24	Memory-Augmented Incomplete Multimodal Survival Prediction via Cross-Slide and Gene-Attentive Hypergraph Learning	Mingcheng Qu et.al.	2506.19324	null
2025-06-24	MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models	Yinan Xia et.al.	2506.19257	null
2025-06-24	Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification	Minghao Qin et.al.	2506.19225	null
2025-06-24	MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports	Sunggu Kyung et.al.	2506.19217	null
2025-06-23	MOSCARD – Causal Reasoning and De-confounding for Multimodal Opportunistic Screening of Cardiovascular Adverse Events	Jialu Pi et.al.	2506.19174	null
2025-06-23	Universal Video Temporal Grounding with Generative Multi-modal Large Language Models	Zeqian Li et.al.	2506.18883	null
2025-06-23	TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting	Zhongbin Guo et.al.	2506.18862	null
2025-06-23	SIM-Net: A Multimodal Fusion Network Using Inferred 3D Object Shape Point Clouds from RGB Images for 2D Classification	Youcef Sklab et.al.	2506.18683	null
2025-06-24	Object-aware Sound Source Localization via Audio-Visual Scene Understanding	Sung Jin Um et.al.	2506.18557	null
2025-06-23	MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis	Yuting Zhang et.al.	2506.18512	null
2025-06-23	Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey	Xinyao Li et.al.	2506.18504	null
2025-06-23	AViLA: Asynchronous Vision-Language Agent for Streaming Multimodal Data Interaction	Gengyuan Zhang et.al.	2506.18472	null
2025-06-23	What You Think Is What You Get: Bridge User Intent and Transfer Function Design through Multimodal Large Language Models	Yiyao Wang et.al.	2506.18407	null
2025-06-23	RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models	Yeongtak Oh et.al.	2506.18369	null
2025-06-24	Multimodal Fusion SLAM with Fourier Attention	Youjie Zhou et.al.	2506.18204	null
2025-06-20	MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models	Xiaolong Wang et.al.	2506.17046	null
2025-06-20	MM-AttacKG: A Multimodal Approach to Attack Graph Construction with Large Language Models	Yongheng Zhang et.al.	2506.16968	null
2025-06-20	Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs	Haoran Sun et.al.	2506.16962	null
2025-06-20	LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models	Fanfei Li et.al.	2506.16950	null
2025-06-20	Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning	Jiaqi Chen et.al.	2506.16931	null
2025-06-20	With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You	Fabian Gröger et.al.	2506.16895	null
2025-06-20	IsoNet: Causal Analysis of Multimodal Transformers for Neuromuscular Gesture Classification	Eion Tyacke et.al.	2506.16744	null
2025-06-19	How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?	Giuseppe Lando et.al.	2506.16450	null
2025-06-19	GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning	Yi Chen et.al.	2506.16141	null
2025-06-18	Demystifying the Visual Quality Paradox in Multimodal Large Language Models	Shuo Xing et.al.	2506.15645	null
2025-06-18	Creating User-steerable Projections with Interactive Semantic Mapping	Artur André Oliveira et.al.	2506.15479	null
2025-06-18	Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning	Chunlei Li et.al.	2506.15477	null
2025-06-18	Understanding GUI Agent Localization Biases through Logit Sharpness	Xingjian Tao et.al.	2506.15425	null
2025-06-18	MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering	Xinqi Fan et.al.	2506.15298	null
2025-06-18	From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem	Yanxu Mao et.al.	2506.15170	null
2025-06-17	ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM	Yujun Wang et.al.	2506.14766	null
2025-06-17	Exploring MLLMs Perception of Network Visualization Principles	Jacob Miller et.al.	2506.14611	null
2025-06-17	M2BeamLLM: Multimodal Sensing-empowered mmWave Beam Prediction with Large Language Models	Can Zheng et.al.	2506.14532	null
2025-06-17	LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops	Jiyuan Fu et.al.	2506.14493	null
2025-06-17	GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies	Jingqi Yang et.al.	2506.14477	link
2025-06-17	Dense360: Dense Understanding from Omnidirectional Panoramas	Yikang Zhou et.al.	2506.14471	null
2025-06-17	Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval	Ruofan Hu et.al.	2506.14445	null
2025-06-17	From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models	Xinyang Li et.al.	2506.14224	null
2025-06-17	A multi-stage augmented multimodal interaction network for fish feeding intensity quantification	Shulong Zhang et.al.	2506.14170	null
2025-06-17	SceneAware: Scene-Constrained Pedestrian Trajectory Prediction with LLM-Guided Walkability	Juho Bai et.al.	2506.14144	null
2025-06-16	Discrete Diffusion in Large Language and Multimodal Models: A Survey	Runpeng Yu et.al.	2506.13759	link
2025-06-16	TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning	Junru Zhang et.al.	2506.13705	null
2025-06-16	DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models	Yunnong Chen et.al.	2506.13663	null
2025-06-16	Omni-AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented for Efficient Long Video Understanding	Zhucun Xue et.al.	2506.13589	null
2025-06-16	RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis	Pengzuo Wu et.al.	2506.13405	null
2025-06-16	VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation	Bo Pan et.al.	2506.13326	link
2025-06-16	ZINA: Multimodal Fine-grained Hallucination Detection and Editing	Yuiga Wada et.al.	2506.13130	null
2025-06-16	Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning	Haibo Qiu et.al.	2506.13056	null
2025-06-16	CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model	Jiangtong Li et.al.	2506.13055	null
2025-06-15	SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models	Xinyi Zhao et.al.	2506.12992	link
2025-06-16	VGR: Visual Grounded Reasoning	Jiacong Wang et.al.	2506.11991	null
2025-06-13	Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?	Simeon Junker et.al.	2506.11807	null
2025-06-13	Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization	Wenqi Liu et.al.	2506.11712	null
2025-06-13	Dynamic Mixture of Curriculum LoRA Experts for Continual Multimodal Instruction Tuning	Chendi Ge et.al.	2506.11672	null
2025-06-13	VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories?	Jiachen Yu et.al.	2506.11571	null
2025-06-13	DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs	Bo-Cheng Chiu et.al.	2506.11558	null
2025-06-13	Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models	Jinming Wen et.al.	2506.11521	null
2025-06-13	Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs	Xiao Xu et.al.	2506.11515	null
2025-06-13	Stop learning it all to mitigate visual hallucination, Focus on the hallucination target	Dokyoon Yoon et.al.	2506.11417	null
2025-06-12	Combining Log Data and Collaborative Dialogue Features to Predict Project Quality in Middle School AI Education	Conrad Borchers et.al.	2506.11326	null
2025-06-12	Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs	Qizhe Zhang et.al.	2506.10967	link
2025-06-12	Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?	Fei Lin et.al.	2506.10912	null
2025-06-12	VideoDeepResearch: Long Video Understanding With Agentic Tool Using	Huaying Yuan et.al.	2506.10821	link
2025-06-13	Scientists’ First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning	Yuhao Zhou et.al.	2506.10521	null
2025-06-12	MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models	Yu Huang et.al.	2506.10465	null
2025-06-12	Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts	Guowei Zhong et.al.	2506.10452	link
2025-06-12	MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment	Shuo Wang et.al.	2506.10430	null
2025-06-12	Can Sound Replace Vision in LLaVA With Token Substitution?	Ali Vosoughi et.al.	2506.10416	null
2025-06-12	Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?	Yingjin Song et.al.	2506.10415	null
2025-06-12	Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series	Ching Chang et.al.	2506.10412	null
2025-06-11	OctoNav: Towards Generalist Embodied Navigation	Chen Gao et.al.	2506.09839	null
2025-06-11	MMME: A Spontaneous Multi-Modal Micro-Expression Dataset Enabling Visual-Physiological Fusion	Chuang Maa et.al.	2506.09834	link
2025-06-11	Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning	Yuting Li et.al.	2506.09736	link
2025-06-11	HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding	Yanzhao Shi et.al.	2506.09634	null
2025-06-11	AD^2-Bench: A Hierarchical CoT Benchmark for MLLM in Autonomous Driving under Adverse Conditions	Zhaoyang Wei et.al.	2506.09557	null
2025-06-10	BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models	Amina Mollaysa et.al.	2506.08936	null
2025-06-10	What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities	Wendong Bu et.al.	2506.08933	null
2025-06-10	Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment	Maximilian Tschuchnig et.al.	2506.08716	null
2025-06-10	From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge	Agnese Taluzzi et.al.	2506.08553	null
2025-06-09	Serendipitous Recommendation with Multimodal LLM	Haoting Wang et.al.	2506.08283	null
2025-06-09	Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain	Subba Reddy Oota et.al.	2506.08277	link
2025-06-09	GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior	Penghao Wu et.al.	2506.08012	null
2025-06-09	Play to Generalize: Learning to Reason Through Game Play	Yunfei Xie et.al.	2506.08011	link
2025-06-09	CyberV: Cybernetics for Test-time Scaling in Video Understanding	Jiahao Meng et.al.	2506.07971	link
2025-06-09	SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence	Ziyang Gong et.al.	2506.07966	link
2025-06-09	WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning	Jie Yang et.al.	2506.07905	link
2025-06-09	PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement	Teng Hu et.al.	2506.07848	null
2025-06-09	HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains	Shijie Wang et.al.	2506.07837	link
2025-06-09	WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code	Zhiyu Lin et.al.	2506.07818	link
2025-06-09	Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests	Arnau Igualde Sáez et.al.	2506.07418	null
2025-06-08	Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification	Tianyi Bai et.al.	2506.07235	null
2025-06-06	DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation	Jingyu Xiao et.al.	2506.06251	link
2025-06-06	VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning	Zikang Wang et.al.	2506.06097	null
2025-06-06	MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?	Zhitao He et.al.	2506.06034	null
2025-06-06	Object Navigation with Structure-Semantic Reasoning-Based Multi-level Map and Multimodal Decision-Making LLM	Chongshang Yan et.al.	2506.05896	null
2025-06-06	Human-AI Alignment of Multimodal Large Language Models with Speech-Language Pathologists in Parent-Child Interactions	Weiyan Shi et.al.	2506.05879	null
2025-06-09	Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling	Yihan Xie et.al.	2506.05831	null
2025-06-06	Pts3D-LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models	Hugues Thomas et.al.	2506.05689	null
2025-06-05	MLLM-CL: Continual Learning for Multimodal Large Language Models	Hongbo Zhao et.al.	2506.05453	null
2025-06-05	SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs	Jiahui Wang et.al.	2506.05344	link
2025-06-05	AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs	Lidong Lu et.al.	2506.05328	null
2025-06-05	EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?	Yuqian Yuan et.al.	2506.05287	null
2025-06-05	MokA: Multimodal Low-Rank Adaptation for MLLMs	Yake Wei et.al.	2506.05191	null
2025-06-05	On the Comprehensibility of Multi-structured Financial Documents using LLMs and Pre-processing Tools	Shivani Upadhyay et.al.	2506.05182	link
2025-06-05	The NTNU System at the S&I Challenge 2025 SLA Open Track	Hong-Yun Lin et.al.	2506.05121	null
2025-06-05	FinMultiTime: A Four-Modal Bilingual Dataset for Financial Time-Series Analysis	Wenyan Xu et.al.	2506.05019	link
2025-06-05	TextVidBench: A Benchmark for Long Video Scene Text Understanding	Yangyang Zhong et.al.	2506.04983	null
2025-06-05	APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval	Hong Gao et.al.	2506.04953	null
2025-06-05	From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes	Tianxu Wang et.al.	2506.04897	null
2025-06-04	Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning	Shuang Chen et.al.	2506.04207	null
2025-06-04	MMR-V: What’s Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos	Kejian Zhu et.al.	2506.04141	null
2025-06-04	Multimodal Tabular Reasoning with Privileged Structured Information	Jun-Peng Jiang et.al.	2506.04088	null
2025-06-04	Vision Remember: Alleviating Visual Forgetting in Efficient MLLM with Vision Feature Resample	Ze Feng et.al.	2506.03928	null
2025-06-04	HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models	Zhaolu Kang et.al.	2506.03922	link
2025-06-04	ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning	Feng Han et.al.	2506.03596	link
2025-06-04	Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts	Jiaxing Zhang et.al.	2506.03591	null
2025-06-04	WIFE-Fusion:Wavelet-aware Intra-inter Frequency Enhancement for Multi-model Image Fusion	Tianpei Zhang et.al.	2506.03555	null
2025-06-05	Geometric Visual Fusion Graph Neural Networks for Multi-Person Human-Object Interaction Recognition in Videos	Tanqiu Qiao et.al.	2506.03440	null
2025-06-03	A Multimodal, Multilingual, and Multidimensional Pipeline for Fine-grained Crowdsourcing Earthquake Damage Evaluation	Zihui Ma et.al.	2506.03360	link
2025-06-03	MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query	Wei Chow et.al.	2506.03144	null
2025-06-03	AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation	Lu Qiu et.al.	2506.03126	null
2025-06-03	Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models	Shizhan Gong et.al.	2506.02557	null
2025-06-03	VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning	Hao Yan et.al.	2506.02537	null
2025-06-03	Minos: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text	Junzhe Zhang et.al.	2506.02494	null
2025-06-02	From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models	Yihong Tang et.al.	2506.02242	null
2025-06-02	MLLMs Need 3D-Aware Representation Supervision for Scene Understanding	Xiaohu Huang et.al.	2506.01946	null
2025-06-02	Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency	Hongyu Li et.al.	2506.01908	link
2025-06-02	MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs	Wayner Barrios et.al.	2506.01850	null
2025-06-02	FaceCoT: A Benchmark Dataset for Face Anti-Spoofing with Chain-of-Thought Reasoning	Honglu Zhang et.al.	2506.01783	null
2025-05-30	Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents	Yaxin Luo et.al.	2505.24878	null
2025-05-30	MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning	Yiqing Liang et.al.	2505.24871	null
2025-05-30	SiLVR: A Simple Language-based Video Reasoning Framework	Ce Zhang et.al.	2505.24869	link
2025-05-30	FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation	Junyu Luo et.al.	2505.24714	link
2025-05-30	Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors	Duo Zheng et.al.	2505.24625	null
2025-05-30	Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts	Xin He et.al.	2505.24541	null
2025-05-30	Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model	Yuting Zhang et.al.	2505.24476	link
2025-05-30	SORCE: Small Object Retrieval in Complex Environments	Chunxu Liu et.al.	2505.24441	link
2025-05-30	KEVER^2: Knowledge-Enhanced Visual Emotion Reasoning and Retrieval	Fanhang Man et.al.	2505.24342	null
2025-06-02	MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM	Bowen Dong et.al.	2505.24238	null
2025-05-29	Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought	Yunze Man et.al.	2505.23766	null
2025-05-29	MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence	Sihan Yang et.al.	2505.23764	null
2025-05-29	Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence	Diankun Wu et.al.	2505.23747	null
2025-05-29	PixelThink: Towards Efficient Chain-of-Pixel Reasoning	Song Wang et.al.	2505.23727	null
2025-05-29	VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos	Tingyu Song et.al.	2505.23693	link
2025-05-29	Human Empathy as Encoder: AI-Assisted Depression Assessment in Special Education	Boning Zhao et.al.	2505.23631	null
2025-05-29	A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis	Shengyuan Liu et.al.	2505.23601	null
2025-05-29	MAPLE: A Mobile Assistant with Persistent Finite State Machines for Recovery Reasoning	Linqiang Guo et.al.	2505.23596	null
2025-05-29	Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles	Zifu Wang et.al.	2505.23590	link
2025-05-29	OmniEarth-Bench: Towards Holistic Evaluation of Earth’s Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data	Fengxiang Wang et.al.	2505.23522	null
2025-05-28	Spatial Knowledge Graph-Guided Multimodal Synthesis	Yida Xue et.al.	2505.22633	null
2025-05-28	RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction	Yuchi Wang et.al.	2505.22613	null
2025-05-28	Multi-MLLM Knowledge Distillation for Out-of-Context News Detection	Yimeng Gu et.al.	2505.22517	null
2025-05-28	A Closer Look at Multimodal Representation Collapse	Abhra Chaudhuri et.al.	2505.22483	null
2025-05-28	Fostering Video Reasoning via Next-Event Prediction	Haonan Wang et.al.	2505.22457	null
2025-05-28	Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO	Lai Wei et.al.	2505.22453	link
2025-05-28	Privacy-preserving Prompt Personalization in Federated Learning for Multimodal Large Language Models	Sizai Hou et.al.	2505.22447	null
2025-05-28	Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs	Xudong Li et.al.	2505.22396	null
2025-05-28	Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start	Lai Wei et.al.	2505.22334	link
2025-05-28	CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction	Jiali Chen et.al.	2505.22304	null
2025-05-27	UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents	Han Xiao et.al.	2505.21496	link
2025-05-27	Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment	Xiaojun Jia et.al.	2505.21494	link
2025-05-27	Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO	Muzhi Zhu et.al.	2505.21457	null
2025-05-27	AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs	Xuanwen Ding et.al.	2505.21389	link
2025-05-27	Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?	Junhao Cheng et.al.	2505.21374	link
2025-05-27	MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios	Yang Shi et.al.	2505.21333	null
2025-05-27	MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs	Jiakang Yuan et.al.	2505.21327	null
2025-05-27	SOLIDGEO: Measuring Multimodal Spatial Math Reasoning in Solid Geometry	Peijie Wang et.al.	2505.21177	null
2025-05-27	IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model	Yang Zhao et.al.	2505.21146	null
2025-05-27	Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts	Yue Zhang et.al.	2505.21079	null
2025-05-27	MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents	Ziming Wei et.al.	2505.20148	link
2025-05-26	FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities	Jin Wang et.al.	2505.20147	null
2025-05-26	Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion	Zheqi Lv et.al.	2505.20053	link
2025-05-26	Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)	Subba Reddy Oota et.al.	2505.20029	link
2025-05-26	ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving	Xueyi Liu et.al.	2505.20024	link
2025-05-26	NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID	Shihao Li et.al.	2505.20001	null
2025-05-27	Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM	Peng Liu et.al.	2505.19901	null
2025-05-26	Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging	Yongxian Wei et.al.	2505.19892	link
2025-05-26	Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought	Chao Huang et.al.	2505.19877	link
2025-05-26	Efficient Multi-modal Long Context Learning for Training-free Adaptation	Zehong Ma et.al.	2505.19812	link
2025-05-23	Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling	Bryan Wong et.al.	2505.17982	null
2025-05-23	T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation	Zi-Ao Ma et.al.	2505.17897	null
2025-05-23	Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities	Ziwei Zhou et.al.	2505.17862	link
2025-05-23	Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM	Donghwan Chi et.al.	2505.17726	null
2025-05-23	HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning	Chuhao Zhou et.al.	2505.17645	null
2025-05-23	RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition	Yuehan Jin et.al.	2505.17501	null
2025-05-23	The Coherence Trap: When MLLM-Crafted Narratives Exploit Manipulated Visual Contexts	Yuchen Zhang et.al.	2505.17476	null
2025-05-23	FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain	Suifeng Zhao et.al.	2505.17471	null
2025-05-23	FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow	Haoyu Sun et.al.	2505.17399	link
2025-05-23	Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts	Seon Gyeom Kim et.al.	2505.17374	null
2025-05-22	GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning	Chengqi Duan et.al.	2505.17022	link
2025-05-22	Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework	Chenhao Zhang et.al.	2505.17019	link
2025-05-22	SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward	Kaixuan Fan et.al.	2505.17018	link
2025-05-22	Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models	Runsen Xu et.al.	2505.17015	null
2025-05-22	SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding	Haoning Wu et.al.	2505.17012	link
2025-05-22	LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning	Zebin You et.al.	2505.16933	null
2025-05-22	Backdoor Cleaning without External Guidance in MLLM Fine-tuning	Xuankun Rong et.al.	2505.16916	link
2025-05-22	GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent	Bin Xie et.al.	2505.16827	link
2025-05-22	Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs	Zeping Yu et.al.	2505.16703	null
2025-05-22	R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO	Huanjin Yao et.al.	2505.16673	link
2025-05-20	UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation	Rui Tian et.al.	2505.14682	null
2025-05-20	Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities	Parthasaarathy Sudarsanam et.al.	2505.14562	null
2025-05-20	ModRWKV: Transformer Multimodality in Linear Time	Jiale Kang et.al.	2505.14505	link
2025-05-20	Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents	Pengzhou Cheng et.al.	2505.14418	null
2025-05-20	ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations	Xuecheng Wu et.al.	2505.14404	null
2025-05-20	TF-Mamba: Text-enhanced Fusion Mamba with Missing Modalities for Robust Multimodal Sentiment Analysis	Xiang Li et.al.	2505.14329	link
2025-05-20	Speculative Decoding Reimagined for Multimodal Large Language Models	Luxi Lin et.al.	2505.14260	link
2025-05-20	UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning	Sule Bai et.al.	2505.14231	null
2025-05-20	Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method	Xinshen Zhang et.al.	2505.14197	null
2025-05-20	Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering	Wei Zhou et.al.	2505.14131	null
2025-05-19	MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision	Lingxiao Du et.al.	2505.13427	link
2025-05-19	FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning	Zhuozhao Hu et.al.	2505.13419	link
2025-05-19	MR. Judge: Multimodal Reasoner as a Judge	Renjie Pi et.al.	2505.13403	null
2025-05-19	MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers	Kyeongman Park et.al.	2505.13082	null
2025-05-19	Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning	Xiaoyu Yang et.al.	2505.13081	null
2025-05-19	Advancing Sequential Numerical Prediction in Autoregressive Models	Xiang Fei et.al.	2505.13077	link
2025-05-19	FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models	Hengxing Cai et.al.	2505.12835	link
2025-05-19	Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering	Jianfeng Cai et.al.	2505.12826	null
2025-05-19	Correspondence of high-dimensional emotion structures elicited by video clips between humans and Multimodal LLMs	Haruka Asanuma et.al.	2505.12746	null
2025-05-19	Shadow-FT: Tuning Instruct via Base	Taiqiang Wu et.al.	2505.12716	link
2025-05-16	GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art	Chenkai Zhang et.al.	2505.11436	link
2025-05-16	Visual Planning: Let’s Think Only with Images	Yi Xu et.al.	2505.11409	link
2025-05-16	EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models	Bohao Xing et.al.	2505.11405	link
2025-05-19	TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs	Pengju Xu et.al.	2505.11275	link
2025-05-16	A Step towards Interpretable Multimodal AI Models with MultiFIX	Mafalda Malafaia et.al.	2505.11262	null
2025-05-16	CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback	Yixin Wan et.al.	2505.11178	null
2025-05-16	Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans	Yansheng Qiu et.al.	2505.11141	null
2025-05-16	WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?	An-Lan Wang et.al.	2505.11015	null
2025-05-16	ToDMA: Large Model-Driven Token-Domain Multiple Access for Semantic Communications	Li Qiao et.al.	2505.10946	null
2025-05-16	VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization	Mingxiao Li et.al.	2505.10917	null
2025-05-15	Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis	Pengfei Wang et.al.	2505.10541	link
2025-05-15	Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence	Xiang He et.al.	2505.10176	link
2025-05-15	Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering	Yangfu Li et.al.	2505.10118	null
2025-05-15	CartoAgent: a multimodal large language model-powered multi-agent cartographic framework for map style transfer and evaluation	Chenglong Wang et.al.	2505.09936	null
2025-05-15	UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs	Yi Gui et.al.	2505.09904	link
2025-05-14	A Multimodal Multi-Agent Framework for Radiology Report Generation	Ziruo Yi et.al.	2505.09787	null
2025-05-14	FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models	Hongyang Wang et.al.	2505.09415	null
2025-05-14	Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping	Yinuo Wang et.al.	2505.09252	link
2025-05-14	AMSnet 2.0: A Large AMS Database with AI Segmentation for Net Detection	Yichen Shi et.al.	2505.09155	null
2025-05-13	Multimodal Fusion of Glucose Monitoring and Food Imagery for Caloric Content Prediction	Adarsh Kumar et.al.	2505.09018	null
2025-05-14	Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology	Yatai Ji et.al.	2505.08765	null
2025-05-12	Visually Interpretable Subtask Reasoning for Visual Question Answering	Yu Cheng et.al.	2505.08084	null
2025-05-12	MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing	Aybora Koksal et.al.	2505.07984	null
2025-05-12	Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach	Ruikun Hou et.al.	2505.07902	null
2025-05-12	Multimodal Survival Modeling in the Age of Foundation Models	Steven Song et.al.	2505.07683	link
2025-05-12	Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning	Xiaokun Wang et.al.	2505.07263	null
2025-05-11	DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models	Shucheng Huang et.al.	2505.07084	link
2025-05-11	ParaView-MCP: An Autonomous Visualization Agent with Direct Tool Use	Shusen Liu et.al.	2505.07064	null
2025-05-11	MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception	Zhengye Zhang et.al.	2505.07007	link
2025-05-11	Visual Evolutionary Optimization on Combinatorial Problems with Multimodal Large Language Models: A Case Study of Influence Maximization	Jie Zhao et.al.	2505.06850	null
2025-05-11	Visual Instruction Tuning with Chain of Region-of-Interest	Yixin Chen et.al.	2505.06840	null
2025-05-09	Is your multimodal large language model a good science tutor?	Ming Liu et.al.	2505.06418	null
2025-05-09	NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines	Chathurangi Shyalika et.al.	2505.06333	link
2025-05-09	MonetGPT: Solving Puzzles Enhances MLLMs’ Image Retouching Skills	Niladri Shekhar Dutt et.al.	2505.06176	null
2025-05-09	The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review	Jingguo Qu et.al.	2505.06118	null
2025-05-09	ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding	Shuai Wang et.al.	2505.06020	null
2025-05-09	BMMDetect: A Multimodal Deep Learning Framework for Comprehensive Biomedical Misconduct Detection	Yize Zhou et.al.	2505.05763	null
2025-05-08	Fine-Tuning Video-Text Contrastive Model for Primate Behavior Retrieval from Unlabeled Raw Videos	Giulio Cesare Mastrocinque Santo et.al.	2505.05681	null
2025-05-08	Looking Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models	Aarti Ghatkesar et.al.	2505.05626	null
2025-05-08	Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding	Han Xiao et.al.	2505.05446	link
2025-05-09	EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation	Biao Yi et.al.	2505.05440	null
2025-05-08	Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization	Sooyoung Park et.al.	2505.05343	link
2025-05-08	PADriver: Towards Personalized Autonomous Driving	Genghua Kou et.al.	2505.05240	null
2025-05-08	X-Driver: Explainable Autonomous Driving with Vision-Language Models	Wei Liu et.al.	2505.05098	null
2025-05-08	Learning Item Representations Directly from Multimodal Features for Effective Recommendation	Xin Zhou et.al.	2505.04960	link
2025-05-07	EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning	Zhenghao Xing et.al.	2505.04623	link
2025-05-07	On Path to Multimodal Generalist: General-Level and General-Bench	Hao Fei et.al.	2505.04620	null
2025-05-07	M2Rec: Multi-scale Mamba for Efficient Sequential Recommendation	Qianru Zhang et.al.	2505.04445	null
2025-05-06	VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model	Zuwei Long et.al.	2505.03739	link
2025-05-06	Multi-Agent System for Comprehensive Soccer Understanding	Jiayuan Rao et.al.	2505.03735	null
2025-05-06	RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration	Huajie Tan et.al.	2505.03673	link
2025-05-06	ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant	Yifan Xiang et.al.	2505.03654	link
2025-05-06	LogisticsVLN: Vision-Language Navigation For Low-Altitude Terminal Delivery Based on Agentic UAVs	Xinyuan Zhang et.al.	2505.03460	null
2025-05-06	Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant	Haonan Wang et.al.	2505.03380	null
2025-05-05	R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning	Yi-Fan Zhang et.al.	2505.02835	link
2025-05-06	MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation	Mingcheng Li et.al.	2505.02648	null
2025-05-05	SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning	Jinpeng Chen et.al.	2505.02486	link
2025-05-07	Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction	Inclusion AI et.al.	2505.02471	link
2025-05-05	Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection	Sungheon Jeong et.al.	2505.02393	link
2025-05-04	Retrieval-augmented in-context learning for multimodal large language models in disease classification	Zaifu Zhan et.al.	2505.02087	null
2025-05-06	RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video	Shuhang Xun et.al.	2505.02064	link
2025-05-04	R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation	Meng-Hao Guo et.al.	2505.02018	null
2025-05-04	MLLM-Enhanced Face Forgery Detection: A Vision-Language Fusion Solution	Siran Peng et.al.	2505.02013	null
2025-05-02	VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos	Zongxia Li et.al.	2505.01481	link
2025-05-02	FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors	Chenxi Li et.al.	2505.01322	null
2025-05-02	Multimodal Transformers are Hierarchical Modal-wise Heterogeneous Graphs	Yijie Jin et.al.	2505.01068	null
2025-05-02	Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs	Hari Chandana Kuchibhotla et.al.	2505.01064	null
2025-05-01	Multi-Modal Language Models as Text-to-Image Model Evaluators	Jiahui Chen et.al.	2505.00759	null
2025-05-01	InstructAttribute: Fine-grained Object Attributes editing with Instruction	Xingxi Yin et.al.	2505.00751	null
2025-05-01	A Methodological and Structural Review of Parkinsons Disease Detection Across Diverse Data Modalities	Abu Saleh Musa Miah et.al.	2505.00525	null
2025-05-01	Toward Automated Regulatory Decision-Making: Trustworthy Medical Device Risk Classification with Multimodal Transformers and Self-Training	Yu Han et.al.	2505.00422	null
2025-04-30	Audo-Sight: Enabling Ambient Interaction For Blind And Visually Impaired Individuals	Bhanuja Ainary et.al.	2505.00153	null
2025-04-30	GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling	Siqi Li et.al.	2505.00063	null
2025-04-30	COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning	Xindi Wu et.al.	2504.21850	null
2025-04-30	Visual Text Processing: A Comprehensive Review and Unified Evaluation	Yan Shu et.al.	2504.21682	link
2025-04-30	Rethinking Visual Layer Selection in Multimodal LLMs	Haoran Chen et.al.	2504.21447	null
2025-04-30	SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding	Chenkai Zhang et.al.	2504.21435	link
2025-04-30	Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing	Hong Zhang et.al.	2504.21356	link
2025-04-30	UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation	Linshan Wu et.al.	2504.21336	link
2025-04-30	Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models	Guanghao Zhou et.al.	2504.21277	null
2025-04-29	ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification	Ziqing Fan et.al.	2504.20930	link
2025-04-29	AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation	Jeongsoo Choi et.al.	2504.20629	null
2025-04-29	A Summary on GUI Agents with Foundation Models Enhanced by Reinforcement Learning	Jiahao Li et.al.	2504.20464	null
2025-04-29	APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech	Zhicheng Lian et.al.	2504.20447	null
2025-04-29	MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation	Amaan Izhar et.al.	2504.20343	link
2025-04-28	A Transformer-based Multimodal Fusion Model for Efficient Crowd Counting Using Visual and Wireless Signals	Zhe Cui et.al.	2504.20178	null
2025-04-28	CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback	Chenhan Jiang et.al.	2504.19860	null
2025-04-28	SRMF: A Data Augmentation and Multimodal Fusion Approach for Long-Tail UHR Satellite Image Segmentation	Yulong Guo et.al.	2504.19839	null
2025-04-28	DEEMO: De-identity Multimodal Emotion Recognition and Reasoning	Deng Li et.al.	2504.19549	null
2025-04-28	LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning	Peijian Zeng et.al.	2504.19524	null
2025-04-26	Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation	Delun Lai et.al.	2504.19002	null
2025-04-26	Advancing Face-to-Face Emotion Communication: A Multimodal Dataset (AFFEC)	Meisam J. Sekiavandi et.al.	2504.18969	link
2025-04-26	Feature Fusion Revisited: Multimodal CTR Prediction for MMCTR Challenge	Junjie Zhou et.al.	2504.18961	link
2025-04-25	Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization	Kesen Zhao et.al.	2504.18397	link
2025-04-25	ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding	Yi-Xing Peng et.al.	2504.18152	null
2025-04-25	DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models	Jianyu Liu et.al.	2504.18053	link
2025-04-27	Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models	Xu Ma et.al.	2504.17789	null
2025-04-24	Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs	Tiancheng Gu et.al.	2504.17432	null
2025-04-25	TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation	Ling You et.al.	2504.17365	null
2025-04-24	V$^2$R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations	Zhiyuan Fan et.al.	2504.16727	null
2025-04-24	Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark	Hanlei Zhang et.al.	2504.16427	link
2025-04-23	EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion Assessment	Lancheng Gao et.al.	2504.16405	null
2025-04-22	Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs	Merve Cerit et.al.	2504.16323	link
2025-04-21	Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends	Mohammad Abu Tami et.al.	2504.16134	null
2025-04-22	TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving	Daocheng Fu et.al.	2504.15780	null
2025-04-22	FaceInsight: A Multimodal Large Language Model for Face Perception	Jingzhi Li et.al.	2504.15624	null
2025-04-22	AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization	Jinda Lu et.al.	2504.15619	null
2025-04-21	IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs	David Ma et.al.	2504.15415	link
2025-04-21	Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs	Chun-Hsiao Yeh et.al.	2504.15280	link
2025-04-21	VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models	Weiye Xu et.al.	2504.15279	null
2025-04-21	A Call for New Recipes to Enhance Spatial Reasoning in MLLMs	Huanyu Zhang et.al.	2504.15037	null
2025-04-21	IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification	Fengyuan Nie et.al.	2504.14833	null
2025-04-20	Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens	Kaihang Pan et.al.	2504.14666	null
2025-04-20	Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension	Lin Li et.al.	2504.14642	null
2025-04-20	Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction	Wenke Xia et.al.	2504.14588	link
2025-04-19	Towards Explainable Fake Image Detection with Multi-Modal Large Language Models	Yikun Ji et.al.	2504.14245	link
2025-04-19	InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners	Yuhang Liu et.al.	2504.14239	link
2025-04-18	Scaling LLaNA: Advancing NeRF-Language Understanding Through Large-Scale Training	Andrea Amaduzzi et.al.	2504.13995	null
2025-04-18	Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing	Joowon Kim et.al.	2504.13490	null
2025-04-17	SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Haoxuan Li et.al.	2504.13172	null
2025-04-17	Hadamard product in deep learning: Introduction, Advances and Challenges	Grigorios G Chrysos et.al.	2504.13112	null
2025-04-17	EventVAD: Training-Free Event-Aware Video Anomaly Detection	Yihua Shao et.al.	2504.13092	null
2025-04-18	SkyReels-V2: Infinite-length Film Generative Model	Guibin Chen et.al.	2504.13074	link
2025-04-17	ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images	Sangwook Kim et.al.	2504.13023	null
2025-04-17	EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery	Wei Zhang et.al.	2504.12795	null
2025-04-17	Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration	Yicheng Pan et.al.	2504.12773	link
2025-04-17	SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding	Qianqian Sun et.al.	2504.12704	null
2025-04-17	GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning	Liangyu Xu et.al.	2504.12597	null
2025-04-16	Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis	Shravan Chaudhari et.al.	2504.12511	null
2025-04-16	Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis	Miaosen Luo et.al.	2504.12151	null
2025-04-16	Instruction-augmented Multimodal Alignment for Image-Text and Element Matching	Xinli Yue et.al.	2504.12018	null
2025-04-16	AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection	Yuhao Chao et.al.	2504.11914	null
2025-04-16	Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation	Julia Kreutzer et.al.	2504.11829	null
2025-04-15	DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis	Efthymios Georgiou et.al.	2504.11082	null
2025-04-15	Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation	Yan Rong et.al.	2504.11002	null
2025-04-14	CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates	Ankit Kumar Shaw et.al.	2504.10738	null
2025-04-14	Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization	Darryl Hannan et.al.	2504.10727	null
2025-04-14	Relation-Rich Visual Document Generator for Visual Information Extraction	Zi-Han Jiang et.al.	2504.10659	link
2025-04-15	InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models	Jinguo Zhu et.al.	2504.10479	link
2025-04-14	Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding	Tao Zhang et.al.	2504.10465	link
2025-04-14	The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer	Weixian Lei et.al.	2504.10462	link
2025-04-14	FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos	Rui Chen et.al.	2504.10358	null
2025-04-14	CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation	Junchen Fu et.al.	2504.10307	link
2025-04-14	PRM-BAS: Enhancing Multimodal Reasoning through PRM-guided Beam Annealing Search	Pengfei Hu et.al.	2504.10222	null
2025-04-14	The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance	Anwesha Mohanty et.al.	2504.10179	null
2025-04-14	COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts	Jiansheng Li et.al.	2504.10158	null
2025-04-14	CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography	I-Sheng Fang et.al.	2504.10090	null
2025-04-15	MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework	Zihan Ling et.al.	2504.10074	null
2025-04-11	Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images	Boyang Deng et.al.	2504.08727	null
2025-04-10	POEM: Precise Object-level Editing via MLLM control	Marco Schouten et.al.	2504.08111	null
2025-04-10	GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation	Lang Lin et.al.	2504.07962	null
2025-04-10	MM-IFEngine: Towards Multimodal Instruction Following	Shengyuan Ding et.al.	2504.07957	link
2025-04-10	Perception-R1: Pioneering Perception Policy with Reinforcement Learning	En Yu et.al.	2504.07954	link
2025-04-10	MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation	Nico Catalano et.al.	2504.07942	null
2025-04-10	VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding	Henghao Zhao et.al.	2504.07519	null
2025-04-10	How Can Objects Help Video-Language Understanding?	Zitian Tang et.al.	2504.07454	null
2025-04-10	Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing	Chenxi Sun et.al.	2504.07424	null
2025-04-10	Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction	Kyoyun Choi et.al.	2504.07415	null
2025-04-09	Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning	Ashutosh Chaubey et.al.	2504.07198	null
2025-04-10	VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning	Xinhao Li et.al.	2504.06958	null
2025-04-09	MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Chang Nie et.al.	2504.06863	null
2025-04-09	Integrating Cognitive Processing Signals into Language Models: A Review of Advances, Applications and Future Directions	Angela Lopez-Cardona et.al.	2504.06843	null
2025-04-09	Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception	Ruotian Peng et.al.	2504.06666	null
2025-04-09	Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program	Minghe Gao et.al.	2504.06606	link
2025-04-08	Mind the Gap: Evaluating Vision Systems in Small Data Applications	Samuel Stevens et.al.	2504.06486	link
2025-04-08	Transfer between Modalities with MetaQueries	Xichen Pan et.al.	2504.06256	null
2025-04-08	V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models	Xiangxi Zheng et.al.	2504.06148	link
2025-04-08	MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models	Pengfei Zhou et.al.	2504.05782	link
2025-04-08	On the Suitability of Reinforcement Fine-Tuning to Visual Tasks	Xiaxu Chen et.al.	2504.05682	null
2025-04-07	URECA: Unique Region Caption Anything	Sangbeom Lim et.al.	2504.05305	null
2025-04-07	LiveVQA: Live Visual Knowledge Seeking	Mingyang Fu et.al.	2504.05288	null
2025-04-07	Explaining Low Perception Model Competency with High-Competency Counterfactuals	Sara Pohland et.al.	2504.05254	null
2025-04-07	Towards Visual Text Grounding of Multimodal Large Language Model	Ming Li et.al.	2504.04974	null
2025-04-07	Video-Bench: Human-Aligned Video Generation Benchmark	Hui Han et.al.	2504.04907	null
2025-04-07	OrderChain: A General Prompting Paradigm to Improve Ordinal Understanding Ability of MLLM	Jinhong Wang et.al.	2504.04801	null
2025-04-07	OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance	Chaoyi Wang et.al.	2504.04781	null
2025-04-07	Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data	Samarth Mishra et.al.	2504.04740	link
2025-04-07	LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts	Yimu Wang et.al.	2504.04653	null
2025-04-06	Advancing Egocentric Video Question Answering with Multimodal Large Language Models	Alkesh Patel et.al.	2504.04550	null
2025-04-04	MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models	Wulin Xie et.al.	2504.03641	null
2025-04-03	Hummus: A Dataset of Humorous Multimodal Metaphor Use	Xiaoyu Tong et.al.	2504.02983	link
2025-04-03	Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning	Zhihan Zhang et.al.	2504.02906	link
2025-04-03	Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision	Xiaofeng Han et.al.	2504.02477	null
2025-04-03	The Plot Thickens: Quantitative Part-by-Part Exploration of MLLM Visualization Literacy	Matheus Valentim et.al.	2504.02217	null
2025-04-03	ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement	Runhui Huang et.al.	2504.01934	null
2025-04-02	Spatial-R1: Enhancing MLLMs in Video Spatial Reasoning	Kun Ouyang et.al.	2504.01805	link
2025-04-02	PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization	Aofan Liu et.al.	2504.01444	null
2025-04-02	Slow-Fast Architecture for Video Multi-Modal Large Language Models	Min Shi et.al.	2504.01328	link
2025-04-01	AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction	Junhao Cheng et.al.	2504.01014	link
2025-04-01	IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval	Bangwei Liu et.al.	2504.00954	null
2025-04-02	Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning	Ram Ramrakhya et.al.	2504.00907	null
2025-04-01	Improved Visual-Spatial Reasoning via R1-Zero-Like Training	Zhenyi Liao et.al.	2504.00883	null
2025-04-01	Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights	Yuchen Liu et.al.	2504.00839	null
2025-04-01	QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA	Shuai Li et.al.	2504.00654	null
2025-03-31	Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation	Shengqiong Wu et.al.	2503.24379	null
2025-03-31	Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1	Yi Chen et.al.	2503.24376	link
2025-03-31	H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding	Qi Wu et.al.	2503.24008	null
2025-03-31	BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation	Yumeng Fu et.al.	2503.23990	null
2025-03-31	Boosting MLLM Reasoning with Text-Debiased Hint-GRPO	Qihan Huang et.al.	2503.23905	null
2025-04-01	Evaluating small vision-language models as AI assistants for radio astronomical source analysis tasks	S. Riggi et.al.	2503.23859	link
2025-03-31	OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training	Yijie Zheng et.al.	2503.23830	null
2025-03-31	XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?	Fengxiang Wang et.al.	2503.23771	null
2025-03-31	STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?	Yun Li et.al.	2503.23765	null
2025-03-31	AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization	Yiyang Du et.al.	2503.23733	link
2025-03-28	Q-Insight: Understanding Image Quality via Visual Reinforcement Learning	Weiqi Li et.al.	2503.22679	link
2025-03-28	Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users	Antonia Karamolegkou et.al.	2503.22610	null
2025-03-28	NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving	Fuhao Li et.al.	2503.22436	null
2025-03-31	Agent-Centric Personalized Multiple Clustering with Multi-Modal LLMs	Ziye Chen et.al.	2503.22241	null
2025-03-28	Learning to Instruct for Visual Instruction Tuning	Zhihan Zhou et.al.	2503.22215	null
2025-03-28	DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos	Yunming Liang et.al.	2503.22208	null
2025-03-28	EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos	Yuxuan Li et.al.	2503.22152	link
2025-03-28	Tokenization of Gaze Data	Tim Rolff et.al.	2503.22145	null
2025-03-28	A Survey on Remote Sensing Foundation Models: From Vision to Multimodality	Ziyue Huang et.al.	2503.22081	link
2025-03-27	Video-R1: Reinforcing Video Reasoning in MLLMs	Kaituo Feng et.al.	2503.21776	link
2025-03-27	3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models	Yuhan Zhang et.al.	2503.21745	null
2025-03-27	UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning	Zhengxi Lu et.al.	2503.21620	link
2025-03-27	FusionSegReID: Advancing Person Re-Identification with Multimodal Retrieval and Precise Segmentation	Jincheng Yan et.al.	2503.21595	null
2025-03-27	FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs	Xiaoqin Wang et.al.	2503.21457	link
2025-03-27	InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression	Dongchen Lu et.al.	2503.21307	link
2025-03-26	ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction	Yiqiao Jin et.al.	2503.20978	null
2025-03-26	MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams	Yanpeng Sun et.al.	2503.20745	null
2025-03-26	Vision as LoRA	Han Wang et.al.	2503.20680	link
2025-03-26	Beyond Intermediate States: Explaining Visual Redundancy through Language	Dingchen Yang et.al.	2503.20540	link
2025-03-26	Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering	Zehui Liao et.al.	2503.20504	null
2025-03-26	MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning	Yiwei Ma et.al.	2503.20502	null
2025-03-26	From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment	Yucheng Suo et.al.	2503.20472	null
2025-03-26	MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation	Rongyu Zhang et.al.	2503.20384	null
2025-03-26	Dynamic Pyramid Network for Efficient Multimodal Large Language Model	Hao Ai et.al.	2503.20322	null
2025-03-26	Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs	Zitian Wang et.al.	2503.20309	null
2025-03-25	LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?	Kexian Tang et.al.	2503.19990	null
2025-03-25	CoLLM: A Large Language Model for Composed Image Retrieval	Chuong Huynh et.al.	2503.19910	link
2025-03-25	Scaling Vision Pre-Training to 4K Resolution	Baifeng Shi et.al.	2503.19903	null
2025-03-25	Perception-Enhanced Multitask Multimodal Semantic Communication for UAV-Assisted Integrated Sensing and Communication System	Ziji Guo et.al.	2503.19594	null
2025-03-25	DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts	Ling Zhong et.al.	2503.19498	null
2025-03-25	ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning	Jiaqi Liao et.al.	2503.19312	null
2025-03-24	MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks	Wenhao You et.al.	2503.19134	null
2025-03-24	LLaVAction: evaluating and training multi-modal large language models for action recognition	Shaokai Ye et.al.	2503.18712	link
2025-03-25	Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models	Yazhou Zhang et.al.	2503.18681	null
2025-03-24	Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark	Bingchen Miao et.al.	2503.18665	link
2025-03-24	Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding	Xiangrui Liu et.al.	2503.18478	null
2025-03-24	A Simple yet Effective Layout Token in Large Language Models for Document Understanding	Zhaoqing Zhu et.al.	2503.18434	null
2025-03-23	Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering	Zixin Chen et.al.	2503.18172	null
2025-03-23	MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation	Jiaxin Huang et.al.	2503.18135	null
2025-03-23	MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection	Yibo Yan et.al.	2503.18132	null
2025-03-23	Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models	Qiao Liang et.al.	2503.18034	null
2025-03-22	4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding	Wenxuan Zhu et.al.	2503.17827	link
2025-03-21	LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models	Jian Liang et.al.	2503.16843	null
2025-03-21	When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts	Jun Seong Kim et.al.	2503.16826	null
2025-03-20	Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions	Hadi Amini et.al.	2503.16585	link
2025-03-20	OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence	Long Yuan et.al.	2503.16326	null
2025-03-20	Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data	Zijian Li et.al.	2503.16260	null
2025-03-20	CLS-RL: Image Classification with Rule-Based Reinforcement Learning	Ming Li et.al.	2503.16188	link
2025-03-20	OThink-MR1: Stimulating multimodal generalized reasoning capabilities through dynamic reinforcement learning	Zhiyuan Liu et.al.	2503.16081	null
2025-03-20	Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models	Zhihang Liu et.al.	2503.16036	link
2025-03-20	BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models	Zenghui Yuan et.al.	2503.16023	null
2025-03-20	DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering	Haochen Wang et.al.	2503.15887	null
2025-03-20	A Vision Centric Remote Sensing Benchmark	Abduljaleel Adejumo et.al.	2503.15816	null
2025-03-19	LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning	Federico Cocchi et.al.	2503.15621	link
2025-03-19	Visual Position Prompt for MLLM based Visual Grounding	Wei Tang et.al.	2503.15426	link
2025-03-19	Leveraging Perfect Multimodal Alignment and Gaussian Assumptions for Cross-modal Transfer	Abhi Kamboj et.al.	2503.15352	null
2025-03-19	LEGION: Learning to Ground and Explain for Synthetic Image Detection	Hengrui Kang et.al.	2503.15264	null
2025-03-20	Benchmarking Large Language Models for Handwritten Text Recognition	Giorgia Crosilla et.al.	2503.15195	null
2025-03-19	UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation	Qihui Zhang et.al.	2503.14941	null
2025-03-19	VisNumBench: Evaluating Number Sense of Multimodal Large Language Models	Tengjin Weng et.al.	2503.14939	null
2025-03-19	FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding	Chongjun Tu et.al.	2503.14935	null
2025-03-19	POSTA: A Go-to Framework for Customized Artistic Poster Generation	Haoyu Chen et.al.	2503.14908	null
2025-03-19	Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations	Shuo Li et.al.	2503.14895	null
2025-03-18	Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives	Sara Sarto et.al.	2503.14604	link
2025-03-18	Aligning Multimodal LLM with Human Preference: A Survey	Tao Yu et.al.	2503.14504	link
2025-03-19	Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM	Xinyu Fang et.al.	2503.14478	link
2025-03-18	VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation	Shoubin Yu et.al.	2503.14350	null
2025-03-19	DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies	Wei Song et.al.	2503.14324	link
2025-03-18	Towards Harmless Multimodal Assistants with Blind Preference Optimization	Yongqi Li et.al.	2503.14189	null
2025-03-18	Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding	Zining Wang et.al.	2503.14140	null
2025-03-18	MP-GUI: Modality Perception with MLLMs for GUI Understanding	Ziwei Wang et.al.	2503.14021	link
2025-03-18	SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability	Jiankang Wang et.al.	2503.13983	null
2025-03-18	Survey of Adversarial Robustness in Multimodal Large Language Models	Chengze Jiang et.al.	2503.13962	null
2025-03-18	Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation	Sayak Nag et.al.	2503.13947	null
2025-03-17	MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research	James Burgess et.al.	2503.13399	link
2025-03-17	Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning	Mengyao Lyu et.al.	2503.13383	null
2025-03-17	Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning	Hai-Long Sun et.al.	2503.13360	null
2025-03-17	3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o	Dingning Liu et.al.	2503.13185	null
2025-03-17	MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs	Erik Daxberger et.al.	2503.13111	null
2025-03-17	Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference	Hao Yin et.al.	2503.13108	link
2025-03-17	ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models	Hao Yin et.al.	2503.13107	link
2025-03-17	Mitigating Cross-Modal Distraction and Ensuring Geometric Feasibility via Affordance-Guided, Self-Consistent MLLMs for Food Preparation Task Planning	Yu-Hong Shen et.al.	2503.13055	null
2025-03-17	Efficient Motion-Aware Video MLLM	Zijia Zhao et.al.	2503.13016	null
2025-03-17	HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model	Haiyang Guo et.al.	2503.12941	null
2025-03-14	VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity	Jing Bi et.al.	2503.11557	null
2025-03-14	A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving	Tin Stribor Sohn et.al.	2503.11400	null
2025-03-14	Cornstarch: Distributed Multimodal Training Must Be Multimodality-Aware	Insu Jang et.al.	2503.11367	link
2025-03-14	Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space	Weichen Zhan et.al.	2503.11094	link
2025-03-14	EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks	Yi Zhang et.al.	2503.11089	null
2025-03-14	BannerAgency: Advertising Banner Design with Multimodal LLM Agents	Heng Wang et.al.	2503.11060	null
2025-03-14	RONA: Pragmatically Diverse Image Captioning with Coherence Relations	Aashish Anantha Ramakrishnan et.al.	2503.10997	link
2025-03-13	Learning to Inference Adaptively for Multimodal Large Language Models	Zhuoyan Xu et.al.	2503.10905	null
2025-03-13	PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models	Zilu Guo et.al.	2503.10529	null
2025-03-13	Interactive Multimodal Fusion with Temporal Modeling	Jun Yu et.al.	2503.10523	null
2025-03-13	TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models	Xudong Tan et.al.	2503.10501	link
2025-03-13	4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models	Wanhua Li et.al.	2503.10437	link
2025-03-13	CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance	Yufan Deng et.al.	2503.10391	null
2025-03-13	A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection	Heng Yim Nicole Oo et.al.	2503.10371	null
2025-03-13	IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification	Yuhao Wang et.al.	2503.10324	null
2025-03-13	VisualPRM: An Effective Process Reward Model for Multimodal Reasoning	Weiyun Wang et.al.	2503.10291	null
2025-03-13	LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents	Boyu Chen et.al.	2503.10200	null
2025-03-13	Hybrid Agents for Image Restoration	Bingchen Li et.al.	2503.10120	null
2025-03-13	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering	Md Mohaiminul Islam et.al.	2503.09590	link
2025-03-12	Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding	Haoyu Zhang et.al.	2503.09143	null
2025-03-11	Seeing What’s Not There: Spurious Correlation in Multimodal LLMs	Parsa Hosseini et.al.	2503.08884	null
2025-03-11	Language-Depth Navigated Thermal and Visible Image Fusion	Jinchang Zhang et.al.	2503.08676	null
2025-03-11	SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories	Muzhi Zhu et.al.	2503.08625	link
2025-03-11	LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization	Xianfeng Wu et.al.	2503.08619	link
2025-03-11	HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding	Shehreen Azad et.al.	2503.08585	null
2025-03-11	RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding	Xichen Tan et.al.	2503.08576	null
2025-03-11	FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework	Jianian Zhu et.al.	2503.08461	null
2025-03-11	KAP: MLLM-assisted OCR Text Enhancement for Hybrid Retrieval in Chinese Non-Narrative Documents	Hsin-Ling Hsu et.al.	2503.08452	link
2025-03-11	Embodied Crowd Counting	Runling Long et.al.	2503.08367	null
2025-03-12	Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs	Chongjun Tu et.al.	2503.08342	null
2025-03-11	Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework	Zhuo Zhi et.al.	2503.08308	null
2025-03-10	Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts	Shiu-hong Kao et.al.	2503.07503	null
2025-03-10	LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?	Bangyan Li et.al.	2503.07487	null
2025-03-10	REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding	Yan Tai et.al.	2503.07413	link
2025-03-10	ALLVB: All-in-One Long Video Understanding Benchmark	Xichen Tan et.al.	2503.07298	null
2025-03-10	A Novel Ophthalmic Benchmark for Evaluating Multimodal Large Language Models with Fundus Photographs and OCT Images	Xiaoyi Liang et.al.	2503.07094	null
2025-03-10	Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning	Jiazheng Liu et.al.	2503.07002	null
2025-03-10	Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs	Wenzhuo Xu et.al.	2503.06989	null
2025-03-10	Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition	Xinyu Xi et.al.	2503.06978	null
2025-03-10	ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks	Yan Yang et.al.	2503.06885	null
2025-03-09	SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation	Zisheng Chen et.al.	2503.06764	link
2025-03-11	Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models	Wenxuan Huang et.al.	2503.06749	link
2025-03-07	Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information	Junbo Zhao et.al.	2503.05543	null
2025-03-07	Can Large Language Models Grasp Concepts in Visual Content? A Case Study on YouTube Shorts about Depression	Jiaying “Lizzy” Liu et.al.	2503.05109	null
2025-03-06	FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement	Ian Huang et.al.	2503.04919	null
2025-03-06	Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model	Wenke Huang et.al.	2503.04543	link
2025-03-06	Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition	Bin Chen et.al.	2503.04201	null
2025-03-06	MASTER: Multimodal Segmentation with Text Prompts	Fuyang Liu et.al.	2503.04199	null
2025-03-06	Biological Sequence with Language Model Prompting: A Survey	Jiyue Jiang et.al.	2503.04135	null
2025-03-07	Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts	Xiangnan Chen et.al.	2503.04095	null
2025-03-06	RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models	Wenhui Zhu et.al.	2503.03987	null
2025-03-05	DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance	Zhao Yang et.al.	2503.03689	link
2025-03-05	BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation	Hiep Truong Cong et.al.	2503.03280	null
2025-03-05	COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence	Wentao Li et.al.	2503.03215	null
2025-03-05	Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings	Sneh Pillai et.al.	2503.03202	null
2025-03-04	Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs	Wei-Yao Wang et.al.	2503.02597	link
2025-03-05	MCiteBench: A Benchmark for Multimodal Citation Text Generation in MLLMs	Caiyu Hu et.al.	2503.02589	link
2025-03-04	A Token-level Text Image Foundation Model for Document Understanding	Tongkun Guan et.al.	2503.02304	null
2025-03-03	Distilled Prompt Learning for Incomplete Multimodal Survival Prediction	Yingxue Xu et.al.	2503.01653	null
2025-03-03	RemiHaven: Integrating “In-Town” and “Out-of-Town” Peers to Provide Personalized Reminiscence Support for Older Drifters	Xuechen Zhang et.al.	2503.01358	null
2025-03-04	UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface	Hao Tang et.al.	2503.01342	link
2025-03-03	Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG	Wenbin Wang et.al.	2503.01222	link
2025-03-03	Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models	Tianjie Ju et.al.	2503.01208	link
2025-03-03	Scientific Reasoning: Assessment of Multimodal Generative LLMs	Florian Dreyer et.al.	2503.01064	null
2025-03-02	LLM-Fusion: A Novel Multimodal Fusion Model for Accelerated Material Discovery	Onur Boyar et.al.	2503.01022	null
2025-02-28	Adaptive Keyframe Sampling for Long Video Understanding	Xi Tang et.al.	2502.21271	null
2025-02-28	RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete	Yuheng Ji et.al.	2502.21257	null
2025-02-28	Fine-Grained Retrieval-Augmented Generation for Visual Question Answering	Zhengxuan Zhang et.al.	2502.20964	null
2025-02-28	HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models	Xiao Wang et.al.	2502.20811	null
2025-03-03	MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts	Peijie Wang et.al.	2502.20808	null
2025-02-28	Towards General Visual-Linguistic Face Forgery Detection(V2)	Ke Sun et.al.	2502.20698	link
2025-02-27	Visual Reasoning at Urban Intersections: FineTuning GPT-4o for Traffic Conflict Detection	Sari Masri et.al.	2502.20573	null
2025-02-27	Protecting multimodal large language models against misleading visualizations	Jonathan Tonglet et.al.	2502.20503	link
2025-02-27	VideoA11y: Method and Dataset for Accessible Video Description	Chaoyu Li et.al.	2502.20480	null
2025-02-27	Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription	Benjamin Gutteridge et.al.	2502.20295	link
2025-02-27	Mixture of Experts for Recognizing Depression from Interview and Reading Tasks	Loukas Ilias et.al.	2502.20213	null
2025-02-27	New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration	Xuzheng Yang et.al.	2502.20104	null
2025-02-27	AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs	Xuyang Wei et.al.	2502.20035	link
2025-02-27	Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up	Lang Huang et.al.	2502.20008	null
2025-02-27	Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents	Zhenyu Liu et.al.	2502.19917	link
2025-02-27	Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy	Zaijing Li et.al.	2502.19902	null
2025-02-27	Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention	Weiyan Shi et.al.	2502.19877	null
2025-02-27	One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion	Chunyang Cheng et.al.	2502.19854	link
2025-02-27	Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack	Chenhe Gu et.al.	2502.19672	null
2025-02-26	ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models	Danae Sánchez Villegas et.al.	2502.19409	null
2025-02-26	M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance	Qingpei Guo et.al.	2502.18778	null
2025-02-25	OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference	Xiangyu Zhao et.al.	2502.18411	link
2025-02-25	ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis	Li Lei et.al.	2502.18180	null
2025-02-25	VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion	Pei Liu et.al.	2502.18042	null
2025-02-25	MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks	Hyeonjeong Ha et.al.	2502.17832	link
2025-02-25	Can Multimodal LLMs Perform Time Series Anomaly Detection?	Xiongxiao Xu et.al.	2502.17812	link
2025-02-24	MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference	Zhongwei Wan et.al.	2502.17599	link
2025-02-24	PosterSum: A Multimodal Benchmark for Scientific Poster Summarization	Rohit Saxena et.al.	2502.17540	link
2025-02-24	Introducing Visual Perception Token into Multimodal Large Language Model	Runpeng Yu et.al.	2502.17425	link
2025-02-24	MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs	Jiarui Zhang et.al.	2502.17422	link
2025-02-24	HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization	Zhenghao Liu et.al.	2502.17315	link
2025-02-24	Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts	Zhenghao Liu et.al.	2502.17297	link
2025-02-24	Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence	Wenzhe Yin et.al.	2502.17028	null
2025-02-24	Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs	Himanshu Beniwal et.al.	2502.16901	link
2025-02-24	SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding	Liangtao Shi et.al.	2502.16786	link
2025-02-23	AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation	Rui Li et.al.	2502.16680	link
2025-02-23	Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries	Yin Wu et.al.	2502.16636	link
2025-02-23	Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review	Pei Fu et.al.	2502.16586	null
2025-02-21	Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models	Anirudh Sundar et.al.	2502.15639	null
2025-02-21	Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs	Gengyuan Zhang et.al.	2502.15457	null
2025-02-21	Research advances on fish feeding behavior recognition and intensity quantification methods in aquaculture	Shulong Zhang et.al.	2502.15311	null
2025-02-21	M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment	Chuan Cui et.al.	2502.15167	link
2025-02-20	Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation	Yun-Wei Chu et.al.	2502.15040	null
2025-02-20	Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework	Yuming Yang et.al.	2502.14864	link
2025-02-20	Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension	Amir Hossein Yari et.al.	2502.14315	null
2025-02-20	Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach	Yurong Wu et.al.	2502.14285	null
2025-02-21	PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC	Haowei Liu et.al.	2502.14282	null
2025-02-19	ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities	Chanjin Zheng et.al.	2502.13832	link
2025-02-19	From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education	Yi-Fan Zhang et.al.	2502.13789	null
2025-02-18	Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation	Bencheng Liao et.al.	2502.13145	link
2025-02-18	SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models	Xianfu Cheng et.al.	2502.13059	null
2025-02-18	AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks	Yurun Chen et.al.	2502.13053	null
2025-02-18	Towards Text-Image Interleaved Retrieval	Xin Zhang et.al.	2502.12799	link
2025-02-18	Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning	Yunhao Gou et.al.	2502.12635	null
2025-02-18	SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings	Weikai Lu et.al.	2502.12562	link
2025-02-18	MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos	Huaying Yuan et.al.	2502.12558	null
2025-02-18	SAFEERASER: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning	Junkai Chen et.al.	2502.12520	null
2025-02-17	HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation	Ling Yang et.al.	2502.12148	link
2025-02-17	PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection	Jinhe Bi et.al.	2502.12119	null
2025-02-17	Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications	Li Qiao et.al.	2502.12096	null
2025-02-17	Unhackable Temporal Rewarding for Scalable Video MLLMs	En Yu et.al.	2502.12081	null
2025-02-17	GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs	Yi Fang et.al.	2502.11925	null
2025-02-17	EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models	Jiamin Su et.al.	2502.11916	link
2025-02-17	MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation	Haochen Xue et.al.	2502.11903	null
2025-02-17	Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Hanbin Wang et.al.	2502.11829	link
2025-02-17	Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning	Yuqi Pang et.al.	2502.11751	link
2025-02-17	Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent	Junda Wu et.al.	2502.11740	null
2025-02-14	MM-RLHF: The Next Step Forward in Multimodal LLM Alignment	Yi-Fan Zhang et.al.	2502.10391	null
2025-02-14	AutoS$^2$earch: Unlocking the Reasoning Potential of Large Models for Web-based Source Search	Zhengqiu Zhu et.al.	2502.09913	null
2025-02-13	EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents	Rui Yang et.al.	2502.09560	null
2025-02-13	A Benchmark for Crime Surveillance Video Analysis with Large Models	Haoran Chen et.al.	2502.09325	null
2025-02-13	From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs	Mingxiao Li et.al.	2502.09093	null
2025-02-12	FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per Violation	Yang Sun et.al.	2502.08260	link
2025-02-12	Learning Human Skill Generators at Key-Step Levels	Yilu Wu et.al.	2502.08234	null
2025-02-13	Universal Adversarial Attack on Aligned Multimodal LLMs	Temurbek Rahmatullaev et.al.	2502.07987	null
2025-02-11	DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities	Chashi Mahiul Islam et.al.	2502.07905	null
2025-02-11	Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models	Jiacong Xu et.al.	2502.07601	null
2025-02-11	MLLM4PUE: Toward Universal Embeddings in Computational Pathology through Multimodal LLMs	Qifeng Zhou et.al.	2502.07221	null
2025-02-11	Early Risk Prediction of Pediatric Cardiac Arrest from Electronic Health Records via Multimodal Fused Transformer	Jiaying Lu et.al.	2502.07158	null
2025-02-09	AI-Driven HSI: Multimodality, Fusion, Challenges, and the Deep Learning Revolution	David S. Bhatti et.al.	2502.06894	null
2025-02-11	CoS: Chain-of-Shot Prompting for Long Video Understanding	Jian Hu et.al.	2502.06428	null
2025-02-07	Survey on AI-Generated Media Detection: From Non-MLLM to MLLM	Yueying Zou et.al.	2502.05240	null
2025-02-07	Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy	Yunhang Shen et.al.	2502.05177	link
2025-02-07	Multitwine: Multi-Object Compositing with Text and Layout Control	Gemma Canet Tarrés et.al.	2502.05165	null
2025-02-07	Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs	Rohit Saxena et.al.	2502.05092	null
2025-02-07	Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark	Han Zhang et.al.	2502.04976	null
2025-02-07	Cached Multi-Lora Composition for Multi-Concept Image Generation	Xiandong Zou et.al.	2502.04923	link
2025-02-07	MedMimic: Physician-Inspired Multimodal Fusion for Early Diagnosis of Fever of Unknown Origin	Minrui Chen et.al.	2502.04794	null
2025-02-06	EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models	He Hu et.al.	2502.04424	null
2025-02-05	PerPO: Perceptual Preference Optimization via Discriminative Rewarding	Zining Zhu et.al.	2502.04371	link
2025-02-06	PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?	Mennatullah Siam et.al.	2502.04192	link
2025-02-06	MRAMG-Bench: A BeyondText Benchmark for Multimodal Retrieval-Augmented Multimodal Generation	Qinhan Yu et.al.	2502.04176	link
2025-02-05	Large Language Models Are Universal Recommendation Learners	Junguang Jiang et.al.	2502.03041	null
2025-02-05	Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning	Yibo Yan et.al.	2502.02871	null
2025-02-04	SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency	Qianhao Yuan et.al.	2502.02458	link
2025-02-04	Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment	Yaling Shen et.al.	2502.02438	null
2025-02-06	LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models	Tzu-Tao Chang et.al.	2502.02406	null
2025-02-04	Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking	Jinyang Wu et.al.	2502.02339	null
2025-02-04	Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration	Younan Zhu et.al.	2502.01969	null
2025-02-04	MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving	Shiju Zhao et.al.	2502.01960	null
2025-02-04	DAMO: Data- and Model-aware Alignment of Multi-modal LLMs	Jinda Lu et.al.	2502.01943	link
2025-02-03	Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models	Hashmat Shadab Malik et.al.	2502.01576	link
2025-02-03	Position: Empowering Time Series Reasoning with Multimodal LLMs	Yaxuan Kong et.al.	2502.01477	null
2025-02-03	Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models	Mingi Jung et.al.	2502.01419	null
2025-01-31	Efficient Reasoning with Hidden Thinking	Xuan Shen et.al.	2501.19201	link
2025-01-31	Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs	Hongliang Li et.al.	2501.19036	null
2025-01-31	Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation	Bin Zhu et.al.	2501.19017	null
2025-01-30	BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos	Lehao Lin et.al.	2501.18565	null
2025-01-29	Generative AI for Vision: A Comprehensive Study of Frameworks and Applications	Fouad Bousetouane et.al.	2501.18033	null
2025-01-29	Topological Signatures of Adversaries in Multimodal Alignments	Minh Vu et.al.	2501.18006	null
2025-01-30	Leveraging Multimodal LLM for Inspirational User Interface Search	Seokhyeon Park et.al.	2501.17799	link
2025-01-29	Learning Free Token Reduction for Multi-Modal LLM	Zihui Zhao et.al.	2501.17391	null
2025-01-31	Multimodal Magic Elevating Depression Detection with a Fusion of Text and Audio Intelligence	Lindy Gan et.al.	2501.16813	null
2025-01-28	Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding	Yun Li et.al.	2501.16786	null
2025-01-28	MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark	Dongyi Yi et.al.	2501.16688	null
2025-01-28	CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs	Jinlan Fu et.al.	2501.16629	link
2025-01-27	AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models	Zheng Lian et.al.	2501.16566	link
2025-01-27	LUCY: Linguistic Understanding and Control Yielding Early Stage of Her	Heting Gao et.al.	2501.16327	link
2025-01-27	FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers	Renshan Zhang et.al.	2501.16297	null
2025-01-27	Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models	Jing Zhang et.al.	2501.16282	null
2025-01-27	Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection?	Zhiling Chen et.al.	2501.15795	null
2025-01-27	Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning	Michael Xieyang Liu et.al.	2501.15727	null
2025-01-26	Ocean-OCR: Towards General OCR Application via a Vision-Language Model	Song Chen et.al.	2501.15558	link
2025-01-26	Unveiling the Potential of Multimodal Retrieval Augmented Generation with Planning	Xiaohan Yu et.al.	2501.15470	null
2025-01-26	Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations	Zijun Long et.al.	2501.15379	null
2025-01-26	Baichuan-Omni-1.5 Technical Report	Yadong Li et.al.	2501.15368	link
2025-01-25	Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink	Yining Wang et.al.	2501.15269	null
2025-01-23	Pilot: Building the Federated Multimodal Instruction Tuning Framework	Baochen Xiong et.al.	2501.13985	null
2025-01-23	GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration	Yue Fan et.al.	2501.13896	null
2025-01-23	EventVL: Understand Event Streams via Multimodal Large Language Model	Pengteng Li et.al.	2501.13707	null
2025-01-23	LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models	Yizheng Sun et.al.	2501.13652	null
2025-01-23	ReasVQA: Advancing VideoQA with Imperfect Reasoning Process	Jianxin Liang et.al.	2501.13536	null
2025-01-23	50 Shades of Deceptive Patterns: A Unified Taxonomy, Multimodal Detection, and Security Implications	Zewei Shi et.al.	2501.13351	link
2025-01-24	Multi-aspect Knowledge Distillation with Large Language Model	Taegyeong Lee et.al.	2501.13341	link
2025-01-22	Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning	Bohao Yang et.al.	2501.13042	link
2025-01-22	InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	Yi Wang et.al.	2501.12386	link
2025-01-21	VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model	Xianwei Zhuang et.al.	2501.12327	link
2025-01-21	Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization	Jie Zhao et.al.	2501.11968	null
2025-01-21	EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents	Zhili Cheng et.al.	2501.11858	link
2025-01-20	Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution	Zhiyuan You et.al.	2501.11561	null
2025-01-20	EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Guankun Wang et.al.	2501.11347	link
2025-01-20	ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction	Xiangyang Hu et.al.	2501.11276	link
2025-01-20	A Survey of World Models for Autonomous Driving	Tuo Feng et.al.	2501.11260	link
2025-01-19	Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation	Zhengwen Shen et.al.	2501.10958	null
2025-01-18	Visual RAG: Expanding MLLM visual knowledge without fine-tuning	Mirco Bonomo et.al.	2501.10834	null
2025-01-17	FaceXBench: Evaluating Multimodal LLMs on Face Understanding	Kartik Narayan et.al.	2501.10360	link
2025-01-16	A Simple Aerial Detection Baseline of Multimodal Language Models	Qingyun Li et.al.	2501.09720	link
2025-01-16	Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis	Qize Yang et.al.	2501.09502	null
2025-01-16	Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics	Yuanyuan Wei et.al.	2501.09218	null
2025-01-15	Multimodal LLMs Can Reason about Aesthetics in Zero-Shot	Ruixiang Jiang et.al.	2501.09012	link
2025-01-15	The Devil is in Temporal Token: High Quality Video Reasoning Segmentation	Sitong Gong et.al.	2501.08549	link
2025-01-14	LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding	Hongyu Li et.al.	2501.08282	link
2025-01-14	Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness	Jiaxing Zhao et.al.	2501.07978	link
2025-01-14	Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models	Yifang Xu et.al.	2501.07972	null
2025-01-14	3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Haomiao Xiong et.al.	2501.07819	link
2025-01-13	Imagine while Reasoning in Space: Multimodal Visualization-of-Thought	Chengzu Li et.al.	2501.07542	null
2025-01-13	Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method	Wenping Jin et.al.	2501.07496	link
2025-01-13	Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation	Han Liu et.al.	2501.07110	link
2025-01-13	LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models	Mozhgan Nasr Azadani et.al.	2501.06986	link
2025-01-12	X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding	Wenqi Zhou et.al.	2501.06835	null
2025-01-12	GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing	Ruizhe Ou et.al.	2501.06828	null
2025-01-12	MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection	Kaiying Yan et.al.	2501.06764	null
2025-01-12	Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints	Ming Dai et.al.	2501.06710	link
2025-01-11	ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation	Xuanle Zhao et.al.	2501.06598	link
2025-01-11	Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs	Shan Zhang et.al.	2501.06430	link
2025-01-10	PEACE: Empowering Geologic Map Holistic Understanding with MLLMs	Yangyu Huang et.al.	2501.06184	null
2025-01-10	Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs	Dabing Cheng et.al.	2501.05884	null
2025-01-10	Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models	You Li et.al.	2501.05767	null
2025-01-10	TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos	Korawat Charoenpitaks et.al.	2501.05733	link
2025-01-09	MECASA: Motor Execution Classification using Additive Self-Attention for Hybrid EEG-fNIRS Data	Gourav Siddhad et.al.	2501.05525	null
2025-01-09	Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark	Yunzhuo Hao et.al.	2501.05444	link
2025-01-09	Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration	Xuyang Liu et.al.	2501.05179	link
2025-01-09	Optimizing Multitask Industrial Processes with Predictive Action Guidance	Naval Kishore Mehta et.al.	2501.05108	null
2025-01-09	DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving	Xuran Zheng et.al.	2501.05081	null
2025-01-09	Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency	Shiji Zhao et.al.	2501.04931	null
2025-01-08	Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs	Yikang Zhou et.al.	2501.04670	link
2025-01-08	InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection	Yuhang Liu et.al.	2501.04575	link
2025-01-08	Evidence-based multimodal fusion on structured EHRs and free-text notes for ICU outcome prediction	Yucheng Ruan et.al.	2501.04389	link
2025-01-08	Multimodal Graph Contrastive Learning and Prompt for ChartQA	Yue Dai et.al.	2501.04303	null
2025-01-08	H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving	Siran Chen et.al.	2501.04302	null
2025-01-07	RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance	Matin Mortaheb et.al.	2501.03995	null
2025-01-06	Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches	Alhassan Mumuni et.al.	2501.03151	null
2025-01-07	Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild	Wanpeng Hu et.al.	2501.02964	link
2025-01-06	A Novel Vision Transformer for Camera-LiDAR Fusion based Traffic Object Segmentation	Toomas Tahves et.al.	2501.02858	null
2025-01-06	Ultrasound-QBench: Can LLMs Aid in Quality Assessment of Ultrasound Imaging?	Hongyi Miao et.al.	2501.02751	null
2025-01-05	FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance	Haicheng Wang et.al.	2501.02430	link
2025-01-04	What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph	Yutao Jiang et.al.	2501.02268	link
2025-01-03	AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs	Sanjoy Chowdhury et.al.	2501.02135	null
2025-01-03	VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction	Chaoyou Fu et.al.	2501.01957	link
2025-01-03	Virgo: A Preliminary Exploration on Reproducing o1-like MLLM	Yifan Du et.al.	2501.01904	link
2025-01-03	Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models	Guosheng Zhang et.al.	2501.01720	null
2025-01-02	Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants	Lixiong Qin et.al.	2501.01243	null
2025-01-02	Towards Interactive Deepfake Analysis	Lixiong Qin et.al.	2501.01164	link
2025-01-02	EliGen: Entity-Level Controlled Image Generation with Regional Attention	Hong Zhang et.al.	2501.01097	link
2025-01-02	Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs	Linhao Huang et.al.	2501.01042	null
2025-01-01	Decoding the Flow: CauseMotion for Emotional Causality Analysis in Long-form Conversations	Yuxuan Zhang et.al.	2501.00778	null
2024-12-31	Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method	Zhenpeng Huang et.al.	2501.00584	null
2024-12-31	VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling	Xinhao Li et.al.	2501.00574	link
2024-12-31	Fine-grained Video-Text Retrieval: A New Benchmark and Method	Yifan Xu et.al.	2501.00513	null
2024-12-31	Exploring the Implicit Semantic Ability of Multimodal Large Language Models: A Pilot Study on Entity Set Expansion	Hebin Wang et.al.	2501.00330	null
2024-12-31	MLLM-as-a-Judge for Image Safety without Human Labeling	Zhenting Wang et.al.	2501.00192	null
2024-12-30	GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models	Shangyu Xing et.al.	2412.21036	null
2024-12-30	Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering	Junxiao Xue et.al.	2412.20927	null
2024-12-28	ST$^3$: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming	Jiedong Zhuang et.al.	2412.20105	null
2024-12-28	On the Compositional Generalization of Multimodal LLMs for Medical Imaging	Zhenyang Cai et.al.	2412.20070	link
2024-12-27	Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework	Jiang Liu et.al.	2412.19684	null
2024-12-27	CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs	Siyu Wang et.al.	2412.19663	null
2024-12-27	MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Jiaqi Fan et.al.	2412.19406	link
2024-12-26	Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment	Ziang Yan et.al.	2412.19326	link
2024-12-26	Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries	Roberto Amoroso et.al.	2412.19304	null
2024-12-26	SeaMo: A Multi-Seasonal and Multimodal Remote Sensing Foundation Model	Xuyang Li et.al.	2412.19237	null
2024-12-25	MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models	Kaiwen Zuo et.al.	2412.18947	null
2024-12-25	RapGuard: Safeguarding Multimodal Large Language Models via Rationale-aware Defensive Prompting	Yilei Jiang et.al.	2412.18826	null
2024-12-24	Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation	Faraz Waseem et.al.	2412.18688	null
2024-12-24	MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning	Abdelmadjid Chergui et.al.	2412.18437	link
2024-12-24	Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles	Zihan Wang et.al.	2412.18416	null
2024-12-24	Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search	Huanjin Yao et.al.	2412.18319	link
2024-12-24	ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation	Mengyang Wu et.al.	2412.18216	link
2024-12-24	Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation	Yucong Luo et.al.	2412.18176	null
2024-12-24	VisionLLM-based Multimodal Fusion Network for Glottic Carcinoma Early Detection	Zhaohui Jin et.al.	2412.18124	null
2024-12-24	Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach	Jing Bi et.al.	2412.18108	null
2024-12-24	An Ensemble Approach to Short-form Video Quality Assessment Using Multimodal LLM	Wen Wen et.al.	2412.18060	null
2024-12-23	A Multimodal Fusion Framework for Bridge Defect Detection with Cross-Verification	Ravi Datta Rachuri et.al.	2412.17968	null
2024-12-23	Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy	Priyaranjan Pattnayak et.al.	2412.17759	null
2024-12-23	HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data	Ting Zhou et.al.	2412.17574	link
2024-12-23	Multimodal Preference Data Synthetic Alignment with Reward Model	Robert Wijaya et.al.	2412.17417	link
2024-12-23	MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models	Beibei Yu et.al.	2412.17339	null
2024-12-23	Neural-MCRL: Neural Multimodal Contrastive Representation Learning for EEG-based Visual Decoding	Yueyang Li et.al.	2412.17337	link
2024-12-23	Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective	Kaifang Long et.al.	2412.17297	null
2024-12-22	SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults	Jinzhi Wang et.al.	2412.17077	null
2024-12-22	CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models	Yeyuan Wang et.al.	2412.16869	link
2024-12-22	GME: Improving Universal Multimodal Retrieval by Multimodal LLMs	Xin Zhang et.al.	2412.16855	null
2024-12-21	AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles	Aritra Kumar Lahiri et.al.	2412.16701	null
2024-12-20	MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection	Andrea Moglia et.al.	2412.15925	link
2024-12-20	Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution	Wentao Tan et.al.	2412.15650	link
2024-12-20	Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM	Yangyang Guo et.al.	2412.15614	null
2024-12-20	QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning	Xinyang Tong et.al.	2412.15576	null
2024-12-20	Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage	Saehyung Lee et.al.	2412.15484	null
2024-12-19	MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs	Yuxuan Wan et.al.	2412.15310	link
2024-12-19	OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving	Shuo Xing et.al.	2412.15208	link
2024-12-19	Progressive Multimodal Reasoning via Active Retrieval	Guanting Dong et.al.	2412.14835	null
2024-12-19	Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models	Zijun Chen et.al.	2412.14660	link
2024-12-18	Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces	Jihan Yang et.al.	2412.14171	link
2024-12-18	InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Cong Wei et.al.	2412.14006	link
2024-12-18	LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer	Yipeng Zhang et.al.	2412.13871	link
2024-12-17	Modality-Inconsistent Continual Learning of Multimodal Large Language Models	Weiguo Pian et.al.	2412.13050	null
2024-12-17	ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing	Yaohui Ma et.al.	2412.12821	link
2024-12-17	PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model	Yuqing Wang et.al.	2412.12737	link
2024-12-17	ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding	Zhenxing Zhang et.al.	2412.12718	link
2024-12-17	Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation	Andong Chen et.al.	2412.12627	null
2024-12-17	FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning	Seunghee Kim et.al.	2412.12567	null
2024-12-17	Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models	Sina Bagheri Nezhad et.al.	2412.12500	link
2024-12-16	Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering	Jinhe Bi et.al.	2412.12359	link
2024-12-16	Instruction-based Image Manipulation by Watching How Things Move	Mingdeng Cao et.al.	2412.12087	null
2024-12-16	CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Guo Chen et.al.	2412.12075	null
2024-12-16	Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning	Yuti Liu et.al.	2412.11952	null
2024-12-16	A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges	Yibo Yan et.al.	2412.11936	null
2024-12-16	PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension	Kun Ouyang et.al.	2412.11906	null
2024-12-16	GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training	Renqiu Xia et.al.	2412.11863	link
2024-12-16	IDEA-Bench: How Far are Generative Models from Professional Designing?	Chen Liang et.al.	2412.11767	link
2024-12-16	From Specific-MLLM to Omni-MLLM: A Survey about the MLLMs aligned with Multi-Modality	Shixin Jiang et.al.	2412.11694	null
2024-12-16	ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models	Xiechi Zhang et.al.	2412.11453	null
2024-12-15	Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal	Yuhao Wang et.al.	2412.11196	null
2024-12-13	Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining	Zhiqi Ge et.al.	2412.10342	null
2024-12-13	BrushEdit: All-In-One Image Inpainting and Editing	Yaowei Li et.al.	2412.10316	null
2024-12-13	Leveraging Multimodal Methods and Spontaneous Speech for Alzheimer’s Disease Identification	Yifan Gao et.al.	2412.09928	null
2024-12-12	ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation	Ali Athar et.al.	2412.09754	null
2024-12-12	EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM	Zhuofan Zong et.al.	2412.09618	null
2024-12-13	Olympus: A Universal Task Router for Computer Vision Tasks	Yuanze Lin et.al.	2412.09612	link
2024-12-12	SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding	Hao Li et.al.	2412.09604	null
2024-12-12	Do Multimodal Large Language Models See Like Humans?	Jiaying Lin et.al.	2412.09603	null
2024-12-12	InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions	Pan Zhang et.al.	2412.09596	link
2024-12-12	OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation	Jitesh Jain et.al.	2412.09585	link
2024-12-12	Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition	Zhisheng Zhong et.al.	2412.09501	link
2024-12-12	Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation	Baisen Wang et.al.	2412.09428	link
2024-12-12	Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Xiaoshuang Huang et.al.	2412.09278	link
2024-12-11	LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information	Ke Wang et.al.	2412.08771	null
2024-12-11	From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons	Andrew Szot et.al.	2412.08442	null
2024-12-11	HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models	Shiding Zhu et.al.	2412.08378	null
2024-12-11	M2SE: A Multistage Multitask Instruction Tuning Strategy for Unified Sentiment and Emotion Analysis	Ao Li et.al.	2412.08049	link
2024-12-10	DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation	Jianzong Wu et.al.	2412.07589	null
2024-12-09	SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations	Zhaorun Chen et.al.	2412.06878	null
2024-12-09	ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance	Chunwei Wang et.al.	2412.06673	null
2024-12-09	3D Spatial Understanding in MLLMs: Disambiguation and Evaluation	Chun-Peng Chang et.al.	2412.06613	null
2024-12-12	World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving	Mingliang Zhai et.al.	2412.06324	null
2024-12-09	LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations	Mingjie Xu et.al.	2412.06322	link
2024-12-09	Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness	Qifan Yu et.al.	2412.06293	null
2024-12-09	ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models	Bingchen Gong et.al.	2412.06292	null
2024-12-08	GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis	Ashish Goswami et.al.	2412.06089	null
2024-12-08	Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models	Xiao Xu et.al.	2412.05939	null
2024-12-08	Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models	Ma Teng et.al.	2412.05934	link
2024-12-08	[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs	Ao Wang et.al.	2412.05819	link
2024-12-06	Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Zhe Chen et.al.	2412.05271	link
2024-12-06	CompCap: Improving Multimodal Large Language Models with Composite Captions	Xiaohui Chen et.al.	2412.05243	null
2024-12-06	MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale	Jarvis Guo et.al.	2412.05237	null
2024-12-06	LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation	Donald Shenaj et.al.	2412.05148	link
2024-12-06	Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models	Zehao Wang et.al.	2412.04939	null
2024-12-06	EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation	Yongxin Wang et.al.	2412.04903	null
2024-12-06	Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis	Rui Zhou et.al.	2412.04707	null
2024-12-05	Assessing and Learning Alignment of Unimodal Vision and Language Models	Le Zhang et.al.	2412.04616	null
2024-12-05	p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay	Jun Zhang et.al.	2412.04449	link
2024-12-05	EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios	Lu Qiu et.al.	2412.04447	null
2024-12-05	GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration	Kaiyi Huang et.al.	2412.04440	null
2024-12-05	Grounding Descriptions in Images informs Zero-Shot Visual Recognition	Shaunak Halbe et.al.	2412.04429	link
2024-12-05	Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion	Jiuhai Chen et.al.	2412.04424	link
2024-12-05	Liquid: Language Models are Scalable Multi-modal Generators	Junfeng Wu et.al.	2412.04332	link
2024-12-05	FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression	Bo Tong et.al.	2412.04317	link
2024-12-04	VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding	Chaoyu Li et.al.	2412.03735	null
2024-12-04	DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation	Qingdong He et.al.	2412.03255	null
2024-12-04	Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges	Minghao Shao et.al.	2412.03220	null
2024-12-04	ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning	Zhe Xie et.al.	2412.03104	link
2024-12-03	AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?	Kaixiong Gong et.al.	2412.02611	null
2024-12-03	Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks	Jinjin Cai et.al.	2412.02531	null
2024-12-03	VR Based Emotion Recognition Using Deep Multimodal Fusion With Biosignals Across Multiple Anatomical Domains	Pubudu L. Indrasiri et.al.	2412.02283	null
2024-12-03	Personalized Multimodal Large Language Models: A Survey	Junda Wu et.al.	2412.02142	null
2024-12-03	WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image	Yuci Liang et.al.	2412.02141	null
2024-12-03	Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey	Yunkai Dang et.al.	2412.02104	null
2024-12-02	PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving	Xuewen Luo et.al.	2412.02025	null
2024-12-02	MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models	Xiaomin Li et.al.	2412.01343	null
2024-12-02	Enhancing Perception Capabilities of Multimodal LLMs with Training-free Fusion	Zhuokun Chen et.al.	2412.01289	null
2024-12-02	Ponder & Press: Advancing Visual GUI Agent towards General Computer Control	Yiqin Wang et.al.	2412.01268	null
2024-12-02	T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs	Shukang Yin et.al.	2411.19951	link
2024-11-29	VLSBench: Unveiling Visual Leakage in Multimodal Safety	Xuhao Hu et.al.	2411.19939	link
2024-11-29	On Domain-Specific Post-Training for Multimodal Large Language Models	Daixuan Cheng et.al.	2411.19930	null
2024-11-29	Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings	Qiong Wu et.al.	2411.19628	link
2024-11-28	Libra: Leveraging Temporal Images for Biomedical Radiology Analysis	Xi Zhang et.al.	2411.19378	link
2024-11-28	SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation	Yuhan Pei et.al.	2411.19182	null
2024-11-28	Detailed Object Description with Controllable Dimensions	Xinran Wang et.al.	2411.19106	link
2024-11-28	I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting	Nicola Fanelli et.al.	2411.19050	link
2024-11-28	DuetML: Human-LLM Collaborative Machine Learning Framework for Non-Expert Users	Wataru Kawabe et.al.	2411.18908	null
2024-11-27	Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment	Soumya Suvra Ghosal et.al.	2411.18688	null
2024-11-27	Cross-modal Information Flow in Multimodal Large Language Models	Zhi Zhang et.al.	2411.18620	link
2024-11-27	GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation	Pengfei Zhou et.al.	2411.18499	null
2024-11-27	ChatRex: Taming Multimodal LLM for Joint Perception and Understanding	Qing Jiang et.al.	2411.18363	link
2024-11-27	Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models	Jingming Liu et.al.	2411.18142	null
2024-11-26	NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?	Jiaxuan Li et.al.	2411.17794	null
2024-11-26	Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration	Yuhang Han et.al.	2411.17686	null
2024-11-26	What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics	Jordan J. Bird et.al.	2411.17593	null
2024-11-26	Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey	Jiayi Kuang et.al.	2411.17558	null
2024-11-26	InsightEdit: Towards Better Instruction Following for Image Editing	Yingjing Xu et.al.	2411.17323	null
2024-11-26	in-Car Biometrics (iCarB) Datasets for Driver Recognition: Face, Fingerprint, and Voice	Vedrana Krivokuca Hahn et.al.	2411.17305	null
2024-11-26	A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs	Lehan He et.al.	2411.17265	null
2024-11-26	HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator	Fan Yang et.al.	2411.17261	null
2024-11-26	Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment	Zheng Chen et.al.	2411.17237	link
2024-11-26	DOGE: Towards Versatile Visual Document Grounding and Referring	Yinan Zhou et.al.	2411.17125	null
2024-11-26	Multimodal Alignment and Fusion: A Survey	Songtao Li et.al.	2411.17040	null
2024-11-25	TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation	Linqing Zhong et.al.	2411.16425	null
2024-11-25	Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models	Hao Yi et.al.	2411.16201	null
2024-11-25	Interpreting Object-level Foundation Models via Visual Precision Search	Ruoyu Chen et.al.	2411.16198	link
2024-11-25	ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration	Haozhan Shen et.al.	2411.16044	link
2024-11-23	Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark	Rong-Cheng Tu et.al.	2411.15488	link
2024-11-23	Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy	Te Yang et.al.	2411.15453	null
2024-11-22	MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs	Chaoyou Fu et.al.	2411.15296	link
2024-11-22	VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement	Daeun Lee et.al.	2411.15115	null
2024-11-22	mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA	Tao Zhang et.al.	2411.15041	null
2024-11-22	De-biased Multimodal Electrocardiogram Analysis	Haitao Li et.al.	2411.14795	null
2024-11-22	Evaluating and Advancing Multimodal Large Language Models in Ability Lens	Feng Chen et.al.	2411.14725	null
2024-11-22	FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data	Binqian Xu et.al.	2411.14717	link
2024-11-22	Any-to-3D Generation via Hybrid Diffusion Supervision	Yijun Fan et.al.	2411.14715	null
2024-11-21	LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval	Weiheng Lu et.al.	2411.14505	link
2024-11-21	Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models	Yuhao Dong et.al.	2411.14432	link
2024-11-21	Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding	Yiming Zhang et.al.	2411.14401	null
2024-11-21	Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance	Haozhe Zhao et.al.	2411.14279	null
2024-11-21	Separable Mixture of Low-Rank Adaptation for Continual Visual Instruction Tuning	Ziqi Wang et.al.	2411.13949	null
2024-11-21	Panther: Illuminate the Sight of Multimodal LLMs with Instruction-Guided Visual Prompts	Honglin Li et.al.	2411.13909	null
2024-11-20	Decompose and Leverage Preferences from Expert Models for Improving Trustworthiness of MLLMs	Rui Cao et.al.	2411.13697	link
2024-11-20	AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations	Gaurav Verma et.al.	2411.13451	null
2024-11-20	DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving	Xianda Guo et.al.	2411.13112	link
2024-11-20	Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving	Hao Zhou et.al.	2411.13076	null
2024-11-19	Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models	Zhen Zeng et.al.	2411.12790	null
2024-11-19	Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting	Haoyu Zhao et.al.	2411.12789	null
2024-11-19	Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning	Pengkun Jiao et.al.	2411.12787	null
2024-11-19	Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model	Yiming Shi et.al.	2411.12783	null
2024-11-18	Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning	Xudong Yan et.al.	2411.12584	link
2024-11-19	CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model	Dongyoung Go et.al.	2411.12287	null
2024-11-18	AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning	Kun Xiang et.al.	2411.11930	link
2024-11-18	Dissecting Misalignment of Multimodal Large Language Models via Influence Function	Lijie Hu et.al.	2411.11667	null
2024-11-18	MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models	Harshita Sharma et.al.	2411.11362	null
2024-11-18	CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset	Zhiming Wang et.al.	2411.11360	link
2024-11-18	MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis	Yingjie Zhou et.al.	2411.11235	null
2024-11-19	Multilingual Large Language Models: A Systematic Survey	Shaolin Zhu et.al.	2411.11072	link
2024-11-19	VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?	Yunlong Tang et.al.	2411.10979	null
2024-11-17	Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering	Zeping Yu et.al.	2411.10950	link
2024-11-17	Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning	Wenke Huang et.al.	2411.10928	null
2024-11-16	BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization	Md. Nazmus Sadat Samin et.al.	2411.10879	link
2024-11-16	Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts	Jinqiang Long et.al.	2411.10669	link
2024-11-15	Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization	Weiyun Wang et.al.	2411.10442	null
2024-11-15	Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization	Yuhan Fu et.al.	2411.10436	null
2024-11-15	Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting	Ziqi Xie et.al.	2411.10309	link
2024-11-15	Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning	Jingru Yang et.al.	2411.10252	null
2024-11-15	CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation	Xiaofei Zhu et.al.	2411.10060	null
2024-11-15	VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos	Weihao Zhong et.al.	2411.10032	null
2024-11-15	Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs	Xiaofeng Zhang et.al.	2411.09968	null
2024-11-14	MagicQuill: An Intelligent Interactive Image Editing System	Zichen Liu et.al.	2411.09703	link
2024-11-14	Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models	Wei Wang et.al.	2411.09691	null
2024-11-14	Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models	Chutian Meng et.al.	2411.09449	null
2024-11-14	Spider: Any-to-Many Multimodal LLM	Jinxiang Lai et.al.	2411.09439	link
2024-11-14	LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation	Zhenshi Li et.al.	2411.09301	link
2024-11-13	Multimodal Instruction Tuning with Hybrid State Space Models	Jianing Zhou et.al.	2411.08840	null
2024-11-13	Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks?	Quan Zhang et.al.	2411.08466	null
2024-11-13	Material Property Prediction with Element Attribute Knowledge Graphs and Multimodal Representation Learning	Chao Huang et.al.	2411.08414	null
2024-11-12	SimBase: A Simple Baseline for Temporal Video Grounding	Peijun Bao et.al.	2411.07945	null
2024-11-12	Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding	Zirui Shao et.al.	2411.07722	null
2024-11-12	Zer0-Jack: A Memory-efficient Gradient-based Jailbreaking Method for Black-box Multi-modal Large Language Models	Tiejin Chen et.al.	2411.07559	null
2024-11-11	Multimodal Fusion Balancing Through Game-Theoretic Regularization	Konstantinos Kontras et.al.	2411.07335	null
2024-11-11	CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models	Junho Kim et.al.	2411.06869	null
2024-11-11	Learning from Feedback: Semantic Enhancement for Object SLAM Using Foundation Models	Jungseok Hong et.al.	2411.06752	null
2024-11-10	KMM: Key Frame Mask Mamba for Extended Motion Generation	Zeyu Zhang et.al.	2411.06481	link
2024-11-09	A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks	Chia Xin Liang et.al.	2411.06284	null
2024-11-09	An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models	Fatemeh Shiri et.al.	2411.06048	link
2024-11-08	Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation	Dong Shu et.al.	2411.05316	link
2024-11-08	Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding	Jaeyoo Park et.al.	2411.05254	null
2024-11-07	On Erroneous Agreements of CLIP Image Embeddings	Siting Li et.al.	2411.05195	null
2024-11-07	Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models	Pete Janowczyk et.al.	2411.05056	null
2024-11-07	CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM	Jingwei Xu et.al.	2411.04954	null
2024-11-07	GUI Agents with Foundation Models: A Comprehensive Survey	Shuai Wang et.al.	2411.04890	null
2024-11-07	Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs	Chengxin Hu et.al.	2411.04708	null
2024-11-06	Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education	Anand Syamkumar et.al.	2411.04308	null
2024-11-06	Analyzing Multimodal Features of Spontaneous Voice Assistant Commands for Mild Cognitive Impairment Detection	Nana Lin et.al.	2411.04158	null
2024-11-06	Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination	Dingjie Song et.al.	2411.03823	link
2024-11-06	StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding	Junming Lin et.al.	2411.03628	link
2024-11-05	MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning	Ziliang Gan et.al.	2411.03314	null
2024-11-05	Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation?	Jingyu Xiao et.al.	2411.03292	link
2024-11-06	Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent	Yangning Li et.al.	2411.02937	link
2024-11-05	Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning	Mingcheng Li et.al.	2411.02793	null
2024-11-05	Multimodal Commonsense Knowledge Distillation for Visual Question Answering	Shuo Yang et.al.	2411.02722	null
2024-11-05	Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios	Yunkai Dang et.al.	2411.02708	null
2024-11-04	MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs	Sheng-Chieh Lin et.al.	2411.02571	null
2024-11-04	DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution	Yang Yue et.al.	2411.02359	link
2024-11-04	KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension	Jie Yang et.al.	2411.01846	null
2024-11-04	ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model	Yiming Sun et.al.	2411.01756	null
2024-11-03	UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models	Sejoon Oh et.al.	2411.01703	null
2024-11-03	Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation	Seongsu Ha et.al.	2411.01494	null
2024-11-02	Can Multimodal Large Language Model Think Analogically?	Diandian Guo et.al.	2411.01307	null
2024-11-02	Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems	Mikołaj Małkiński et.al.	2411.01173	null
2024-11-01	Exploring Multi-Modality Dynamics: Insights and Challenges in Multimodal Fusion for Biomedical Tasks	Laura Wenderoth et.al.	2411.00725	null
2024-11-01	Unified Generative and Discriminative Training for Multi-modal Large Language Models	Wei Chow et.al.	2411.00304	null
2024-10-31	JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment	Joao Sousa et.al.	2410.23988	null
2024-10-31	Leveraging LLMs for MT in Crisis Scenarios: a blueprint for low-resource languages	Séamus Lankford et.al.	2410.23890	null
2024-10-31	Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding	Jinlong He et.al.	2410.23822	null
2024-10-30	PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures	Tianxiang Wu et.al.	2410.23089	null
2024-10-29	Unsupervised Multimodal Fusion of In-process Sensor Data for Advanced Manufacturing Process Monitoring	Matthew McKinney et.al.	2410.22558	null
2024-10-29	Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench	Zheyuan Liu et.al.	2410.22108	link
2024-10-28	LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior	Hanyu Wang et.al.	2410.21264	null
2024-10-28	Face-MLLM: A Large Face Perception Model	Haomiao Sun et.al.	2410.20717	null
2024-10-27	Deep Learning-Driven Microstructure Characterization and Vickers Hardness Prediction of Mg-Gd Alloys	Lu Wang et.al.	2410.20402	null
2024-10-26	LLMs Can Evolve Continually on Modality for X-Modal Reasoning	Jiazuo Yu et.al.	2410.20178	link
2024-10-25	Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements	Silvia Terragni et.al.	2410.19974	null
2024-10-25	Improving Multimodal Large Language Models Using Continual Learning	Shikhar Srivastava et.al.	2410.19925	null
2024-10-25	TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Xiangyu Zeng et.al.	2410.19702	null
2024-10-28	BIFRÖST: 3D-Aware Image compositing with Language Instructions	Lingxiao Li et.al.	2410.19079	link
2024-10-24	Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Zhangheng Li et.al.	2410.18967	null
2024-10-24	SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models	Zonghao Ying et.al.	2410.18927	null
2024-10-24	Distill Visual Chart Reasoning Ability from LLMs to MLLMs	Wei He et.al.	2410.18798	link
2024-10-24	DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation	Yuang Ai et.al.	2410.18666	link
2024-10-25	Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks	Lehan Wang et.al.	2410.18387	null
2024-10-23	TP-Eval: Tap Multimodal LLMs’ Potential in Evaluation by Customizing Prompts	Yuxuan Xie et.al.	2410.18071	null
2024-10-23	CLEAR: Character Unlearning in Textual and Visual Modalities	Alexey Dontsov et.al.	2410.18057	null
2024-10-23	Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation	Wenfang Yao et.al.	2410.17918	link
2024-10-23	ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning	Zhiwei Hao et.al.	2410.17779	link
2024-10-23	YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions	Xiguang Li et.al.	2410.17734	null
2024-10-23	Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact	Junhua Liu et.al.	2410.17532	null
2024-10-22	LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding	Xiaoqian Shen et.al.	2410.17434	link
2024-10-22	Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models	Zhijie Tan et.al.	2410.16983	null
2024-10-22	IPL: Leveraging Multimodal Large Language Models for Intelligent Product Listing	Kang Chen et.al.	2410.16977	null
2024-10-22	Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance	Zhangwei Gao et.al.	2410.16261	link
2024-10-21	LLaVA-KD: A Framework of Distilling Multimodal Large Language Models	Yuxuan Cai et.al.	2410.16236	link
2024-10-21	Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining	Han Huang et.al.	2410.16166	link
2024-10-21	Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages	Xiang Yue et.al.	2410.16153	null
2024-10-21	Mitigating Object Hallucination via Concentric Causal Attention	Yun Xing et.al.	2410.15926	link
2024-10-21	AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection	Xiaoman Xu et.al.	2410.15591	link
2024-10-20	Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation	Jiayu Xiong et.al.	2410.15475	null
2024-10-20	Modality-Fair Preference Optimization for Trustworthy MLLM Alignment	Songtao Jiang et.al.	2410.15334	null
2024-10-19	SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation	Jingxuan Chen et.al.	2410.15164	link
2024-10-19	LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound	Xuechen Guo et.al.	2410.15074	null
2024-10-18	MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps	Xiongtao Zhou et.al.	2410.14668	link
2024-10-18	MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems	Zifeng Zhu et.al.	2410.14179	link
2024-10-18	RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training	Muhe Ding et.al.	2410.14154	null
2024-10-17	PUMA: Empowering Unified MLLM with Multi-granular Visual Generation	Rongyao Fang et.al.	2410.13861	link
2024-10-17	$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models	Yaxin Luo et.al.	2410.13859	null
2024-10-17	Can MLLMs Understand the Deep Implication Behind Chinese Images?	Chenhao Zhang et.al.	2410.13854	link
2024-10-18	Harnessing Webpage UIs for Text-Rich Visual Understanding	Junpeng Liu et.al.	2410.13824	null
2024-10-17	MobA: A Two-Level Agent System for Efficient Mobile Task Automation	Zichen Zhu et.al.	2410.13757	link
2024-10-17	Exploring the Design Space of Visual Context Representation in Video MLLMs	Yifan Du et.al.	2410.13694	link
2024-10-17	Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant	Haoran Hao et.al.	2410.13360	link
2024-10-16	MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs	Yunqiu Xu et.al.	2410.12332	null
2024-10-16	Understanding the Role of LLMs in Multimodal Evaluation Benchmarks	Botian Jiang et.al.	2410.12329	link
2024-10-16	Multimodal Fusion with Relational Learning for Molecular Property Prediction	Zhengyang Zhou et.al.	2410.12128	null
2024-10-15	MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding	Yue Cao et.al.	2410.11829	link
2024-10-15	MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation	Chenxi Wang et.al.	2410.11779	link
2024-10-15	SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding	Ying Chen et.al.	2410.11761	null
2024-10-15	Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions	Yuhan Fu et.al.	2410.11701	null
2024-10-15	VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI	Sijie Cheng et.al.	2410.11623	null
2024-10-15	MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark	Bin Shan et.al.	2410.11538	link
2024-10-15	Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs	Sihang Zhao et.al.	2410.11437	link
2024-10-15	Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models	Zhongye Liu et.al.	2410.11242	link
2024-10-15	MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation	Xianping Ma et.al.	2410.11160	link
2024-10-14	Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes	Tim Broedermann et.al.	2410.10791	link
2024-10-14	MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages	Shubhi Bansal et.al.	2410.10407	link
2024-10-14	Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation	Shun Qian et.al.	2410.10319	null
2024-10-14	ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization	Jiawei Li et.al.	2410.10238	null
2024-10-14	Tracing Human Stress from Physiological Signals using UWB Radar	Jia Xu et.al.	2410.10155	null
2024-10-15	LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models	Han Qiu et.al.	2410.09962	link
2024-10-13	Improving Colorectal Cancer Screening and Risk Assessment through Predictive Modeling on Medical Images and Records	Shuai Jiang et.al.	2410.09880	null
2024-10-13	Text4Seg: Reimagining Image Segmentation as Text Generation	Mengcheng Lan et.al.	2410.09855	link
2024-10-12	Skipping Computations in Multimodal LLMs	Mustafa Shukor et.al.	2410.09454	link
2024-10-12	MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection	Xi Jiang et.al.	2410.09453	link
2024-10-11	Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion	Shiao Wang et.al.	2410.08879	null
2024-10-11	Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking	Wei Zhang et.al.	2410.08616	null
2024-10-11	Baichuan-Omni Technical Report	Yadong Li et.al.	2410.08565	link
2024-10-11	SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models	Haotian Xia et.al.	2410.08474	link
2024-10-10	Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training	Gen Luo et.al.	2410.08202	null
2024-10-10	Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models	Qingni Wang et.al.	2410.08174	null
2024-10-10	Agent S: An Open Agentic Framework that Uses Computers Like a Human	Saaket Agashe et.al.	2410.08164	link
2024-10-10	Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs	Xiaoyuan Liu et.al.	2410.08145	link
2024-10-09	Retrieval Replace Reduction: An effective visual token reduction method via semantic match	Yingen Liu et.al.	2410.07278	null
2024-10-09	Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis	Bohan Zeng et.al.	2410.07155	link
2024-10-09	Personalized Visual Instruction Tuning	Renjie Pi et.al.	2410.07113	link
2024-10-10	Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology	Xiangyu Wang et.al.	2410.07087	null
2024-10-09	HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding	Keliang Li et.al.	2410.06777	null
2024-10-09	To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models	Junyan Lin et.al.	2410.06765	link
2024-10-09	ING-VP: MLLMs cannot Play Easy Vision-based Games Yet	Haoran Zhang et.al.	2410.06555	link
2024-10-09	Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection	Aravinda Reddy PN et.al.	2410.06543	null
2024-10-08	Multimodal Situational Safety	Kaiwen Zhou et.al.	2410.06172	null
2024-10-08	Quadratic Is Not What You Need For Multimodal Large Language Models	Phu Pham et.al.	2410.06169	link
2024-10-08	$\textit{X}^2$-DFD: A framework for e${X}$plainable and e${X}$tendable Deepfake Detection	Yize Chen et.al.	2410.06126	null
2024-10-07	Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents	Boyu Gou et.al.	2410.05243	link
2024-10-07	Organizing Unstructured Image Collections using Natural Language	Mingxuan Liu et.al.	2410.05217	null
2024-10-07	Multimodal Fusion Strategies for Mapping Biophysical Landscape Features	Lucia Gordon et.al.	2410.04833	link
2024-10-07	MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models	Kaichen Huang et.al.	2410.04819	link
2024-10-07	Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality	Guanyu Zhou et.al.	2410.04780	link
2024-10-07	MM-R$^3$: On (In-)Consistency of Multi-modal Large Language Models (MLLMs)	Shih-Han Chou et.al.	2410.04778	null
2024-10-07	Diffusion Models in 3D Vision: A Survey	Zhen Wang et.al.	2410.04738	null
2024-10-07	ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models	Ziyue Wang et.al.	2410.04659	link
2024-10-08	FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering	Siqiao Xue et.al.	2410.04526	link
2024-10-06	MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration	Lai Wei et.al.	2410.04521	link
2024-10-04	Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models	Xin Zou et.al.	2410.03577	link
2024-10-04	Gradient-based Jailbreak Images for Multimodal Fusion Models	Javier Rando et.al.	2410.03489	link
2024-10-04	MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents	Junpeng Yue et.al.	2410.03450	null
2024-10-04	SELU: Self-Learning Embodied MLLMs in Unknown Environments	Boyu Li et.al.	2410.03303	null
2024-10-03	Contrastive Localized Language-Image Pre-Training	Hong-You Chen et.al.	2410.02746	null
2024-10-03	LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model	Duy M. H. Nguyen et.al.	2410.02615	null
2024-10-03	Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment	Kai Liu et.al.	2410.02505	link
2024-10-04	SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack	Zihao Pan et.al.	2410.02240	link
2024-10-04	From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities	Wanpeng Zhang et.al.	2410.02155	null
2024-10-02	Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations	Minoh Jeong et.al.	2410.02086	null
2024-10-02	EMMA: Efficient Visual Alignment in Multi-Modal LLMs	Sara Ghazanfari et.al.	2410.02080	link
2024-10-03	Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks	Mengzhao Jia et.al.	2410.01744	link
2024-10-02	Visual Perception in Text Strings	Qi Jia et.al.	2410.01733	link
2024-10-02	The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Hong Li et.al.	2410.01417	null
2024-10-02	SHAP-CAT: A interpretable multi-modal framework enhancing WSI classification via virtual staining and shapley-value-based multimodal fusion	Jun Wang et.al.	2410.01408	null
2024-10-01	FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks	Peiran Wu et.al.	2410.01089	null
2024-10-01	Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data	Ivica Dimitrovski et.al.	2410.00469	null
2024-10-01	Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations	Miyu Goko et.al.	2410.00436	null
2024-10-01	MERIT: Multimodal Wearable Vital Sign Waveform Monitoring	Yongyang Tang et.al.	2410.00392	null
2024-09-30	Multimodal Alignment of Histopathological Images Using Cell Segmentation and Point Set Matching for Integrative Cancer Analysis	Jun Jiang et.al.	2410.00152	null
2024-09-30	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Haotian Zhang et.al.	2409.20566	null
2024-09-30	UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models	Qiaojun Yu et.al.	2409.20551	null
2024-09-30	Melody Is All You Need For Music Generation	Shaopeng Wei et.al.	2409.20196	link
2024-09-30	VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection	Huilin Deng et.al.	2409.20146	null
2024-09-30	Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval	Yabing Wang et.al.	2409.19961	link
2024-09-30	WildFusion: Multimodal Implicit 3D Reconstructions in the Wild	Yanbaihui Liu et.al.	2409.19904	null
2024-10-01	Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration	Kaihang Pan et.al.	2409.19872	link
2024-09-29	Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs	Fengzhu Zeng et.al.	2409.19656	null
2024-09-28	A Parameter-Efficient Tuning Framework for Language-guided Object Grounding and Robot Grasping	Houjian Yu et.al.	2409.19457	null
2024-09-28	Visual Question Decomposition on Multimodal Large Language Models	Haowei Zhang et.al.	2409.19339	null
2024-09-27	Enhancing Explainability in Multimodal Large Language Models Using Ontological Context	Jihen Amara et.al.	2409.18753	null
2024-09-27	3DPX: Single Panoramic X-ray Analysis Guided by 3D Oral Structure Reconstruction	Xiaoshuang Li et.al.	2409.18701	null
2024-09-27	Image-guided topic modeling for interpretable privacy classification	Alina Elena Baia et.al.	2409.18674	link
2024-09-27	When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation	Yuli Zhou et.al.	2409.18653	link
2024-09-27	Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation	Hongzhe Huang et.al.	2409.18541	link
2024-09-27	FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation	Yuki Imajuku et.al.	2409.18459	null
2024-09-26	Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing	Huthaifa I. Ashqar et.al.	2409.18286	null
2024-09-26	EAGLE: Egocentric AGgregated Language-video Engine	Jing Bi et.al.	2409.17523	null
2024-09-26	Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE	Xun Zhu et.al.	2409.17508	link
2024-09-25	Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents	Junting Lu et.al.	2409.17140	null
2024-09-25	Pruning Multilingual Large Language Models for Multilingual Inference	Hwichan Kim et.al.	2409.16911	link
2024-09-25	MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features	Katharina Anderer et.al.	2409.16765	link
2024-09-26	EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models	Jiacheng Zhang et.al.	2409.16723	null
2024-09-25	EventHallusion: Diagnosing Event Hallucinations in Video LLMs	Jiacheng Zhang et.al.	2409.16597	link
2024-09-24	DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection	Jiaxin Ye et.al.	2409.15936	link
2024-09-25	M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning	Taowen Wang et.al.	2409.15657	link
2024-09-23	MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models	Mohammad Shahab Sepehri et.al.	2409.15477	link
2024-09-24	OmniBench: Towards The Future of Universal Omni-Language Models	Yizhi Li et.al.	2409.15272	link
2024-09-23	Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation	Manu Gaur et.al.	2409.15125	null
2024-09-23	Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond	Hong Chen et.al.	2409.14993	null
2024-09-23	FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension	Junzhuo Liu et.al.	2409.14750	link
2024-09-24	Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding	Yan Shu et.al.	2409.14485	link
2024-09-21	Enhancing Advanced Visual Reasoning Ability of Large Language Models	Zhiyuan Li et.al.	2409.13980	null
2024-09-20	MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension	Ting Liu et.al.	2409.13609	link
2024-09-18	Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference	Najmeh Forouzandehmehr et.al.	2409.12150	null
2024-09-18	Fusion in Context: A Multimodal Approach to Affective State Recognition	Youssef Mohamed et.al.	2409.11906	null
2024-09-18	Bridging Design and Development with Automated Declarative UI Code Generation	Ting Zhou et.al.	2409.11667	null
2024-09-17	Towards Time Series Reasoning with LLMs	Winnie Chow et.al.	2409.11376	null
2024-09-17	CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration	Jiahui Gao et.al.	2409.11365	null
2024-09-17	Multimodal Attention-Enhanced Feature Fusion-based Weekly Supervised Anomaly Violence Detection	Yuta Kaneko et.al.	2409.11223	null
2024-09-16	Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving	Yunsheng Ma et.al.	2409.11182	null
2024-09-17	Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs	Dingjie Song et.al.	2409.10994	link
2024-09-17	Multi-Floor Zero-Shot Object Navigation Policy	Lingfeng Zhang et.al.	2409.10906	null
2024-09-16	XLM for Autonomous Driving Systems: A Comprehensive Review	Sonda Fourati et.al.	2409.10484	null
2024-09-16	Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models	Weihao Ye et.al.	2409.10197	link
2024-09-15	Explore the Hallucination on Low-level Perception for MLLMs	Yinan Sun et.al.	2409.09748	null
2024-09-15	AutoJournaling: A Context-Aware Journaling System Leveraging MLLMs on Smartphone Screenshots	Tianyi Zhang et.al.	2409.09696	null
2024-09-14	Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM	Yuanjie Lyu et.al.	2409.09362	null
2024-09-14	ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models	Yahan Tu et.al.	2409.09318	null
2024-09-13	Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation	Cheng Charles Ma et.al.	2409.09135	null
2024-09-11	Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU	Zhenyu Ning et.al.	2409.09086	null
2024-09-13	VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation	Hanning Chen et.al.	2409.08464	link
2024-09-11	Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering	Weixi Weng et.al.	2409.07331	null
2024-09-11	Multimodal Emotion Recognition with Vision-language Prompting and Modality Dropout	Anbin QI et.al.	2409.07078	null
2024-09-10	LIME-M: Less Is More for Evaluation of MLLMs	Kang Zhu et.al.	2409.06851	link
2024-09-10	VoiceWukong: Benchmarking Deepfake Voice Detection	Ziwei Yan et.al.	2409.06348	null
2024-09-10	MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding	Surbhi Madan et.al.	2409.06224	null
2024-09-09	MLLM-FL: Multimodal Large Language Model Assisted Federated Learning on Heterogeneous and Long-tailed Data	Jianyi Zhang et.al.	2409.06067	null
2024-09-09	Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models	Hongyang Lei et.al.	2409.05929	link
2024-09-09	Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments	Haritheja Etukuru et.al.	2409.05865	link
2024-09-15	MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct	Run Luo et.al.	2409.05840	null
2024-09-11	A Survey of Multimodal Composite Editing and Retrieval	Suyan Li et.al.	2409.05405	link
2024-09-07	Training-free ZS-CIR via Weighted Modality Fusion and Similarity	Ren-Di Wu et.al.	2409.04918	link
2024-09-06	Influence of Early through Late Fusion on Pancreas Segmentation from Imperfectly Registered Multimodal MRI	Lucas W. Remedios et.al.	2409.04563	link
2024-09-10	Question-Answering Dense Video Events	Hangyu Qin et.al.	2409.04388	link
2024-09-09	Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver	Zeren Zhang et.al.	2409.04214	link
2024-09-06	UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity	Yicheng Fu et.al.	2409.04081	null
2024-09-09	mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding	Anwen Hu et.al.	2409.03420	link
2024-09-05	ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding	Zhengzhuo Xu et.al.	2409.03277	null
2024-09-05	OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving	Julong Wei et.al.	2409.03272	null
2024-09-05	TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations	Mingze Gao et.al.	2409.03206	null
2024-09-04	No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning	Manu Gaur et.al.	2409.03025	null
2024-09-06	HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts	Xinyu Liu et.al.	2409.02919	link
2024-09-04	LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture	Xidong Wang et.al.	2409.02889	link
2024-09-04	A Medical Multimodal Large Language Model for Pediatric Pneumonia	Weiwei Tian et.al.	2409.02608	null
2024-09-02	Understanding Multimodal Hallucination with Parameter-Free Representation Alignment	Yueqian Wang et.al.	2409.01151	link
2024-09-01	Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model	Fuqiang Niu et.al.	2409.00597	null
2024-08-31	StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models	Yuxiang Guo et.al.	2409.00304	null
2024-08-30	EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs	Zhen Fan et.al.	2408.17168	null
2024-08-30	AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding	Yonghui Wang et.al.	2408.16986	link
2024-08-29	Law of Vision Representation in MLLMs	Shijia Yang et.al.	2408.16357	link
2024-08-28	Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders	Min Shi et.al.	2408.15998	link
2024-08-28	LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation	Fangxun Shu et.al.	2408.15881	link
2024-08-28	A Survey on Evaluation of Multimodal Large Language Models	Jiaxing Huang et.al.	2408.15769	null
2024-08-28	MambaPlace: Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms	Tianyi Shang et.al.	2408.15740	link
2024-08-28	TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning	Jinglun Li et.al.	2408.15566	link
2024-08-28	Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models	Wenbin Wang et.al.	2408.15556	link
2024-08-27	Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation	Jian Hu et.al.	2408.15205	link
2024-08-27	GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer Based Fusion Network for Multimodal Sentiment Analysis	Yijie Jin et.al.	2408.14809	link
2024-08-26	Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos	Qirui Chen et.al.	2408.14469	null
2024-08-26	Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos	Jiajun Fei et.al.	2408.14023	link
2024-08-26	FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation	Daixun Li et.al.	2408.13980	null
2024-08-25	ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models	Yeji Park et.al.	2408.13906	link
2024-08-23	MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?	Yi-Fan Zhang et.al.	2408.13257	null
2024-08-23	ParGo: Bridging Vision-Language with Partial and Global Views	An-Lan Wang et.al.	2408.12928	link
2024-08-23	IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities	Bin Wang et.al.	2408.12902	link
2024-08-23	Semantic Alignment for Multimodal Large Language Models	Tao Wu et.al.	2408.12867	null
2024-08-22	Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models	Jean Park et.al.	2408.12763	null
2024-08-23	Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese	Khang T. Doan et.al.	2408.12480	null
2024-08-26	MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model	Chaoya Jiang et.al.	2408.12321	null
2024-08-21	CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion	Yunlong Tang et.al.	2408.12009	null
2024-08-21	SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs	Yuanyang Yin et.al.	2408.11813	null
2024-08-21	EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model	Feipeng Ma et.al.	2408.11795	null
2024-08-21	EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning	Bohao Xing et.al.	2408.11424	link
2024-08-21	EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning	Zhihao Li et.al.	2408.11397	null
2024-08-22	Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model	Mengying Ge et.al.	2408.11286	null
2024-08-20	FLAME: Learning to Navigate with Multimodal LLM in Urban Environments	Yunzhe Xu et.al.	2408.11051	link
2024-08-19	CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving	Hidehisa Arai et.al.	2408.10845	null
2024-08-20	PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection	Tri Cao et.al.	2408.10738	null
2024-08-21	SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition	Zebang Cheng et.al.	2408.10500	link
2024-08-19	FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant	Zhengchao Huang et.al.	2408.10072	link
2024-08-19	Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting	Yun-Da Tsai et.al.	2408.09798	null
2024-08-20	Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation	Yuyang Ye et.al.	2408.09698	link
2024-08-18	Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models	Kening Zheng et.al.	2408.09429	link
2024-08-17	BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger	Yulin Chen et.al.	2408.09093	null
2024-08-16	ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis	Yubao Zhao et.al.	2408.08849	link
2024-08-16	Rethinking Generative Semantic Communication for Multi-User Systems with Multi-Modal LLM	Wanting Yang et.al.	2408.08765	null
2024-08-16	Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm	Hongcheng Liu et.al.	2408.08693	link
2024-08-16	Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning	Wenwen Zhuang et.al.	2408.08640	link
2024-08-16	A Survey on Benchmarks of Multimodal Large Language Models	Jian Li et.al.	2408.08632	link
2024-08-16	CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving	Shihan Peng et.al.	2408.08500	null
2024-08-15	When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding	Pingping Zhang et.al.	2408.08093	null
2024-08-14	End-to-end Semantic-centric Video-based Multimodal Affective Computing	Ronghao Lin et.al.	2408.07694	null
2024-08-15	Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities	Enneng Yang et.al.	2408.07666	link
2024-08-15	MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark	Minxuan Zhou et.al.	2408.07543	link
2024-08-14	LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image	Fan Yang et.al.	2408.07422	null
2024-08-14	Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion	Peiyuan Chen et.al.	2408.07303	null
2024-08-13	CROME: Cross-Modal Adapters for Efficient Multimodal LLM	Sayna Ebrahimi et.al.	2408.06610	null
2024-08-13	Social Debiasing for Fair Multi-modal LLMs	Harry Cheng et.al.	2408.06569	null
2024-08-12	Deep Multimodal Collaborative Learning for Polyp Re-Identification	Suncheng Xiang et.al.	2408.05914	link
2024-08-11	Advancing Re-Ranking with Multimodal Fusion and Target-Oriented Auxiliary Tasks in E-Commerce Search	Enqiang Xu et.al.	2408.05751	null
2024-08-11	A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot	Haoxuan Ding et.al.	2408.05729	link
2024-08-13	SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning	Yuze Zhao et.al.	2408.05517	link
2024-08-10	How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model	Yuxin Zhu et.al.	2408.05411	link
2024-08-09	Revisiting Multi-Modal LLM Evaluation	Jian Lu et.al.	2408.05334	null
2024-08-09	Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing	Jiarui Xie et.al.	2408.05307	null
2024-08-09	VITA: Towards Open-Source Interactive Omni Multimodal LLM	Chaoyou Fu et.al.	2408.05211	link
2024-08-09	Instruction Tuning-free Visual Token Complement for Multimodal LLMs	Dongsheng Wang et.al.	2408.05019	null
2024-08-13	mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models	Jiabo Ye et.al.	2408.04840	link
2024-08-09	Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models	Qirui Jiao et.al.	2408.04594	link
2024-08-08	MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models	Haoxuan Li et.al.	2408.04388	link
2024-08-08	MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning	Rex Liu et.al.	2408.04243	null
2024-08-08	M2EF-NNs: Multimodal Multi-instance Evidence Fusion Neural Networks for Cancer Survival Prediction	Hui Luo et.al.	2408.04170	null
2024-08-07	Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks	Zaijing Li et.al.	2408.03615	link
2024-08-07	Unlocking the Non-Native Language Context Limitation: Native Language Prompting Facilitates Knowledge Elicitation	Baixuan Li et.al.	2408.03544	link
2024-08-07	Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation	Weiqi Feng et.al.	2408.03505	null
2024-08-06	Targeted Visual Prompting for Medical Visual Question Answering	Sergio Tascon-Morales et.al.	2408.03043	link
2024-08-05	Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions	Xinbei Ma et.al.	2408.02544	link
2024-08-05	UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model	Zhaowei Li et.al.	2408.02503	link
2024-08-06	Infusing Environmental Captions for Long-Form Video Language Grounding	Hyogun Lee et.al.	2408.02336	null
2024-08-05	REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models	Agneet Chatterjee et.al.	2408.02231	null
2024-08-04	Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping	Mingxin Huang et.al.	2408.02034	link
2024-08-03	MiniCPM-V: A GPT-4V Level MLLM on Your Phone	Yuan Yao et.al.	2408.01800	link
2024-08-03	MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition	Ruoyu Wang et.al.	2408.01766	null
2024-08-02	Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs	Yilun Hua et.al.	2408.01417	null
2024-08-05	Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs	Peng Ding et.al.	2408.01355	link
2024-08-02	A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks	Jiaqi Wang et.al.	2408.01319	null
2024-08-02	Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models	Kohou Wang et.al.	2408.01003	null
2024-08-02	Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation	Zijian Yi et.al.	2408.00970	link
2024-08-01	Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model	Benlin Liu et.al.	2408.00754	null
2024-08-01	Are Bigger Encoders Always Better in Vision Large Models?	Bozhou Li et.al.	2408.00620	null
2024-08-01	Multimodal Fusion and Coherence Modeling for Video Topic Segmentation	Hai Yu et.al.	2408.00365	null
2024-08-01	Towards Flexible Evaluation for Generative Visual Question Answering	Huishan Ji et.al.	2408.00300	link
2024-08-01	Multi-Modal Parameter-Efficient Fine-tuning via Graph Neural Network	Bin Cheng et.al.	2408.00290	null
2024-07-31	ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models	Mingrui Wu et.al.	2407.21534	link
2024-07-31	MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training	Zhanpeng Chen et.al.	2407.21439	link
2024-07-31	Design and Development of Laughter Recognition System Based on Multimodal Fusion and Deep Learning	Fuzheng Zhao et.al.	2407.21391	null
2024-07-31	Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM	Can Wang et.al.	2407.21333	null
2024-07-30	Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate	Zheng Lin et.al.	2407.20505	link
2024-07-29	CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models	Junda Wu et.al.	2407.20454	null
2024-07-29	Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning	Xingchen Zeng et.al.	2407.20174	link
2024-07-29	Diffusion Feedback Helps CLIP See Better	Wenxuan Wang et.al.	2407.20171	link
2024-07-29	ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2	Wenjun Huang et.al.	2407.19832	null
2024-07-29	Multimodal Large Language Models for Bioimage Analysis	Shanghang Zhang et.al.	2407.19778	null
2024-07-29	Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images	Jiaxin Zhanga et.al.	2407.19719	null
2024-07-29	Harnessing Large Vision and Language Models in Agriculture: A Review	Hongyan Zhu et.al.	2407.19679	null
2024-07-29	ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck	Chia-Hao Kao et.al.	2407.19651	null
2024-07-28	ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding	Zhen Chen et.al.	2407.19435	link
2024-07-28	LLAVADI: What Matters For Multimodal Large Language Models Distillation	Shilin Xu et.al.	2407.19409	null
2024-07-27	Data Processing Techniques for Modern Multimodal Models	Yinheng Li et.al.	2407.19180	null
2024-07-26	Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment	Yuze Zheng et.al.	2407.18854	null
2024-07-26	Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models	Xiang Shi et.al.	2407.18626	link
2024-07-25	Automated Ensemble Multimodal Machine Learning for Healthcare	Fergus Imrie et.al.	2407.18227	null
2024-07-26	Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic	Fakhraddin Alwajih et.al.	2407.18129	null
2024-07-25	ERIT Lightweight Multimodal Dataset for Elderly Emotion Recognition and Multimodal Fusion Evaluation	Rita Frieske et.al.	2407.17772	null
2024-07-24	DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation	Qian Feng et.al.	2407.17348	null
2024-07-23	CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs	Jihyung Kil et.al.	2407.16837	link
2024-07-23	Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation	Tao Meng et.al.	2407.16714	null
2024-07-23	PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects	Junyi Li et.al.	2407.16696	link
2024-07-24	MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues	Liyun Zhang et.al.	2407.16552	null
2024-07-23	Harmonizing Visual Text Comprehension and Generation	Zhen Zhao et.al.	2407.16364	link
2024-07-23	INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model	Yiwei Ma et.al.	2407.16198	link
2024-07-23	UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models	Liu Qi et.al.	2407.16160	link
2024-07-22	Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight	Ziyuan Huang et.al.	2407.15819	null
2024-07-22	GFE-Mamba: Mamba-based AD Multi-modal Progression Assessment via Generative Feature Extraction from MCI	Zhaojie Fang et.al.	2407.15719	link
2024-07-22	Addressing Out-of-Distribution Challenges in Image Semantic Communication Systems with Multi-modal Large Language Models	Feifan Zhang et.al.	2407.15335	null
2024-07-21	MIBench: Evaluating Multimodal Large Language Models over Multiple Images	Haowei Liu et.al.	2407.15272	null
2024-07-23	BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM	Hanjun Luo et.al.	2407.15240	link
2024-07-23	DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer	Jinfeng Wei et.al.	2407.15130	null
2024-07-21	Navigation Instruction Generation with BEV Perception and Large Language Models	Sheng Fan et.al.	2407.15087	link
2024-07-19	On Pre-training of Multimodal Language Models Customized for Chart Understanding	Wan-Cyuan Fan et.al.	2407.14506	null
2024-07-19	T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation	Kaiyue Sun et.al.	2407.14505	link
2024-07-19	Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding	Renshan Zhang et.al.	2407.14439	link
2024-07-19	Not All Attention is Needed: Parameter and Computation Efficient Tuning for Multi-modal Large Language Models via Effective Attention Skipping	Qiong Wu et.al.	2407.14093	null
2024-07-18	X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs	Sirnam Swetha et.al.	2407.13851	null
2024-07-20	EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension	Wei Zhang et.al.	2407.13596	link
2024-07-18	OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird’s-eye-view Vehicle Semantic Segmentation	Jian Sun et.al.	2407.13137	null
2024-07-17	MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models	Leyang Shen et.al.	2407.12709	link
2024-07-17	E5-V: Universal Embeddings with Multimodal Large Language Models	Ting Jiang et.al.	2407.12580	link
2024-07-17	Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning	Mustafa Dogan et.al.	2407.12498	null
2024-07-17	ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data	Yufan Shen et.al.	2407.12358	link
2024-07-16	UrbanWorld: An Urban World Model for 3D City Generation	Yu Shang et.al.	2407.11965	link
2024-07-17	Harnessing Large Language Models for Multimodal Product Bundling	Xiaohao Liu et.al.	2407.11712	link
2024-07-15	By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting	Hyungjun Yoon et.al.	2407.10385	link
2024-07-13	Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding	Ruihuang Li et.al.	2407.09781	null
2024-07-12	SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers	Shraman Pramanick et.al.	2407.09413	link
2024-07-17	Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study	Yulong Yang et.al.	2407.09295	null

Prompt

Publish Date	Title	Authors	PDF	Code
2025-06-26	SAM4D: Segment Anything in Camera and LiDAR Streams	Jianyun Xu et.al.	2506.21547	null
2025-06-26	Assessing an evolutionary search engine for small language models, prompts, and evaluation metrics	Cláudio Lúcio do Val Lopes et.al.	2506.21512	null
2025-06-26	Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration	Jiahe Chen et.al.	2506.21509	null
2025-06-26	Aligning Spoken Dialogue Models from User Interactions	Anne Wu et.al.	2506.21463	null
2025-06-26	ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing	Huadai Liu et.al.	2506.21448	null
2025-06-26	Controllable 3D Placement of Objects with Scene-Aware Diffusion Models	Mohamed Omran et.al.	2506.21446	null
2025-06-26	Text2Cypher Across Languages: Evaluating Foundational Models Beyond English	Makbule Gulcin Ozsoy et.al.	2506.21445	null
2025-06-26	Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation	Sweta Banerjee et.al.	2506.21444	null
2025-06-26	Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection	Ali Şenol et.al.	2506.21443	null
2025-06-26	Evolution and determinants of firm-level systemic risk in local production networks	Anna Mancini et.al.	2506.21426	null
2025-06-25	MMSearch-R1: Incentivizing LMMs to Search	Jinming Wu et.al.	2506.20670	null
2025-06-25	EditP23: 3D Editing via Propagation of Image Prompts to Multi-View	Roi Bar-On et.al.	2506.20652	null
2025-06-25	Memento: Note-Taking for Your Future Self	Chao Wan et.al.	2506.20642	null
2025-06-25	PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models	Soufiane Hayou et.al.	2506.20629	null
2025-06-25	Video Perception Models for 3D Scene Synthesis	Rui Huang et.al.	2506.20601	null
2025-06-25	Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges	Alexander D. Kalian et.al.	2506.20598	null
2025-06-25	When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs	Ammar Khairi et.al.	2506.20544	null
2025-06-25	RapidGBM: An Efficient Tool for Fermi-GBM Visibility Checking and Data Analysis with a Case Study of EP240617a	Yun Wang et.al.	2506.20532	null
2025-06-25	Case-based Reasoning Augmented Large Language Model Framework for Decision Making in Realistic Safety-Critical Driving Scenarios	Wenbin Gan et.al.	2506.20531	null
2025-06-25	Probing AI Safety with Source Code	Ujwal Narayan et.al.	2506.20471	null
2025-06-24	Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation	Xingyang Li et.al.	2506.19852	null
2025-06-24	AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models	Zehuan Huang et.al.	2506.19851	null
2025-06-24	Evaluating Compliance with Visualization Guidelines in Diagrams for Scientific Publications Using Large Vision Language Models	Johannes Rückert et.al.	2506.19825	null
2025-06-24	Persona Features Control Emergent Misalignment	Miles Wang et.al.	2506.19823	null
2025-06-24	CoCo4D: Comprehensive and Complex 4D Scene Generation	Junwei Zhou et.al.	2506.19798	null
2025-06-24	Noncontextual Pauli Hamiltonians	Alexis Ralli et.al.	2506.19778	null
2025-06-24	Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study	Nandana Mihindukulasooriya et.al.	2506.19773	null
2025-06-24	Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis	Omar A. Essameldin et.al.	2506.19753	null
2025-06-24	Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales	Seyedmorteza Sadat et.al.	2506.19713	null
2025-06-24	UltraAD: Fine-Grained Ultrasound Anomaly Classification via Few-Shot CLIP Adaptation	Yue Zhou et.al.	2506.19694	null
2025-06-23	State updates and useful qubits in relativistic quantum information	José Polo-Gómez et.al.	2506.18906	null
2025-06-24	jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval	Michael Günther et.al.	2506.18902	null
2025-06-23	Steering Conceptual Bias via Transformer Latent-Subspace Activation	Vansh Sharma et.al.	2506.18887	null
2025-06-23	Amplifying Machine Learning Attacks Through Strategic Compositions	Yugeng Liu et.al.	2506.18870	null
2025-06-23	OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation	Qijun Gan et.al.	2506.18866	null
2025-06-23	TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting	Zhongbin Guo et.al.	2506.18862	null
2025-06-23	Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset	Zhuowei Chen et.al.	2506.18851	null
2025-06-23	Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories	Islem Bouzenia et.al.	2506.18824	null
2025-06-24	ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation	Siao Tang et.al.	2506.18810	link
2025-06-24	PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications	Pietro Bonazzi et.al.	2506.18807	null
2025-06-23	Emergent Temporal Correspondences from Video Diffusion Transformers	Jisu Nam et.al.	2506.17220	link
2025-06-20	Co-Seg++: Mutual Prompt-Guided Collaborative Learning for Versatile Medical Segmentation	Qing Xu et.al.	2506.17159	null
2025-06-20	Do We Need Large VLMs for Spotting Soccer Actions?	Ritabrata Chakraborty et.al.	2506.17144	null
2025-06-23	Multi-label Scene Classification for Autonomous Vehicles: Acquiring and Accumulating Knowledge from Diverse Datasets	Ke Li et.al.	2506.17101	null
2025-06-23	Better Language Model Inversion by Compactly Representing Next-Token Distributions	Murtaza Nazir et.al.	2506.17090	null
2025-06-20	Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation	Jiahao Cheng et.al.	2506.17088	null
2025-06-20	Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025	Dominik Macháček et.al.	2506.17077	null
2025-06-20	Relaxed syntax modeling in Transformers for future-proof license plate recognition	Florent Meyer et.al.	2506.17051	null
2025-06-20	The Hidden Cost of an Image: Quantifying the Energy Consumption of AI Image Generation	Giulia Bertazzini et.al.	2506.17016	null
2025-06-20	LLM-Generated Feedback Supports Learning If Learners Choose to Use It	Danielle R. Thomas et.al.	2506.17006	null
2025-06-18	Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model	Anirud Aggarwal et.al.	2506.15682	link
2025-06-18	Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers	Tommaso Green et.al.	2506.15674	link
2025-06-18	PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection	Wenhao Li et.al.	2506.15656	null
2025-06-18	Demystifying the Visual Quality Paradox in Multimodal Large Language Models	Shuo Xing et.al.	2506.15645	null
2025-06-18	FindingDory: A Benchmark to Evaluate Memory in Embodied Agents	Karmesh Yadav et.al.	2506.15635	null
2025-06-18	Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability	Yusuke Sakai et.al.	2506.15629	null
2025-06-18	HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization	Roey Ron et.al.	2506.15625	null
2025-06-18	The Effect of State Representation on LLM Agent Behavior in Dynamic Routing Games	Lyle Goodyear et.al.	2506.15624	null
2025-06-18	The Compositional Architecture of Regret in Large Language Models	Xiangxiang Cui et.al.	2506.15617	null
2025-06-20	One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution	Yujing Sun et.al.	2506.15591	link
2025-06-17	Markov Regime-Switching Intelligent Driver Model for Interpretable Car-Following Behavior	Chengyuan Zhang et.al.	2506.14762	null
2025-06-17	Cost-Aware Routing for Efficient Text-To-Image Generation	Qinchan et.al.	2506.14753	null
2025-06-18	Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs	Ling Team et.al.	2506.14731	null
2025-06-17	Procedural Knowledge Libraries: Towards Executable (Research) Memory	Hamidah Oderinwale et.al.	2506.14715	null
2025-06-17	DiFuse-Net: RGB and Dual-Pixel Depth Estimation using Window Bi-directional Parallax Attention and Cross-modal Transfer Learning	Kunal Swami et.al.	2506.14709	null
2025-06-17	Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers	Daniel D’souza et.al.	2506.14702	null
2025-06-17	FocalClick-XL: Towards Unified and High-quality Interactive Segmentation	Xi Chen et.al.	2506.14686	null
2025-06-17	AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models	Ads Dawson et.al.	2506.14682	link
2025-06-17	StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery	Jina Kim et.al.	2506.14670	null
2025-06-17	GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors	Hengyuan Zhang et.al.	2506.14646	null
2025-06-16	Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins	Chuanruo Ning et.al.	2506.13761	null
2025-06-16	ExtendAttack: Attacking Servers of LRMs via Extending Reasoning	Zhenhao Zhu et.al.	2506.13737	link
2025-06-16	Instruction Following by Boosting Attention of Large Language Models	Vitoria Guardieiro et.al.	2506.13734	null
2025-06-16	BanditWare: A Contextual Bandit-based Framework for Hardware Prediction	Tainã Coleman et.al.	2506.13730	null
2025-06-16	Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models	Arjun Krishna et.al.	2506.13726	null
2025-06-16	TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning	Junru Zhang et.al.	2506.13705	null
2025-06-16	Value-Free Policy Optimization via Reward Partitioning	Bilal Faye et.al.	2506.13702	null
2025-06-17	Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention	Haonan Wang et.al.	2506.13674	null
2025-06-16	Assessing the Limits of In-Context Learning beyond Functions using Partially Ordered Relation	Debanjan Dutta et.al.	2506.13608	null
2025-06-16	Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching	Weimin Bai et.al.	2506.13594	null
2025-06-13	EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction	Hsi-Che Lin et.al.	2506.12015	null
2025-06-13	code_transformed: The Influence of Large Language Models on Code	Yuliang Xu et.al.	2506.12014	null
2025-06-13	Simple Radiology VLLM Test-time Scaling with Thought Graph Traversal	Yue Yao et.al.	2506.11989	link
2025-06-13	Scalable Generalized Bayesian Online Neural Network Training for Sequential Decision Making	Gerardo Duran-Martin et.al.	2506.11898	null
2025-06-13	LLM-based Dynamic Differential Testing for Database Connectors with Reinforcement Learning-Guided Prompt Selection	Ce Lyu et.al.	2506.11870	null
2025-06-13	TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks	Qihai Zhang et.al.	2506.11844	null
2025-06-13	Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution	Zhangkai Ni et.al.	2506.11823	link
2025-06-13	Rethinking Multilingual Vision-Language Translation: Dataset, Evaluation, and Adaptation	Xintong Wang et.al.	2506.11820	null
2025-06-13	On the Performance of LLMs for Real Estate Appraisal	Margot Geerts et.al.	2506.11812	null
2025-06-13	Abstract Sound Fusion with Unconditioned Inversion Model	Jing Liu et.al.	2506.11811	null
2025-06-12	GenWorld: Towards Detecting AI-generated Real-world Simulation Videos	Weiliang Chen et.al.	2506.10975	null
2025-06-13	MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning	Yuxuan Luo et.al.	2506.10963	null
2025-06-12	Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods	Zhaiming Shen et.al.	2506.10959	null
2025-06-12	Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors	Chen Yueh-Han et.al.	2506.10949	link
2025-06-12	Execution Guided Line-by-Line Code Generation	Boaz Lavon et.al.	2506.10948	link
2025-06-12	VINCIE: Unlocking In-context Image Editing from Video	Leigang Qu et.al.	2506.10941	null
2025-06-12	Robustly Improving LLM Fairness in Realistic Settings via Interpretability	Adam Karvonen et.al.	2506.10922	link
2025-06-12	Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?	Fei Lin et.al.	2506.10912	null
2025-06-12	(De)composing Craft: An Elementary Grammar for Sharing Expertise in Craft Workflows	Ritik Batra et.al.	2506.10891	null
2025-06-12	CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation	Zhao Zhang et.al.	2506.10890	link
2025-06-11	Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling	Tim Z. Xiao et.al.	2506.09998	null
2025-06-11	From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring	Yang Li et.al.	2506.09996	null
2025-06-11	Text-Aware Image Restoration with Diffusion Models	Jaewon Min et.al.	2506.09993	null
2025-06-11	Large Language Models for Toxic Language Detection in Low-Resource Balkan Languages	Amel Muminovic et.al.	2506.09992	link
2025-06-11	Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation	Xinyu Yang et.al.	2506.09991	null
2025-06-11	Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs	Hiroshi Matsuda et.al.	2506.09983	link
2025-06-11	LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge	Sahar Abdelnabi et.al.	2506.09956	link
2025-06-11	Assessing a Safety Case: Bottom-up Guidance for Claims and Evidence Evaluation	Scott Schnelle et.al.	2506.09929	null
2025-06-11	Aspect-Based Opinion Summarization with Argumentation Schemes	Wendi Zhou et.al.	2506.09917	null
2025-06-11	The Emergence of Abstract Thought in Large Language Models Beyond Any Language	Yuxin Chen et.al.	2506.09890	null
2025-06-10	Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations	Yuxin Dong et.al.	2506.09048	null
2025-06-10	Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation	Xiaowen Ma et.al.	2506.09046	null
2025-06-10	MagCache: Fast Video Generation with Magnitude-Aware Cache	Zehong Ma et.al.	2506.09045	link
2025-06-10	AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions	Polina Kirichenko et.al.	2506.09038	link
2025-06-10	Comparing human and LLM proofreading in L2 writing: Impact on lexical and syntactic features	Hakyung Sung et.al.	2506.09021	null
2025-06-10	SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning	Ruiqi Zhang et.al.	2506.09016	link
2025-06-10	Boosting Rust Unit Test Coverage through Hybrid Program Analysis and Large Language Models	Bei Chu et.al.	2506.09002	null
2025-06-10	Do Concept Replacement Techniques Really Erase Unacceptable Concepts?	Anudeep Das et.al.	2506.08991	null
2025-06-10	ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations	Amirreza Rouhi et.al.	2506.08968	null
2025-06-10	WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis	Liangliang Chen et.al.	2506.08962	null
2025-06-09	Hidden in plain sight: VLMs overlook their visual representations	Stephanie Fu et.al.	2506.08008	null
2025-06-09	PairEdit: Learning Semantic Variations for Exemplar-based Image Editing	Haoguang Lu et.al.	2506.07992	link
2025-06-09	Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers	Zhengyao Lv et.al.	2506.07986	link
2025-06-10	OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation	Jingjing Chang et.al.	2506.07977	link
2025-06-10	Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction	Junhong Shen et.al.	2506.07976	link
2025-06-09	Reinforcing Multimodal Understanding and Generation with Dual Self-rewards	Jixiang Hong et.al.	2506.07963	null
2025-06-09	TokenBreak: Bypassing Text Classification Models Through Token Manipulation	Kasimir Schulz et.al.	2506.07948	null
2025-06-09	ProtocolLLM: RTL Benchmark for SystemVerilog Generation of Communication Protocols	Arnav Sheth et.al.	2506.07945	link
2025-06-09	Adversarial Attack Classification and Robustness Testing for Large Language Models for Code	Yang Liu et.al.	2506.07942	null
2025-06-09	Quantum Graph Transformer for NLP Sentiment Classification	Shamminuj Aktar et.al.	2506.07937	null
2025-06-06	PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time	Weizhi Zhang et.al.	2506.06254	null
2025-06-06	Lightweight Prompt Biasing for Contextualized End-to-End ASR Systems	Bo Ren et.al.	2506.06252	null
2025-06-06	Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection	Sahrish Khan et.al.	2506.06238	null
2025-06-06	Detecting Voice Phishing with Precision: Fine-Tuning Small Language Models	Ju Yong Sim et.al.	2506.06180	null
2025-06-06	Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach	James Ford et.al.	2506.06175	null
2025-06-06	semantic-features: A User-Friendly Tool for Studying Contextual Word Embeddings in Interpretable Semantic Spaces	Jwalanthi Ranganathan et.al.	2506.06169	null
2025-06-06	Stream DaQ: Stream-First Data Quality Monitoring	Vasileios Papastergios et.al.	2506.06147	link
2025-06-06	CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval	David Wan et.al.	2506.06144	null
2025-06-06	Let’s CONFER: A Dataset for Evaluating Natural Language Inference Models on CONditional InFERence and Presupposition	Tara Azin et.al.	2506.06133	null
2025-06-06	Bridging the Gap: In-Context Learning for Modeling Human Disagreement	Benedetta Muscato et.al.	2506.06113	null
2025-06-05	VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos	Hanoona Rasheed et.al.	2506.05349	null
2025-06-05	ContentV: Efficient Training of Video Generation Models with Limited Compute	Wenfeng Lin et.al.	2506.05343	null
2025-06-05	Refer to Anything with Vision-Language Prompts	Shengcao Cao et.al.	2506.05342	null
2025-06-05	VideoMolmo: Spatio-Temporal Grounding Meets Pointing	Ghazi Shazan Ahmad et.al.	2506.05336	link
2025-06-05	ProRefine: Inference-time Prompt Refinement with Textual Feedback	Deepak Pandita et.al.	2506.05305	null
2025-06-05	Power Law Guided Dynamic Sifting for Efficient Attention	Nirav Koley et.al.	2506.05300	null
2025-06-05	Stable Vision Concept Transformers for Medical Diagnosis	Lijie Hu et.al.	2506.05286	null
2025-06-05	Video World Models with Long-term Spatial Memory	Tong Wu et.al.	2506.05284	null
2025-06-05	From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos	Animesh Gupta et.al.	2506.05274	link
2025-06-06	Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning	Violet Xiang et.al.	2506.05256	null
2025-06-04	LayerFlow: A Unified Model for Layer-aware Video Generation	Sihui Ji et.al.	2506.04228	null
2025-06-04	Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models	Fangrui Zhu et.al.	2506.04220	null
2025-06-04	Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models	Soumya Suvra Ghosal et.al.	2506.04210	null
2025-06-04	TracLLM: A Generic Framework for Attributing Long Context LLMs	Yanting Wang et.al.	2506.04202	link
2025-06-04	Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models	Ruiqi Zhang et.al.	2506.04182	null
2025-06-04	SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling	Anhao Zhao et.al.	2506.04179	null
2025-06-04	Does Prompt Design Impact Quality of Data Imputation by LLMs?	Shreenidhi Srinivasan et.al.	2506.04172	null
2025-06-04	VISCA: Inferring Component Abstractions for Automated End-to-End Testing	Parsa Alian et.al.	2506.04161	null
2025-06-04	A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization	Sarvesh Soni et.al.	2506.04156	null
2025-06-04	Are Lexicon-Based Tools Still the Gold Standard for Valence Analysis in Low-Resource Flemish?	Ratna Kandala et.al.	2506.04139	null
2025-06-03	IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation	Yuanze Lin et.al.	2506.03150	null
2025-06-03	AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation	Prashanth Vijayaraghavan et.al.	2506.03122	null
2025-06-03	Zero-Shot Tree Detection and Segmentation from Aerial Forest Imagery	Michelle Chen et.al.	2506.03114	link
2025-06-03	From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit	Valérie Costa et.al.	2506.03093	null
2025-06-03	EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models	Mingzhe Li et.al.	2506.03067	null
2025-06-03	Leveraging Information Retrieval to Enhance Spoken Language Understanding Prompts in Few-Shot Learning	Pierre Lepagnol et.al.	2506.03035	null
2025-06-03	Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style-Rich Representation	Yongqi Wang et.al.	2506.02997	null
2025-06-03	Linear Spatial World Models Emerge in Large Language Models	Matthieu Tehenan et.al.	2506.02996	null
2025-06-03	Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation	Li Zhang et.al.	2506.02992	null
2025-06-03	Performance of leading large language models in May 2025 in Membership of the Royal College of General Practitioners-style examination questions: a cross-sectional analysis	Richard Armitage et.al.	2506.02987	null
2025-05-30	ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL	Yu Zhang et.al.	2505.24875	null
2025-05-30	The Road to Generalizable Neuro-Symbolic Learning Should be Paved with Foundation Models	Adam Stein et.al.	2505.24874	link
2025-05-30	GenSpace: Benchmarking Spatially-Aware Image Generation	Zehan Wang et.al.	2505.24870	null
2025-05-30	Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization	Joschka Braun et.al.	2505.24859	null
2025-05-30	MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs	Gabrielle Kaili-May Liu et.al.	2505.24858	link
2025-05-30	Reading Recognition in the Wild	Charig Yang et.al.	2505.24848	null
2025-05-30	PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models	Yinggan Xu et.al.	2505.24823	null
2025-05-30	CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning	Jiangpeng He et.al.	2505.24816	link
2025-06-02	Guiding Generative Storytelling with Knowledge Graphs	Zhijun Pan et.al.	2505.24803	null
2025-05-30	Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation	Yucheng Zhou et.al.	2505.24787	link
2025-05-29	LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers	Yusuf Dalva et.al.	2505.23758	null
2025-05-29	MAGREF: Masked Guidance for Any-Reference Video Generation	Yufan Deng et.al.	2505.23742	link
2025-05-29	How Animals Dance (When You’re Not Looking)	Xiaojuan Wang et.al.	2505.23738	null
2025-05-29	EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast	Shreeram Suresh Chandra et.al.	2505.23732	null
2025-05-29	SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA	Minrui Luo et.al.	2505.23724	null
2025-05-29	ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering	Zexi Liu et.al.	2505.23723	link
2025-05-29	Label-Guided In-Context Learning for Named Entity Recognition	Fan Bai et.al.	2505.23722	link
2025-05-29	COBRA: Contextual Bandit Algorithm for Ensuring Truthful Strategic Agents	Arun Verma et.al.	2505.23720	null
2025-05-29	TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning	Andreas Auer et.al.	2505.23719	link
2025-05-29	Don’t Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models	Jinzhe Li et.al.	2505.23715	link
2025-05-28	GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning	Qingchen Yu et.al.	2505.22661	null
2025-05-28	Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese	Hanjia Lyu et.al.	2505.22645	link
2025-05-28	Understanding (Un)Reliability of Steering Vectors in Language Models	Joschka Braun et.al.	2505.22637	null
2025-05-28	Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs	Ziling Cheng et.al.	2505.22630	null
2025-05-28	RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction	Yuchi Wang et.al.	2505.22613	null
2025-05-28	Fusion Steering: Prompt-Specific Activation Control	Waldemar Chang et.al.	2505.22572	null
2025-05-29	A black hole in a near-pristine galaxy 700 million years after the Big Bang	Roberto Maiolino et.al.	2505.22567	null
2025-05-28	Universal Visuo-Tactile Video Understanding for Embodied Interaction	Yifan Xie et.al.	2505.22566	null
2025-05-28	PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models	Junwen Chen et.al.	2505.22523	null
2025-05-28	Multi-MLLM Knowledge Distillation for Out-of-Context News Detection	Yimeng Gu et.al.	2505.22517	null
2025-05-27	Be Decisive: Noise-Induced Layouts for Multi-Subject Generation	Omer Dahary et.al.	2505.21488	null
2025-05-27	Policy Optimized Text-to-Image Pipeline Design	Uri Gadot et.al.	2505.21478	null
2025-05-27	Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion	Zhanqiu Hu et.al.	2505.21467	null
2025-05-27	Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance	Shintaro Ozaki et.al.	2505.21458	null
2025-05-27	Visual Product Graph: Bridging Visual Products And Composite Images For End-to-End Style Recommendations	Yue Li Du et.al.	2505.21454	null
2025-05-27	Policy Induction: Predicting Startup Success via Explainable Memory-Augmented In-Context Learning	Xianling Mu et.al.	2505.21427	null
2025-05-27	A Physics-Augmented GraphGPS Framework for the Reconstruction of 3D Riemann Problems from Sparse Data	Rami Cassia et.al.	2505.21421	link
2025-05-27	DecisionFlow: Advancing Large Language Model as Principled Decision Maker	Xiusi Chen et.al.	2505.21397	null
2025-05-27	Square $χ$PO: Differentially Private and Robust $χ^2$-Preference Optimization in Offline Direct Alignment	Xingyu Zhou et.al.	2505.21395	null
2025-05-27	Leveraging the Power of Conversations: Optimal Key Term Selection in Conversational Contextual Bandits	Maoli Liu et.al.	2505.21393	null
2025-05-26	Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models	Weihao Xuan et.al.	2505.20236	null
2025-05-26	The Mirage of Multimodality: Where Truth is Tested and Honesty Unravels	Jiaming Ji et.al.	2505.20214	null
2025-05-26	Parameter-Efficient Fine-Tuning with Column Space Projection	Junseo Hwang et.al.	2505.20211	null
2025-05-26	Temporal Sampling for Forgotten Reasoning in LLMs	Yuetai Li et.al.	2505.20196	link
2025-05-26	Visual Abstract Thinking Empowers Multimodal Reasoning	Dairu Liu et.al.	2505.20164	link
2025-05-26	Capability-Based Scaling Laws for LLM Red-Teaming	Alexander Panfilov et.al.	2505.20162	link
2025-05-26	UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models	Xueyan Zhang et.al.	2505.20154	null
2025-05-26	StructEval: Benchmarking LLMs’ Capabilities to Generate Structural Outputs	Jialin Yang et.al.	2505.20139	null
2025-05-26	Agentic 3D Scene Generation with Spatially Contextualized VLMs	Xinhang Liu et.al.	2505.20129	null
2025-05-26	Agentic AI Process Observability: Discovering Behavioral Variability	Fabiana Fournier et.al.	2505.20127	null
2025-05-26	OB3D: A New Dataset for Benchmarking Omnidirectional 3D Reconstruction Using Blender	Shintaro Ito et.al.	2505.20126	link
2025-05-26	TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent	Dominik Meier et.al.	2505.20118	link
2025-05-26	Language-Agnostic Suicidal Risk Detection Using Large Language Models	June-Woo Kim et.al.	2505.20109	null
2025-05-26	Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning	Ziyi Zhang et.al.	2505.20107	link
2025-05-26	Adaptive Deep Reasoning: Triggering Deep Thinking When Needed	Yunhao Wang et.al.	2505.20101	null
2025-05-26	Transformer in Protein: A Survey	Xiaowen Ling et.al.	2505.20098	null
2025-05-26	S2LPP: Small-to-Large Prompt Prediction across LLMs	Liang Cheng et.al.	2505.20097	null
2025-05-23	REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders	Savya Khosla et.al.	2505.18153	link
2025-05-23	Frankentext: Stitching random text fragments into long-form narratives	Chau Minh Pham et.al.	2505.18128	link
2025-05-23	Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion	Jacob Hansen et.al.	2505.18115	null
2025-05-23	Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL	Joey Hong et.al.	2505.18098	null
2025-05-23	Towards more transferable adversarial attack in black-box manner	Chun Tong Lei et.al.	2505.18097	null
2025-05-23	Assessing the performance of 8 AI chatbots in bibliographic reference retrieval: Grok and DeepSeek outperform ChatGPT, but none are fully accurate	Álvaro Cabezas-Clavijo et.al.	2505.18059	null
2025-05-23	FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation	Zherui Zhang et.al.	2505.18053	null
2025-05-23	Structured Thinking Matters: Improving LLMs Generalization in Causal Inference Tasks	Wentao Sun et.al.	2505.18034	null
2025-05-23	LLM assisted web application functional requirements generation: A case study of four popular LLMs over a Mess Management System	Rashmi Gupta et.al.	2505.18019	null
2025-05-23	Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation	Zhihua Liu et.al.	2505.17994	null
2025-05-22	GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning	Chengqi Duan et.al.	2505.17022	link
2025-05-22	Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework	Chenhao Zhang et.al.	2505.17019	link
2025-05-22	When Are Concepts Erased From Diffusion Models?	Kevin Lu et.al.	2505.17013	link
2025-05-22	Understanding Prompt Tuning and In-Context Learning via Meta-Learning	Tim Genewein et.al.	2505.17010	link
2025-05-22	Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding	Runpeng Yu et.al.	2505.16990	link
2025-05-22	Know the Ropes: A Heuristic Strategy for LLM-based Multi-Agent System Design	Zhenkun Li et.al.	2505.16979	null
2025-05-22	Creatively Upscaling Images with Global-Regional Priors	Yurui Qian et.al.	2505.16976	null
2025-05-22	OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning	Zongyan Han et.al.	2505.16974	link
2025-05-23	VeriFastScore: Speeding up long-form factuality evaluation	Rishanth Rajendhran et.al.	2505.16973	link
2025-05-22	Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval	Nandan Thakur et.al.	2505.16967	null
2025-05-20	UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation	Rui Tian et.al.	2505.14682	null
2025-05-20	Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training	Mengru Wang et.al.	2505.14681	null
2025-05-20	Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning	Jiaer Xia et.al.	2505.14677	null
2025-05-20	UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens	Ruichuan An et.al.	2505.14671	link
2025-05-20	SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment	Wonje Jeung et.al.	2505.14667	null
2025-05-20	Beyond Words: Multimodal LLM Knows When to Speak	Zikai Liao et.al.	2505.14654	null
2025-05-20	VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation	Wentao Ma et.al.	2505.14640	null
2025-05-20	Think Only When You Need with Large Hybrid-Reasoning Models	Lingjie Jiang et.al.	2505.14631	null
2025-05-20	Enhancing Learned Knowledge in LoRA Adapters Through Efficient Contrastive Decoding on Ascend NPUs	Morgan Lindsay Heisler et.al.	2505.14620	null
2025-05-20	Linear Control of Test Awareness Reveals Differential Compliance in Reasoning Models	Sahar Abdelnabi et.al.	2505.14617	link
2025-05-19	CIE: Controlling Language Model Text Generations Using Continuous Signals	Vinay Samuel et.al.	2505.13448	link
2025-05-19	Make Still Further Progress: Chain of Thoughts for Tabular Data Leaderboard	Si-Yang Liu et.al.	2505.13421	null
2025-05-19	AdaptThink: Reasoning Models Can Learn When to Think	Jiajie Zhang et.al.	2505.13417	link
2025-05-19	MR. Judge: Multimodal Reasoner as a Judge	Renjie Pi et.al.	2505.13403	null
2025-05-19	R3: Robust Rubric-Agnostic Reward Models	David Anugraha et.al.	2505.13388	link
2025-05-19	How Adding Metacognitive Requirements in Support of AI Feedback in Practice Exams Transforms Student Learning Behaviors	Mak Ahmad et.al.	2505.13381	null
2025-05-19	What Prompts Don’t Say: Understanding and Managing Underspecification in LLM Prompts	Chenyang Yang et.al.	2505.13360	link
2025-05-19	Multi-Armed Bandits Meet Large Language Models	Djallel Bouneffouf et.al.	2505.13355	null
2025-05-19	A large-scale analysis of public-facing, community-built chatbots on Character.AI	Owen Lee et.al.	2505.13354	null
2025-05-19	Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks	Narek Maloyan et.al.	2505.13348	null
2025-05-16	CRISP: Clustering Multi-Vector Representations for Denoising and Pruning	João Veneroso et.al.	2505.11471	null
2025-05-16	ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks	Zhixiong Zhuang et.al.	2505.11459	null
2025-05-16	HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation	Shaina Raza et.al.	2505.11454	link
2025-05-16	GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art	Chenkai Zhang et.al.	2505.11436	link
2025-05-16	When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs	Xiaomin Li et.al.	2505.11423	null
2025-05-16	CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs	Sijia Chen et.al.	2505.11413	null
2025-05-19	Phare: A Safety Probe for Large Language Models	Pierre Le Jeune et.al.	2505.11365	link
2025-05-16	LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors	Rao Ma et.al.	2505.11352	null
2025-05-16	XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision	Nuo Chen et.al.	2505.11336	null
2025-05-16	CROC: Evaluating and Training T2I Metrics with Pseudo- and Human-Labeled Contrastive Robustness Checks	Christoph Leiter et.al.	2505.11314	null
2025-05-15	T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback	Zehan Wang et.al.	2505.10561	null
2025-05-15	Style Customization of Text-to-Vector Generation with Image Diffusion Priors	Peiying Zhang et.al.	2505.10558	null
2025-05-15	Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models	Zhiyuan Hu et.al.	2505.10554	link
2025-05-15	Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data	Yiwen Liu et.al.	2505.10551	link
2025-05-15	Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning	Milan Ganai et.al.	2505.10547	null
2025-05-15	Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models	Annie Wong et.al.	2505.10543	link
2025-05-16	WeGA: Weakly-Supervised Global-Local Affinity Learning Framework for Lymph Node Metastasis Prediction in Rectal Cancer	Yifan Gao et.al.	2505.10502	null
2025-05-15	Batched Nonparametric Bandits via k-Nearest Neighbor UCB	Sakshi Arya et.al.	2505.10498	null
2025-05-15	AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenge	Ranjan Sapkota et.al.	2505.10468	null
2025-05-15	A possible periodic RM evolution in the repeating FRB 20220529	Yi-Fang Liang et.al.	2505.10463	null
2025-05-14	Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?	Anthony GX-Chen et.al.	2505.09614	null
2025-05-14	Adversarial Suffix Filtering: a Defense Pipeline for LLMs	David Khachaturov et.al.	2505.09602	null
2025-05-14	How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference	Nidhal Jegham et.al.	2505.09598	null
2025-05-14	WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models	Abdullah Mushtaq et.al.	2505.09595	null
2025-05-14	BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset	Jiuhai Chen et.al.	2505.09568	link
2025-05-14	PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning	Zongqian Li et.al.	2505.09519	link
2025-05-14	Layered Unlearning for Adversarial Relearning	Timothy Qian et.al.	2505.09500	link
2025-05-14	Card Sorting Simulator: Augmenting Design of Logical Information Architectures with Large Language Models	Eduard Kuric et.al.	2505.09478	null
2025-05-14	A 2D Semantic-Aware Position Encoding for Vision Transformers	Xi Chen et.al.	2505.09466	null
2025-05-14	Beyond Pixels: Leveraging the Language of Soccer to Improve Spatio-Temporal Action Detection in Broadcast Videos	Jeremie Ochin et.al.	2505.09455	null
2025-05-13	UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations	Hanjung Kim et.al.	2505.08787	null
2025-05-13	Radio observations point to a moderately relativistic outflow in the fast X-ray transient EP241021a	Muskan Yadav et.al.	2505.08781	null
2025-05-14	Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology	Yatai Ji et.al.	2505.08765	null
2025-05-13	NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context	Ben Yao et.al.	2505.08734	null
2025-05-13	LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs	K M Sajjadul Islam et.al.	2505.08704	null
2025-05-14	Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities	George Saon et.al.	2505.08699	null
2025-05-13	VizCV: AI-assisted visualization of researchers’ publications tracks	Vladimír Lazárik et.al.	2505.08691	null
2025-05-13	A Mamba-based Network for Semi-supervised Singing Melody Extraction Using Confidence Binary Regularization	Xiaoliang He et.al.	2505.08681	link
2025-05-13	Enhancing Software Development with Context-Aware Conversational Agents: A User Study on Developer Interactions with Chatbots	Glaucia Melo et.al.	2505.08648	null
2025-05-13	Cracking the relation between mass and 1P-star fraction of globular clusters: III. Initial distributions of in-situ and ex-situ clusters	Geneviève Parmentier et.al.	2505.08626	null
2025-05-12	A Comparative Analysis of Static Word Embeddings for Hungarian	Máté Gedeon et.al.	2505.07809	link
2025-05-12	Domain Regeneration: How well do LLMs match syntactic properties of text domains?	Da Ju et.al.	2505.07784	null
2025-05-12	Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding	Yifeng Di et.al.	2505.07768	link
2025-05-12	“I Apologize For Not Understanding Your Policy”: Exploring the Specification and Evaluation of User-Managed Access Control Policies by AI Virtual Assistants	Jennifer Mondragon et.al.	2505.07759	null
2025-05-12	Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets	Weiyu Li et.al.	2505.07747	null
2025-05-13	VTutor for High-Impact Tutoring at Scale: Managing Engagement and Real-Time Multi-Screen Monitoring with P2P Connections	Eason Chen et.al.	2505.07736	null
2025-05-12	Gameplay Highlights Generation	Vignesh Edithal et.al.	2505.07721	null
2025-05-13	Codifying Character Logic in Role-Playing	Letian Peng et.al.	2505.07705	link
2025-05-13	OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit	Arun S. Maiya et.al.	2505.07672	link
2025-05-12	ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models	Ozgur Kara et.al.	2505.07652	null
2025-05-09	Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks	Christos Plachouras et.al.	2505.06224	link
2025-05-09	Adapting a Segmentation Foundation Model for Medical Image Classification	Pengfei Gu et.al.	2505.06217	null
2025-05-09	Turbo-ICL: In-Context Learning-Based Turbo Equalization	Zihang Song et.al.	2505.06175	null
2025-05-09	Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study	Faeze Ghorbanpour et.al.	2505.06149	null
2025-05-09	BrainSegDMlF: A Dynamic Fusion-enhanced SAM for Brain Lesion Segmentation	Hongming Wang et.al.	2505.06133	null
2025-05-09	Context Informed Incremental Learning Improves Myoelectric Control Performance in Virtual Reality Object Manipulation Tasks	Gabriel Gagné et.al.	2505.06064	link
2025-05-09	Towards Better Cephalometric Landmark Detection with Diffusion Data Generation	Dongqian Guo et.al.	2505.06055	null
2025-05-09	Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification	Leon Eshuijs et.al.	2505.06032	link
2025-05-09	ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding	Shuai Wang et.al.	2505.06020	null
2025-05-09	CAPE: Context-Aware Prompt Perturbation Mechanism with Differential Privacy	Haoqi Wu et.al.	2505.05922	null
2025-05-08	Facets of Disparate Impact: Evaluating Legally Consistent Bias in Machine Learning	Jarren Briscoe et.al.	2505.05471	link
2025-05-08	Generating Physically Stable and Buildable LEGO Designs from Text	Ava Pun et.al.	2505.05469	link
2025-05-08	SITE: towards Spatial Intelligence Thorough Evaluation	Wenqi Wang et.al.	2505.05456	null
2025-05-08	Conversational Process Model Redesign	Nataliia Klievtsova et.al.	2505.05453	null
2025-05-08	Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding	Han Xiao et.al.	2505.05446	link
2025-05-08	clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations	Chalamalasetti Kranti et.al.	2505.05445	null
2025-05-08	The soft X-ray transient EP241021a: a cosmic explosion with a complex off-axis jet and cocoon from a massive progenitor	Giulia Gianfagna et.al.	2505.05444	null
2025-05-08	GesPrompt: Leveraging Co-Speech Gestures to Augment LLM-Based Interaction in Virtual Reality	Xiyun Hu et.al.	2505.05441	null
2025-05-08	Reasoning Models Don’t Always Say What They Think	Yanda Chen et.al.	2505.05410	null
2025-05-08	GeomHair: Reconstruction of Hair Strands from Colorless 3D Scans	Rachmadio Noval Lazuardi et.al.	2505.05376	null
2025-05-07	Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond	Jessie Richter-Powell et.al.	2505.04621	null
2025-05-07	Perpetuating Misogyny with Generative AI: How Model Personalization Normalizes Gendered Harm	Laura Wagner et.al.	2505.04600	null
2025-05-07	Dynamic Network Flow Optimization for Task Scheduling in PTZ Camera Surveillance Systems	Mohammad Merati et.al.	2505.04596	null
2025-05-07	Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization	Wenjun Cao et.al.	2505.04578	null
2025-05-07	Componential Prompt-Knowledge Alignment for Domain Incremental Learning	Kunlun Xu et.al.	2505.04575	link
2025-05-07	Overcoming Data Scarcity in Generative Language Modelling for Low-Resource Languages: A Systematic Review	Josh McGiff et.al.	2505.04531	null
2025-05-07	Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving	Qi Liu et.al.	2505.04528	null
2025-05-07	Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model	Pengfei Guo et.al.	2505.04522	null
2025-05-07	User and Recommender Behavior Over Time: Contextualizing Activity, Effectiveness, Diversity, and Fairness in Book Recommendation	Samira Vaez Barenji et.al.	2505.04518	null
2025-05-07	CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation	Jiahao Li et.al.	2505.04481	null
2025-05-07	Visual Imitation Enables Contextual Humanoid Control	Arthur Allshire et.al.	2505.03729	null
2025-05-06	CaRaFFusion: Improving 2D Semantic Segmentation with Camera-Radar Point Cloud Fusion and Zero-Shot Image Inpainting	Huawei Sun et.al.	2505.03679	null
2025-05-06	Graph Drawing for LLMs: An Empirical Evaluation	Walter Didimo et.al.	2505.03678	null
2025-05-06	Towards conversational assistants for health applications: using ChatGPT to generate conversations about heart failure	Anuja Tayal et.al.	2505.03675	null
2025-05-06	Statistical geochemical constraints on present-day water outgassing as a source of secondary atmospheres on the TRAPPIST-1 exoplanets	Trent B. Thomas et.al.	2505.03672	link
2025-05-06	ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant	Yifan Xiang et.al.	2505.03654	link
2025-05-06	Binding threshold units with artificial oscillatory neurons	Vladimir Fanaskov et.al.	2505.03648	link
2025-05-06	PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing	Yiping Xie et.al.	2505.03621	null
2025-05-06	Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images	Fangling Jiang et.al.	2505.03611	null
2025-05-06	Learning Knowledge-based Prompts for Robust 3D Mask Presentation Attack Detection	Fangling Jiang et.al.	2505.03610	null
2025-05-05	Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation	Lu Ling et.al.	2505.02836	null
2025-05-05	Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models	Kuofeng Gao et.al.	2505.02824	link
2025-05-05	MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset via Attention Routing	Zinan Guo et.al.	2505.02823	link
2025-05-05	AutoLibra: Agent Metric Induction from Open-Ended Feedback	Hao Zhu et.al.	2505.02820	link
2025-05-05	Generating HomeAssistant Automations Using an LLM-based Chatbot	Mathyas Giudici et.al.	2505.02802	null
2025-05-05	HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models	Zheng Lin et.al.	2505.02795	null
2025-05-05	Giving Simulated Cells a Voice: Evolving Prompt-to-Intervention Models for Cellular Control	Nam H. Le et.al.	2505.02766	null
2025-05-05	Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models	Yankai Jiang et.al.	2505.02753	link
2025-05-05	How May U.S. Courts Scrutinize Their Recidivism Risk Assessment Tools? Contextualizing AI Fairness Criteria on a Judicial Scrutiny-based Framework	Tin Nguyen et.al.	2505.02749	null
2025-05-06	Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation	Gerard Pons et.al.	2505.02737	null
2025-05-02	Helping Big Language Models Protect Themselves: An Enhanced Filtering and Summarization System	Sheikh Samit Muhaimin et.al.	2505.01315	null
2025-05-02	A Factorized Probabilistic Model of the Semantics of Vague Temporal Adverbials Relative to Different Event Types	Svenja Kenneweg et.al.	2505.01311	null
2025-05-02	Deblurring fission fragment mass distributions	Pierre Nzabahimana et.al.	2505.01294	null
2025-05-05	TSTMotion: Training-free Scene-aware Text-to-motion Generation	Ziyan Guo et.al.	2505.01182	null
2025-05-02	On the Limitations of Steering in Language Model Alignment	Chebrolu Niranjan et.al.	2505.01162	null
2025-05-02	Methodological Foundations for AI-Driven Survey Question Generation	Ted K. Mburu et.al.	2505.01150	null
2025-05-02	Poster: Machine Learning for Vulnerability Detection as Target Oracle in Automated Fuzz Driver Generation	Gianpietro Castiglione et.al.	2505.01123	null
2025-05-02	VSC: Visual Search Compositional Text-to-Image Diffusion Model	Do Huu Dat et.al.	2505.01104	null
2025-05-02	Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages	Marco Salmè et.al.	2505.01096	null
2025-05-02	Improving Editability in Image Generation with Layer-wise Memory	Daneul Kim et.al.	2505.01079	null
2025-05-01	T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT	Dongzhi Jiang et.al.	2505.00703	link
2025-05-01	Steering Large Language Models with Register Analysis for Arbitrary Style Transfer	Xinchen Yang et.al.	2505.00679	null
2025-05-01	Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions	Yiming Du et.al.	2505.00675	link
2025-05-01	Open-Source LLM-Driven Federated Transformer for Predictive IoV Management	Yazan Otoum et.al.	2505.00651	null
2025-05-01	The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)	Zihao Wang et.al.	2505.00626	null
2025-05-01	Can LLMs Help Improve Analogical Reasoning For Strategic Decisions? Experimental Evidence from Humans and GPT-4	Phanish Puranam et.al.	2505.00603	null
2025-05-01	Block Circulant Adapter for Large Language Models	Xinyu Ding et.al.	2505.00582	null
2025-05-01	Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models	Makoto Sato et.al.	2505.00557	null
2025-05-01	Variational OOD State Correction for Offline Reinforcement Learning	Ke Jiang et.al.	2505.00503	null
2025-05-01	Red Teaming Large Language Models for Healthcare	Vahid Balazadeh et.al.	2505.00467	null
2025-04-30	Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization	Anas Anwarul Haq Khan et.al.	2504.21831	null
2025-04-30	Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields	Yixin Gao et.al.	2504.21814	null
2025-04-30	DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition	Z. Z. Ren et.al.	2504.21801	link
2025-04-30	Balancing Interpretability and Flexibility in Modeling Diagnostic Trajectories with an Embedded Neural Hawkes Process Model	Yuankang Zhao et.al.	2504.21795	null
2025-04-30	Three-dimensional horseshoes near an unfolding of a Hopf-Hopf singularity	Santiago Ibáñez et.al.	2504.21783	null
2025-04-30	LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs	Baleegh Ahmad et.al.	2504.21770	null
2025-04-30	LLM-based Interactive Imitation Learning for Robotic Manipulation	Jonas Werner et.al.	2504.21769	link
2025-04-30	Enhancing Health Mention Classification Performance: A Study on Advancements in Parameter Efficient Tuning	Reem Abdel-Salam et.al.	2504.21685	null
2025-04-30	Traceback of Poisoning Attacks to Retrieval-Augmented Generation	Baolei Zhang et.al.	2504.21668	null
2025-04-30	From Precision to Perception: User-Centred Evaluation of Keyword Extraction Algorithms for Internet-Scale Contextual Advertising	Jingwen Cai et.al.	2504.21667	null
2025-04-29	YoChameleon: Personalized Vision and Language Generation	Thao Nguyen et.al.	2504.20998	null
2025-04-29	ACE: A Security Architecture for LLM-Integrated App Systems	Evan Li et.al.	2504.20984	null
2025-04-29	Jekyll-and-Hyde Tipping Point in an AI’s Behavior	Neil F. Johnson et.al.	2504.20980	null
2025-04-29	AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security	Zikui Cai et.al.	2504.20965	link
2025-04-29	Trace-of-Thought: Enhanced Arithmetic Problem Solving via Reasoning Distillation From Large to Small Language Models	Tyler McDonald et.al.	2504.20946	null
2025-04-29	Leveraging Generative AI Through Prompt Engineering and Rigorous Validation to Create Comprehensive Synthetic Datasets for AI Training in Healthcare	Polycarp Nalela et.al.	2504.20921	null
2025-04-29	An Empirical Study on the Capability of LLMs in Decomposing Bug Reports	Zhiyuan Chen et.al.	2504.20911	null
2025-04-29	CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models	Hasan Md Tusfiqur Alam et.al.	2504.20898	link
2025-04-29	LELANTE: LEveraging LLM for Automated ANdroid TEsting	Shamit Fatin et.al.	2504.20896	null
2025-04-29	AI-GenBench: A New Ongoing Benchmark for AI-Generated Image Detection	Lorenzo Pellegrini et.al.	2504.20865	null
2025-04-29	Cam-2-Cam: Exploring the Design Space of Dual-Camera Interactions for Smartphone-based Augmented Reality	Brandon Woodard et.al.	2504.20035	null
2025-04-28	Applying LLM-Powered Virtual Humans to Child Interviews in Child-Centered Design	Linshi Li et.al.	2504.20016	null
2025-04-28	Breast Cancer Detection from Multi-View Screening Mammograms with Visual Prompt Tuning	Han Chen et.al.	2504.19900	null
2025-04-28	GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets	Mingqian He et.al.	2504.19898	null
2025-04-28	CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition	Quynh Phung et.al.	2504.19894	null
2025-04-28	Federated Out-of-Distribution Generalization: A Causal Augmentation View	Runhui Zhang et.al.	2504.19882	null
2025-04-28	DeeCLIP: A Robust and Generalizable Transformer-Based Framework for Detecting AI-Generated Images	Mamadou Keita et.al.	2504.19876	link
2025-04-28	Factorization of multimeters: a unified view on nonclassical quantum phenomena	Tim Achenbach et.al.	2504.19865	null
2025-04-28	CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback	Chenhan Jiang et.al.	2504.19860	null
2025-04-28	Do You Know the Way? Human-in-the-Loop Understanding for Fast Traversability Estimation in Mobile Robotics	Andre Schreiber et.al.	2504.19851	link
2025-04-25	LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection	Rajesh Yarra et.al.	2504.18423	null
2025-04-25	Can Code Outlove Blood? A LLM-based VR Experience to Prompt Reflection on Parental Verbal Abuse	Jiaying Fu et.al.	2504.18410	null
2025-04-25	Paradigm shift on Coding Productivity Using GenAI	Liang Yu et.al.	2504.18404	null
2025-04-25	Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization	Kesen Zhao et.al.	2504.18397	link
2025-04-25	Pushing the boundary on Natural Language Inference	Pablo Miralles-González et.al.	2504.18376	null
2025-04-25	Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections	Narek Maloyan et.al.	2504.18333	null
2025-04-25	Depth3DLane: Monocular 3D Lane Detection via Depth Prior Distillation	Dongxin Lyu et.al.	2504.18325	null
2025-04-25	STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting	Yunze Deng et.al.	2504.18318	null
2025-04-25	Towards Adaptive Software Agents for Debugging	Yacine Majdoub et.al.	2504.18316	null
2025-04-25	Charm-hadron reconstruction through three body decay in hadronic collisions using Machine Learning	Neelkamal Mallick et.al.	2504.18279	null
2025-04-24	Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models	Xu Ma et.al.	2504.17789	null
2025-04-24	Replay to Remember: Retaining Domain Knowledge in Streaming Language Models	Sneh Pillai et.al.	2504.17780	null
2025-04-24	DPMambaIR: All-in-One Image Restoration via Degradation-Aware Prompt State Space Model	Zhanwen Liu et.al.	2504.17732	null
2025-04-24	Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields	Zhuo He et.al.	2504.17712	null
2025-04-24	Beyond Labels: Zero-Shot Diabetic Foot Ulcer Wound Segmentation with Self-attention Diffusion Models and the Potential for Text-Guided Customization	Abderrachid Hamrani et.al.	2504.17628	null
2025-04-24	When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars	Rei Higuchi et.al.	2504.17562	null
2025-04-24	Rethinking PM Crash Consistency in the CXL Era	João Oliveira et.al.	2504.17554	null
2025-04-24	Auditing the Ethical Logic of Generative AI Models	W. Russell Neuman et.al.	2504.17544	null
2025-04-24	Towards Machine-Generated Code for the Resolution of User Intentions	Justus Flerlage et.al.	2504.17531	link
2025-04-24	IRA: Adaptive Interest-aware Representation and Alignment for Personalized Multi-interest Retrieval	Youngjune Lee et.al.	2504.17529	null
2025-04-23	BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation	Ruotong Wang et.al.	2504.16907	null
2025-04-23	Texture: Structured Exploration of Text Datasets	Will Epperson et.al.	2504.16898	null
2025-04-23	Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models	Xuyang Zhu et.al.	2504.16883	null
2025-04-23	Context-Enhanced Vulnerability Detection Based on Large Language Model	Yixin Yang et.al.	2504.16877	null
2025-04-24	Exploring How LLMs Capture and Represent Domain-Specific Knowledge	Mirian Hipolito Garcia et.al.	2504.16871	null
2025-04-23	Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification	Alexander Shvets et.al.	2504.16856	null
2025-04-23	GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning	Luu Quy Tung et.al.	2504.16832	null
2025-04-23	Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation	Lakshita Agarwal et.al.	2504.16788	null
2025-04-23	Evaluating the Impact of a Yoga-Based Intervention on Software Engineers’ Well-Being	Cristina Martinez Montes et.al.	2504.16779	null
2025-04-23	How Effective are Generative Large Language Models in Performing Requirements Classification?	Waad Alhoshan et.al.	2504.16768	null
2025-04-22	From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning	Le Zhuo et.al.	2504.16080	null
2025-04-22	LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities	Thomas Schmied et.al.	2504.16078	null
2025-04-22	Describe Anything: Detailed Localized Image and Video Captioning	Long Lian et.al.	2504.16072	null
2025-04-22	ForesightNav: Learning Scene Imagination for Efficient Exploration	Hardik Shah et.al.	2504.16062	link
2025-04-22	Vision language models are unreliable at trivial spatial cognition	Sangeet Khemlani et.al.	2504.16061	null
2025-04-22	Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability	Daniel Hendriks et.al.	2504.16056	null
2025-04-22	PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning	Song Wang et.al.	2504.16023	link
2025-04-22	Navigating the State of Cognitive Flow: Context-Aware AI Interventions for Effective Reasoning Support	Dinithi Dissanayake et.al.	2504.16021	null
2025-04-22	Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework	Xinyuan Song et.al.	2504.16016	null
2025-04-23	CAPO: Cost-Aware Prompt Optimization	Tom Zehle et.al.	2504.16005	link
2025-04-21	Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models	Guo Chen et.al.	2504.15271	null
2025-04-21	A Refreshment Stirred, Not Shaken (III): Can Swapping Be Differentially Private?	James Bailie et.al.	2504.15246	null
2025-04-21	A Deep Learning Framework for Sequence Mining with Bidirectional LSTM and Multi-Scale Attention	Tao Yang et.al.	2504.15223	null
2025-04-21	EvalAgent: Discovering Implicit Evaluation Criteria from the Web	Manya Wadhwa et.al.	2504.15219	null
2025-04-22	LACE: Controlled Image Prompting and Iterative Refinement with GenAI for Professional Visual Art Creators	Yenkai Huang et.al.	2504.15189	null
2025-04-21	The Synthetic Imputation Approach: Generating Optimal Synthetic Texts For Underrepresented Categories In Supervised Classification Tasks	Joan C. Timoneda et.al.	2504.15160	null
2025-04-21	Contemplative Wisdom for Superalignment	Ruben Laukkonen et.al.	2504.15125	null
2025-04-21	Empowering AI to Generate Better AI Code: Guided Generation of Deep Learning Projects with LLMs	Chen Xie et.al.	2504.15080	null
2025-04-21	Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides	Jinghua Zhao et.al.	2504.15066	null
2025-04-21	OPO: Making Decision-Focused Data Acquisition Decisions	Egon Peršak et.al.	2504.15062	null
2025-04-18	Audit Cards: Contextualizing AI Evaluations	Leon Staufer et.al.	2504.13839	null
2025-04-18	Science Hierarchography: Hierarchical Organization of Science Literature	Muhan Gao et.al.	2504.13834	link
2025-04-18	Generative AI Act II: Test Time Scaling Drives Cognition Engineering	Shijie Xia et.al.	2504.13828	link
2025-04-18	Quantum Contextuality for Contextual Word Embeddings	Karl Svozil et.al.	2504.13824	null
2025-04-18	Fighting Fires from Space: Leveraging Vision Transformers for Enhanced Wildfire Detection and Characterization	Aman Agarwal et.al.	2504.13776	link
2025-04-21	BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models	Zhengxian Wu et.al.	2504.13775	null
2025-04-18	Scaling sparse feature circuit finding for in-context learning	Dmitrii Kharlapenko et.al.	2504.13756	null
2025-04-18	ESPLoRA: Enhanced Spatial Precision with Low-Rank Adaption in Text-to-Image Diffusion Models for High-Definition Synthesis	Andrea Rigo et.al.	2504.13745	null
2025-04-18	Controlled Territory and Conflict Tracking (CONTACT): (Geo-)Mapping Occupied Territory from Open Source Intelligence	Paul K. Mandal et.al.	2504.13730	link
2025-04-18	Exploring Multimodal Prompt for Visualization Authoring with Large Language Models	Zhen Wen et.al.	2504.13700	null
2025-04-17	IMAGGarment-1: Fine-Grained Garment Generation for Controllable Fashion Design	Fei Shen et.al.	2504.13176	link
2025-04-17	Personalized Text-to-Image Generation with Auto-Regressive Models	Kaiyue Sun et.al.	2504.13162	link
2025-04-17	Science-T2I: Addressing Scientific Illusions in Image Synthesis	Jialuo Li et.al.	2504.13129	null
2025-04-17	Object-Driven Narrative in AR: A Scenario-Metaphor Framework with VLM Integration	Yusi Sun et.al.	2504.13119	null
2025-04-17	Should We Tailor the Talk? Understanding the Impact of Conversational Styles on Preference Elicitation in Conversational Recommender Systems	Ivica Kostric et.al.	2504.13095	link
2025-04-17	EventVAD: Training-Free Event-Aware Video Anomaly Detection	Yihua Shao et.al.	2504.13092	null
2025-04-18	SkyReels-V2: Infinite-length Film Generative Model	Guibin Chen et.al.	2504.13074	link
2025-04-17	Early Accessibility: Automating Alt-Text Generation for UI Icons During App Development	Sabrina Haque et.al.	2504.13069	null
2025-04-17	Accuracy is Not Agreement: Expert-Aligned Evaluation of Crash Narrative Classification Models	Sudesh Ramesh Bhagat et.al.	2504.13068	null
2025-04-17	Aspect-Based Summarization with Self-Aspect Retrieval Enhanced Generation	Yichao Feng et.al.	2504.13054	null
2025-04-16	Towards Learning to Complete Anything in Lidar	Ayca Takmaz et.al.	2504.12264	null
2025-04-16	Cobra: Efficient Line Art COlorization with BRoAder References	Junhao Zhuang et.al.	2504.12240	null
2025-04-16	Exploring GRBs and supernovae connection: does a superluminous hypernova population exist?	Achille Fiore et.al.	2504.12224	null
2025-04-16	Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification	Jaime E. Cuellar et.al.	2504.12180	null
2025-04-16	FocusedAD: Character-centric Movie Audio Description	Xiaojun Ye et.al.	2504.12157	link
2025-04-16	ARCeR: an Agentic RAG for the Automated Definition of Cyber Ranges	Matteo Lupinacci et.al.	2504.12143	null
2025-04-16	Multilingual Contextualization of Large Language Models for Document-Level Machine Translation	Miguel Moura Ramos et.al.	2504.12140	null
2025-04-16	Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models -	Laura Fieback et.al.	2504.12137	null
2025-04-16	Clarifying Ambiguities: on the Role of Ambiguity Types in Prompting Methods for Clarification Generation	Anfu Tang et.al.	2504.12113	null
2025-04-16	A Diffusion-Based Framework for Terrain-Aware Remote Sensing Image Reconstruction	Zhenyu Yu et.al.	2504.12112	null
2025-04-15	Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception	Ziqi Pang et.al.	2504.11457	link
2025-04-15	SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL	Junke Wang et.al.	2504.11455	link
2025-04-15	RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models	Juan Diego Rodriguez et.al.	2504.11381	link
2025-04-15	DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks	Yupei Liu et.al.	2504.11358	link
2025-04-16	Seedream 3.0 Technical Report	Yu Gao et.al.	2504.11346	null
2025-04-15	A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce	Wei Xiong et.al.	2504.11343	link
2025-04-15	A Mathematical Framework of Semantic Communication based on Category Theory	Shuheng Hua et.al.	2504.11334	null
2025-04-15	Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis	Hao Liu et.al.	2504.11331	null
2025-04-15	Decorrelation in Complex Wave Scattering	Qihang Zhang et.al.	2504.11330	null
2025-04-15	Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints	Ruicheng Ao et.al.	2504.11320	link
2025-04-14	Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding	Tao Zhang et.al.	2504.10465	link
2025-04-14	Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling?	Olha Shaposhnyk et.al.	2504.10397	null
2025-04-14	Brain-Machine Interfaces & Information Retrieval Challenges and Opportunities	Yashar Moshfeghi et.al.	2504.10371	null
2025-04-14	SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning	Yiting Wang et.al.	2504.10369	null
2025-04-14	DICE: A Framework for Dimensional and Contextual Evaluation of Language Models	Aryan Shrivastava et.al.	2504.10359	null
2025-04-14	Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis	Yifan Yang et.al.	2504.10352	null
2025-04-15	Efficient Prompt Tuning for Hierarchical Ingredient Recognition	Yinxuan Gui et.al.	2504.10322	null
2025-04-14	SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model	Zongcan Ding et.al.	2504.10320	null
2025-04-14	Analysis of Attention in Video Diffusion Transformers	Yuxin Wen et.al.	2504.10317	null
2025-04-14	ESCT3D: Efficient and Selectively Controllable Text-Driven 3D Content Generation with Gaussian Splatting	Huiqi Wu et.al.	2504.10316	null
2025-04-11	Towards an Understanding of Context Utilization in Code Intelligence	Yanlin Wang et.al.	2504.08734	null
2025-04-11	Generating Fine Details of Entity Interactions	Xinyi Gu et.al.	2504.08714	null
2025-04-11	Fast-Slow-Thinking: Complex Task Solving with Large Language Models	Yiliu Sun et.al.	2504.08690	null
2025-04-11	Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis	Alexandre Bazin et.al.	2504.08666	null
2025-04-11	Quality evaluation of Tabby coding assistant using real source code snippets	Marta Borek et.al.	2504.08650	link
2025-04-11	Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization	Jialu Li et.al.	2504.08641	null
2025-04-11	A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English	Julian Bäumler et.al.	2504.08609	null
2025-04-11	Lexical Bundle Frequency as a Construct-Relevant Candidate Feature in Automated Scoring of L2 Academic Writing	Burak Senel et.al.	2504.08537	link
2025-04-11	Task Memory Engine (TME): Enhancing State Awareness for Multi-Step LLM Agent Tasks	Ye Ye et.al.	2504.08525	link
2025-04-11	Scholar Inbox: Personalized Paper Recommendations for Scientists	Markus Flicke et.al.	2504.08385	null
2025-04-10	C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing	Zhongyang Li et.al.	2504.07964	link
2025-04-10	Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge	Riccardo Cantini et.al.	2504.07887	link
2025-04-10	Towards Sustainable Creativity Support: An Exploratory Study on Prompt Based Image Generation	Daniel Hove Paludan et.al.	2504.07879	null
2025-04-10	SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos	Joshua Li et.al.	2504.07867	null
2025-04-10	2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization	Mengyang Li et.al.	2504.07856	null
2025-04-10	Understanding Learner-LLM Chatbot Interactions and the Impact of Prompting Guidelines	Cansu Koyuturk et.al.	2504.07840	null
2025-04-10	HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss	Yi Huang et.al.	2504.07827	null
2025-04-10	What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks	Pavel Chizhov et.al.	2504.07825	link
2025-04-10	A System for Comprehensive Assessment of RAG Frameworks	Mattia Rengo et.al.	2504.07803	link
2025-04-10	FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness	Chandan Kumar Sah et.al.	2504.07801	null
2025-04-09	A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility	Andreas Hochlehnert et.al.	2504.07086	null
2025-04-09	Generalized Semantic Contrastive Learning via Embedding Side Information for Few-Shot Object Detection	Ruoyu Chen et.al.	2504.07060	link
2025-04-09	TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling	Liang-Hsuan Tseng et.al.	2504.07053	link
2025-04-09	Towards LLMs Robustness to Changes in Prompt Format Styles	Lilian Ngweta et.al.	2504.06969	null
2025-04-09	RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts	Natalia Loukachevitch et.al.	2504.06947	link
2025-04-09	Review of Case-Based Reasoning for LLM Agents: Theoretical Foundations, Architectural Components, and Cognitive Integration	Kostas Hatalis et.al.	2504.06943	null
2025-04-09	FeedbackEval: A Benchmark for Evaluating Large Language Models in Feedback-Driven Code Repair Tasks	Dekun Dai et.al.	2504.06939	link
2025-04-09	MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs	Jiawei Mao et.al.	2504.06897	null
2025-04-09	MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Chang Nie et.al.	2504.06863	null
2025-04-09	EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation	Diljeet Jagpal et.al.	2504.06861	null
2025-04-09	Hogwild! Inference: Parallel LLM Generation via Concurrent Attention	Gleb Rodionov et.al.	2504.06261	link
2025-04-08	Multi-Modality Sensing in mmWave Beamforming for Connected Vehicles Using Deep Learning	Muhammad Baqer Mollah et.al.	2504.06173	link
2025-04-08	A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning	Akash Kumar et.al.	2504.06153	null
2025-04-08	Multi-Sense Embeddings for Language Models and Knowledge Distillation	Qitong Wang et.al.	2504.06036	null
2025-04-08	Information-Theoretic Reward Decomposition for Generalizable RLHF	Liyuan Mao et.al.	2504.06020	null
2025-04-08	Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?	Roman Kochnev et.al.	2504.06006	null
2025-04-08	econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians	Can Zhang et.al.	2504.06003	null
2025-04-08	NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge	Firoj Alam et.al.	2504.05995	null
2025-04-08	An Empirical Study of GPT-4o Image Generation Capabilities	Sixiang Chen et.al.	2504.05979	link
2025-04-08	AVP-AP: Self-supervised Automatic View Positioning in 3D cardiac CT via Atlas Prompting	Xiaolin Fan et.al.	2504.05966	null
2025-04-07	CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models	Kavana Venkatesh et.al.	2504.05306	null
2025-04-07	URECA: Unique Region Caption Anything	Sangbeom Lim et.al.	2504.05305	null
2025-04-08	NoveltyBench: Evaluating Language Models for Humanlike Diversity	Yiming Zhang et.al.	2504.05228	null
2025-04-08	Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG	Hengran Zhang et.al.	2504.05220	null
2025-04-07	MSA-UNet3+: Multi-Scale Attention UNet3+ with New Supervised Prototypical Contrastive Loss for Coronary DSA Image Segmentation	Rayan Merghani Ahmed et.al.	2504.05184	null
2025-04-07	BRIDGES: Bridging Graph Modality and Large Language Models within EDA Tasks	Wei Li et.al.	2504.05180	null
2025-04-07	Attention-Based Multi-Scale Temporal Fusion Network for Uncertain-Mode Fault Diagnosis in Multimode Processes	Guangqiang Li et.al.	2504.05172	link
2025-04-07	Prεεmpt: Sanitizing Sensitive Prompts for LLMs	Amrita Roy Chowdhury et.al.	2504.05147	link
2025-04-07	DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration	Jiamei Xiong et.al.	2504.05135	null
2025-04-07	ABCDWaveNet: Advancing Robust Road Ponding Detection in Fog through Dynamic Frequency-Spatial Synergy	Ronghui Zhang et.al.	2504.05112	null
2025-04-04	Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions	Ting-Hsuan Liao et.al.	2504.03639	null
2025-04-04	VISTA-OCR: Towards generative and interactive end to end OCR models	Laziz Hamdi et.al.	2504.03621	null
2025-04-04	PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector	Kaidong Li et.al.	2504.03563	null
2025-04-04	Diverse In-Context Example Selection After Decomposing Programs and Aligned Utterances Improves Semantic Parsing	Mayank Kothyari et.al.	2504.03541	link
2025-04-04	State estimation for gas purity monitoring and control in water electrolysis systems	Lucas Cammann et.al.	2504.03522	null
2025-04-04	ATM-Net: Anatomy-Aware Text-Guided Multi-Modal Fusion for Fine-Grained Lumbar Spine Segmentation	Sheng Lian et.al.	2504.03476	null
2025-04-04	Locations of Characters in Narratives: Andersen and Persuasion Datasets	Batuhan Ozyurt et.al.	2504.03434	link
2025-04-04	MultiClear: Multimodal Soft Exoskeleton Glove for Transparent Object Grasping Assistance	Chen Hu et.al.	2504.03379	null
2025-04-04	Point Cloud-based Grasping for Soft Hand Exoskeleton	Chen Hu et.al.	2504.03369	null
2025-04-04	Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification	Francesca Ronchini et.al.	2504.03329	null
2025-04-03	A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models	Gaurav Verma et.al.	2504.02793	null
2025-04-03	A Framework for Robust Cognitive Evaluation of LLMs	Karin de Langis et.al.	2504.02789	null
2025-04-03	From Consumption to Collaboration: Measuring Interaction Patterns to Augment Human Cognition in Open-Ended Tasks	Joshua Holstein et.al.	2504.02780	null
2025-04-03	BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs	Alexander Leszczynski et.al.	2504.02779	link
2025-04-03	Robot-Led Vision Language Model Wellbeing Assessment of Children	Nida Itrat Abbasi et.al.	2504.02765	null
2025-04-04	RBT4DNN: Requirements-based Testing of Neural Networks	Nusrat Jahan Mozumder et.al.	2504.02737	link
2025-04-03	Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study	Aryan Agrawal et.al.	2504.02733	link
2025-04-03	LLM for Complex Reasoning Task: An Exploratory Study in Fermi Problems	Zishuo Liu et.al.	2504.02671	null
2025-04-03	Adaptive Frequency Enhancement Network for Remote Sensing Image Semantic Segmentation	Feng Gao et.al.	2504.02647	link
2025-04-03	Prompt Optimization with Logged Bandit Data	Haruka Kiyohara et.al.	2504.02646	null
2025-04-03	Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation	Baban Gain et.al.	2504.01919	null
2025-04-02	Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework	Andrey Sidorenko et.al.	2504.01908	link
2025-04-02	Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Shreyank N Gowda et.al.	2504.01890	null
2025-04-02	Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks	Ali Al-Kaswan et.al.	2504.01850	null
2025-04-02	Prompting Medical Vision-Language Models to Mitigate Diagnosis Bias by Generating Realistic Dermoscopic Images	Nusrat Munia et.al.	2504.01838	link
2025-04-02	Implicit Bias Injection Attacks against Text-to-Image Diffusion Models	Huayang Huang et.al.	2504.01819	link
2025-04-02	UniViTAR: Unified Vision Transformer with Native Resolution	Limeng Qiao et.al.	2504.01792	null
2025-04-02	Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation	Mingrui Ye et.al.	2504.01764	link
2025-04-02	Stable Structure Learning with HC-Stable and Tabu-Stable Algorithms	Neville K. Kitson et.al.	2504.01740	link
2025-04-02	TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication	Petr Vanc et.al.	2504.01708	null
2025-03-31	Consistent Subject Generation via Contrastive Instantiated Concepts	Lee Hsin-Ying et.al.	2503.24387	null
2025-03-31	Effectively Controlling Reasoning Models through Thinking Intervention	Tong Wu et.al.	2503.24370	null
2025-03-31	ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion	Rana Muhammad Shahroz Khan et.al.	2503.24354	null
2025-03-31	Contextual Preference Collaborative Measure Framework Based on Belief System	Hang Yu et.al.	2503.24328	null
2025-03-31	A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG	Arshia Kermani et.al.	2503.24307	null
2025-03-31	Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning	Jiacheng Lin et.al.	2503.24289	link
2025-03-31	EP240414a: Off-axis View of a Jet-Cocoon System from an Expanded Progenitor Star	Jian-He Zheng et.al.	2503.24266	null
2025-04-02	Text2Tracks: Prompt-based Music Recommendation via Generative Retrieval	Enrico Palumbo et.al.	2503.24193	null
2025-03-31	Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms	Shuoming Zhang et.al.	2503.24191	null
2025-03-31	LLM4FS: Leveraging Large Language Models for Feature Selection and How to Improve It	Jianhao Li et.al.	2503.24157	null
2025-03-28	ActionStudio: A Lightweight Framework for Data and Training of Action Models	Jianguo Zhang et.al.	2503.22673	link
2025-03-28	Unicorn: Text-Only Data Synthesis for Vision Language Model Training	Xiaomin Yu et.al.	2503.22655	link
2025-03-28	Shadow and gravitational lensing produced by the nonlinear accretion of a scalar field onto a black hole	J. C. Acevedo-Muñoz et.al.	2503.22624	null
2025-03-28	Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users	Antonia Karamolegkou et.al.	2503.22610	null
2025-03-28	Towards a Quantum Information Theory of Hadronization: Dihadron Fragmentation and Neutral Polarization in Heavy Baryons	Rebecca von Kuk et.al.	2503.22607	null
2025-03-28	Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish	Kevin Cohen et.al.	2503.22585	link
2025-03-28	Pseudovarieties of semigroups	Jorge Almeida et.al.	2503.22546	null
2025-03-28	Automated UX Insights from User Research Videos by Integrating Facial Emotion and Text Sentiment	Simran Kaur Ghatoray et.al.	2503.22510	null
2025-03-28	Generative Reliability-Based Design Optimization Using In-Context Learning Capabilities of Large Language Models	Zhonglin Jiang et.al.	2503.22401	null
2025-03-28	Fighting Fire with Fire: Channel-Independent RF Fingerprinting via the Ratio of Linear to Logarithmic Differential Spectrum	Tianshu Chen et.al.	2503.22378	null
2025-03-27	Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model	Abdelrahman Shaker et.al.	2503.21782	link
2025-03-27	VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models	Chi-Pin Huang et.al.	2503.21781	null
2025-03-27	Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation	Reza Qorbani et.al.	2503.21780	link
2025-03-27	Test-Time Visual In-Context Tuning	Jiahao Xie et.al.	2503.21777	link
2025-03-27	MemInsight: Autonomous Memory Augmentation for LLM Agents	Rana Salama et.al.	2503.21760	null
2025-03-27	Lumina-Image 2.0: A Unified and Efficient Image Generative Framework	Qi Qin et.al.	2503.21758	link
2025-03-27	VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness	Dian Zheng et.al.	2503.21755	link
2025-03-27	LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis	Shitian Zhao et.al.	2503.21749	null
2025-03-27	3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models	Yuhan Zhang et.al.	2503.21745	null
2025-03-27	GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics	Arsham Gholamzadeh Khoee et.al.	2503.21735	null
2025-03-26	Understanding R1-Zero-Like Training: A Critical Perspective	Zichen Liu et.al.	2503.20783	link
2025-03-26	Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising	Yan-Bo Lin et.al.	2503.20782	null
2025-03-26	Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields	Shijie Zhou et.al.	2503.20776	null
2025-03-27	Beyond Believability: Accurate Human Behavior Simulation with Fine-Tuned LLMs	Yuxuan Lu et.al.	2503.20749	null
2025-03-26	Vision as LoRA	Han Wang et.al.	2503.20680	link
2025-03-26	BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation	Yuyang Peng et.al.	2503.20672	null
2025-03-26	AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction	Sadaf Khademi et.al.	2503.20662	null
2025-03-26	AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports	Xiangwen Zhang et.al.	2503.20654	null
2025-03-26	Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging	Han Wu et.al.	2503.20641	link
2025-03-26	IAP: Improving Continual Learning of Vision-Language Models via Instance-Aware Prompting	Hao Fu et.al.	2503.20612	link
2025-03-25	Scaling Vision Pre-Training to 4K Resolution	Baifeng Shi et.al.	2503.19903	null
2025-03-25	Scaling Down Text Encoders of Text-to-Image Diffusion Models	Lifu Wang et.al.	2503.19897	link
2025-03-25	A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design	Jie Tian et.al.	2503.19889	null
2025-03-25	CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation	Nengbo Wang et.al.	2503.19878	null
2025-03-25	Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators	Seungone Kim et.al.	2503.19877	null
2025-03-25	An Overview of Low-Rank Structures in the Training and Adaptation of Large Models	Laura Balzano et.al.	2503.19859	null
2025-03-25	Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking	Xiaoyu Tian et.al.	2503.19855	null
2025-03-25	Towards Online Multi-Modal Social Interaction Understanding	Xinpeng Li et.al.	2503.19851	link
2025-03-25	A Comparative Analysis of Word Segmentation, Part-of-Speech Tagging, and Named Entity Recognition for Historical Chinese Sources, 1900-1950	Zhao Fang et.al.	2503.19844	null
2025-03-25	Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy	Athiya Deviyani et.al.	2503.19828	null
2025-03-24	Target-Aware Video Diffusion Models	Taeksoo Kim et.al.	2503.18950	null
2025-03-24	Equivariant Image Modeling	Ruixiao Dong et.al.	2503.18948	link
2025-03-24	Video-T1: Test-Time Scaling for Video Generation	Fangfu Liu et.al.	2503.18942	null
2025-03-25	Coincidence measurement of two-photon double ionization of argon through an autoionizing resonance	Sebastian Hell et.al.	2503.18913	null
2025-03-24	AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration	Zhexuan Wang et.al.	2503.18891	link
2025-03-24	Efficient and Accurate Scene Text Recognition with Cascaded-Transformers	Savas Ozkan et.al.	2503.18883	null
2025-03-24	Efficient Self-Supervised Adaptation for Medical Image Analysis	Moein Sorkhei et.al.	2503.18873	link
2025-03-24	Reasoning to Learn from Latent Thoughts	Yangjun Ruan et.al.	2503.18866	null
2025-03-25	MC-LLaVA: Multi-Concept Personalized Vision-Language Model	Ruichuan An et.al.	2503.18854	link
2025-03-24	3DSwapping: Texture Swapping For 3D Object From Single Reference Image	Xiao Cao et.al.	2503.18853	null
2025-03-21	Core Components of Emotional Impulsivity: A Mouse-Cursor Tracking Study	Anton Leontyev et.al.	2503.17328	null
2025-03-21	FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models	Mingyang Song et.al.	2503.17287	link
2025-03-21	Revisiting End To End Sparse Autoencoder Training – A Short Finetune is All You Need	Adam Karvonen et.al.	2503.17272	link
2025-03-21	Slide-Level Prompt Learning with Vision Language Models for Few-Shot Multiple Instance Learning in Histopathology	Devavrat Tomar et.al.	2503.17238	link
2025-03-21	LLMs Love Python: A Study of LLMs’ Bias for Programming Languages and Libraries	Lukas Twist et.al.	2503.17181	link
2025-03-21	ExplainitAI: When do we trust artificial intelligence? The influence of content and explainability in a cross-cultural comparison	Sora Kang et.al.	2503.17158	null
2025-03-21	Modifying Large Language Model Post-Training for Diverse Creative Writing	John Joon Young Chung et.al.	2503.17126	link
2025-03-21	Multi-modal Multi-platform Person Re-Identification: Benchmark and Method	Ruiyang Ha et.al.	2503.17096	null
2025-03-21	Collapse of Rotating White Dwarfs and Multimessenger Signals	Takami Kuroda et.al.	2503.17082	null
2025-03-21	Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?	Jeremy Barnes et.al.	2503.17039	null
2025-03-20	DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding	Keyan Chen et.al.	2503.16426	link
2025-03-20	Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models	Yang Sui et.al.	2503.16419	link
2025-03-20	Sparse Nonparametric Contextual Bandits	Hamish Flynn et.al.	2503.16382	null
2025-03-20	Enhancing Software Quality Assurance with an Adaptive Differential Evolution based Quantum Variational Autoencoder-Transformer Model	Seshu Babu Barma et.al.	2503.16335	null
2025-03-20	LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates	Ying Shen et.al.	2503.16334	null
2025-03-20	Issue2Test: Generating Reproducing Test Cases from Issue Reports	Noor Nashid et.al.	2503.16320	null
2025-03-20	PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification	Sharon Peled et.al.	2503.16284	link
2025-03-20	Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data	Zijian Li et.al.	2503.16260	null
2025-03-20	M2N2V2: Multi-Modal Unsupervised and Training-free Interactive Segmentation	Markus Karmann et.al.	2503.16254	null
2025-03-20	AI Agents in Cryptoland: Practical Attacks and No Silver Bullet	Atharv Singh Patlan et.al.	2503.16248	null
2025-03-20	Dynamic Bi-Elman Attention Networks (DBEAN): Dual-Directional Context-Aware Representation Learning for Enhanced Text Classification	ZhengLin Lai et.al.	2503.15469	link
2025-03-19	Visual Position Prompt for MLLM based Visual Grounding	Wei Tang et.al.	2503.15426	link
2025-03-19	Probing the topology of the space of tokens with structured prompts	Michael Robinson et.al.	2503.15421	null
2025-03-19	A time-to-event three-outcome design for randomized phase II cancer trials	Minghua Shan et.al.	2503.15418	null
2025-03-19	TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification	Junnan Zhu et.al.	2503.15289	null
2025-03-19	TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models	Teng-Fang Hsiao et.al.	2503.15283	null
2025-03-19	Do Chains-of-Thoughts of Large Language Models Suffer from Hallucinations, Cognitive Biases, or Phobias in Bayesian Reasoning?	Roberto Araya et.al.	2503.15268	null
2025-03-19	Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study	Jomar Thomas Almonte et.al.	2503.15248	null
2025-03-19	CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification	Wenlong Yu et.al.	2503.15234	link
2025-03-19	Context-Aware Vision Language Foundation Models for Ocular Disease Screening in Retinal Images	Lucie Berger et.al.	2503.15212	null
2025-03-18	Aligning Multimodal LLM with Human Preference: A Survey	Tao Yu et.al.	2503.14504	link
2025-03-18	The Power of Context: How Multimodality Improves Image Super-Resolution	Kangfu Mei et.al.	2503.14503	null
2025-03-18	Tracking Meets Large Multimodal Models for Driving Scenario Understanding	Ayesha Ishaq et.al.	2503.14498	link
2025-03-18	Gricean Norms as a Basis for Effective Collaboration	Fardin Saad et.al.	2503.14484	link
2025-03-18	ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing	Yulin Pan et.al.	2503.14482	null
2025-03-18	LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers	Nikhil Abhyankar et.al.	2503.14434	link
2025-03-18	MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation	Hongyu Zhang et.al.	2503.14428	null
2025-03-18	Large Language Models for Virtual Human Gesture Selection	Parisa Ghanad Torshizi et.al.	2503.14408	null
2025-03-18	Impossible Videos	Zechen Bai et.al.	2503.14378	null
2025-03-18	RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment	Chao Wang et.al.	2503.14358	null
2025-03-17	Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance	Noah Y. Siegel et.al.	2503.13445	null
2025-03-17	VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning	Ye Liu et.al.	2503.13444	link
2025-03-17	DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models	Haoyang Li et.al.	2503.13443	link
2025-03-18	MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling	Yingyue Li et.al.	2503.13440	link
2025-03-18	DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective	Dengyun Peng et.al.	2503.13413	link
2025-03-17	MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research	James Burgess et.al.	2503.13399	link
2025-03-17	Aligned Probing: Relating Toxic Behavior and Model Internals	Andreas Waldis et.al.	2503.13390	null
2025-03-17	Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning	Hai-Long Sun et.al.	2503.13360	null
2025-03-17	LEAVS: An LLM-based Labeler for Abdominal CT Supervision	Ricardo Bigolin Lanfredi et.al.	2503.13330	link
2025-03-17	Edit Transfer: Learning Image Editing via Vision In-Context Relations	Lan Chen et.al.	2503.13327	null
2025-03-14	RNN-DAS: A New Deep Learning Approach for Detection and Real-Time Monitoring of Volcano-Tectonic Events Using Distributed Acoustic Sensing	Javier Fernandez-Carabantes et.al.	2503.11622	null
2025-03-14	Synthesizing Access Control Policies using Large Language Models	Adarsh Vatsa et.al.	2503.11573	null
2025-03-14	Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models	Hao Cheng et.al.	2503.11519	null
2025-03-14	Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks	Diego Gosmar et.al.	2503.11517	link
2025-03-14	T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation	Seyed Mohammad Hadi Hosseini et.al.	2503.11481	null
2025-03-14	Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models	Xu Liu et.al.	2503.11411	null
2025-03-14	Optimizing Large Language Models for Detecting Symptoms of Comorbid Depression or Anxiety in Chronic Diseases: Insights from Patient Messages	Jiyeong Kim et.al.	2503.11384	null
2025-03-14	Modeling Subjectivity in Cognitive Appraisal with Language Models	Yuxiang Zhou et.al.	2503.11381	null
2025-03-14	Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model	Moritz A. Zanger et.al.	2503.11339	null
2025-03-14	AI-Assisted Object Condensation Clustering for Calorimeter Shower Reconstruction at CLAS12	Gregory Matousek et.al.	2503.11277	null
2025-03-13	GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing	Rongyao Fang et.al.	2503.10639	link
2025-03-14	Distilling Diversity and Control in Diffusion Models	Rohit Gandikota et.al.	2503.10637	null
2025-03-13	V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes	Yanming Zhang et.al.	2503.10634	null
2025-03-13	Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search	Andy Zhou et.al.	2503.10619	null
2025-03-13	Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models	Andy Zhou et.al.	2503.10617	null
2025-03-13	ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer	Bolin Chen et.al.	2503.10614	null
2025-03-13	Unlock the Power of Unlabeled Data in Language Driving Model	Chaoqun Wang et.al.	2503.10586	null
2025-03-13	ASIDE: Architectural Separation of Instructions and Data in Language Models	Egor Zverev et.al.	2503.10566	null
2025-03-13	MASQUE: A Text-Guided Diffusion-Based Framework for Localized and Customized Adversarial Makeup	Youngjin Kwon et.al.	2503.10549	null
2025-03-13	KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation	Zixian Liu et.al.	2503.10546	null
2025-03-12	MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System	Jihao Zhao et.al.	2503.09600	link
2025-03-12	Auspex: Building Threat Modeling Tradecraft into an Artificial Intelligence-based Copilot	Andrew Crossman et.al.	2503.09586	null
2025-03-12	Evolution of the Three Spectral Components in the Prompt Emission of GRB 240825A	Chen-Wei Wang et.al.	2503.09562	null
2025-03-12	Contextuality sans incompatibility in the simplest scenario: Communication supremacy of a qubit	Partha Patra et.al.	2503.09534	null
2025-03-12	Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning	Bowen Jin et.al.	2503.09516	link
2025-03-12	Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection	Romain Thoreau et.al.	2503.09493	null
2025-03-12	SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery	Jiayuan Huang et.al.	2503.09474	null
2025-03-12	Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models	Zhihua Tian et.al.	2503.09446	link
2025-03-12	SuperCarver: Texture-Consistent 3D Geometry Super-Resolution for High-Fidelity Surface Detail Generation	Qijian Zhang et.al.	2503.09439	null
2025-03-12	PromptMap: An Alternative Interaction Style for AI-Based Image Generation	Krzysztof Adamkiewicz et.al.	2503.09436	link
2025-03-11	Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs	Ariba Khan et.al.	2503.08688	link
2025-03-11	OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models	Jialv Zou et.al.	2503.08686	link
2025-03-11	Chain-of-Thought Reasoning In The Wild Is Not Always Faithful	Iván Arcuschin et.al.	2503.08679	link
2025-03-11	AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence	Zekun Li et.al.	2503.08669	null
2025-03-11	Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling	Subin Kim et.al.	2503.08605	null
2025-03-11	NSF-SciFy: Mining the NSF Awards Database for Scientific Claims	Delip Rao et.al.	2503.08600	null
2025-03-11	There’s more to life in reflected light: Simulating the detectability of a range of molecules for high-contrast, high-resolution observations of non-transiting terrestrial exoplanets	Miles H. Currie et.al.	2503.08592	null
2025-03-11	BiasEdit: Debiasing Stereotyped Language Models via Model Editing	Xin Xu et.al.	2503.08588	link
2025-03-11	Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation	Mingkang Zhu et.al.	2503.08575	null
2025-03-11	ComicsPAP: understanding comic strips by picking the correct panel	Emanuele Vivoli et.al.	2503.08561	null
2025-03-10	GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval	Justus-Jonas Erker et.al.	2503.07519	link
2025-03-10	TokenButler: Token Importance is Predictable	Yash Akhauri et.al.	2503.07518	link
2025-03-10	CPAny: Couple With Any Encoder to Refer Multi-Object Tracking	Weize Li et.al.	2503.07516	null
2025-03-10	Language Models Fail to Introspect About Their Knowledge of Language	Siyuan Song et.al.	2503.07513	link
2025-03-10	Plume: Scaffolding Text Composition in Dashboards	Maxim Lisnic et.al.	2503.07512	null
2025-03-10	Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts	Shiu-hong Kao et.al.	2503.07503	null
2025-03-10	V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation	Guiwei Zhang et.al.	2503.07493	link
2025-03-10	Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction	Zongzheng Zhang et.al.	2503.07485	link
2025-03-10	YOLOE: Real-Time Seeing Anything	Ao Wang et.al.	2503.07465	link
2025-03-10	Anatomy-Aware Conditional Image-Text Retrieval	Meng Zheng et.al.	2503.07456	null
2025-03-10	From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics	Jaewook Lee et.al.	2503.07429	null
2025-03-10	TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision	Shaobin Zhuang et.al.	2503.07416	null
2025-03-10	REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding	Yan Tai et.al.	2503.07413	link
2025-03-10	TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models	Ruidong Chen et.al.	2503.07389	link
2025-03-10	Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment	Xing Xie et.al.	2503.07334	link
2025-03-10	Self-Corrective Task Planning by Inverse Prompting with Large Language Models	Jiho Lee et.al.	2503.07317	null
2025-03-10	Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies	Luyi Jiang et.al.	2503.07306	null
2025-03-07	Fairness-Aware Low-Rank Adaptation Under Demographic Privacy Constraints	Parameswaran Kamalaruban et.al.	2503.05684	null
2025-03-07	Task-oriented Uncertainty Collaborative Learning for Label-Efficient Brain Tumor Segmentation	Zhenxuan Zhang et.al.	2503.05682	link
2025-03-07	AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data	Zengqun Zhao et.al.	2503.05665	link
2025-03-07	VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control	Yuxuan Bian et.al.	2503.05639	link
2025-03-07	Nuanced Safety for Generative AI: How Demographics Shape Responsiveness to Severity	Pushkar Mishra et.al.	2503.05609	null
2025-03-07	Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models	Zheng Li et.al.	2503.05595	link
2025-03-07	Evaluating open-source Large Language Models for automated fact-checking	Nicolo’ Fontana et.al.	2503.05565	null
2025-03-07	S4M: Segment Anything with 4 Extreme Points	Adrien Meyer et.al.	2503.05534	null
2025-03-07	State-of-the-Art Stroke Lesion Segmentation at 1/1000th of Parameters	Alex Fedorov et.al.	2503.05531	null
2025-03-07	Cognitive Bias Detection Using Advanced Prompt Engineering	Frederic Lemieux et.al.	2503.05516	null
2025-03-07	Shifting Long-Context LLMs Research from Input to Output	Yuhao Wu et.al.	2503.04723	null
2025-03-06	Enough Coin Flips Can Make LLMs Act Bayesian	Ritwik Gupta et.al.	2503.04722	null
2025-03-06	Scaling Rich Style-Prompted Text-to-Speech Datasets	Anuj Diwan et.al.	2503.04713	link
2025-03-06	L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning	Pranjal Aggarwal et.al.	2503.04697	null
2025-03-06	Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation	Aishik Konwer et.al.	2503.04639	null
2025-03-06	SynGraph: A Dynamic Graph-LLM Synthesis Framework for Sparse Streaming User Sentiment Modeling	Xin Zhang et.al.	2503.04619	null
2025-03-06	Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation	Armel Zebaze et.al.	2503.04554	null
2025-03-06	Generalized Interpolating Discrete Diffusion	Dimitri von Rütte et.al.	2503.04482	link
2025-03-06	ToolFuzz – Automated Agent Tool Testing	Ivan Milev et.al.	2503.04479	null
2025-03-06	Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges	Francisco Eiras et.al.	2503.04474	null
2025-03-05	A Practical Memory Injection Attack against LLM Agents	Shen Dong et.al.	2503.03704	null
2025-03-05	A Generative Approach to High Fidelity 3D Reconstruction from Text Data	Venkat Kumar R et.al.	2503.03664	null
2025-03-05	LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant	Wei Li et.al.	2503.03663	null
2025-03-05	Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset	Jessica Hoffmann et.al.	2503.03654	null
2025-03-05	Token-Level Privacy in Large Language Models	Re’em Harel et.al.	2503.03652	null
2025-03-05	DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms	Xiaojun Bi et.al.	2503.03644	link
2025-03-05	Enhancing the Accuracy and Comprehensibility in Architectural Tactics Detection via Small Model-Augmented Prompt Engineering	Lingli Cao et.al.	2503.03609	link
2025-03-05	Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders	Kristian Kuznetsov et.al.	2503.03601	null
2025-03-05	Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs	Haoran Fan et.al.	2503.03594	link
2025-03-05	Digital Twin-Enabled Blockage-Aware Dynamic mmWave Multi-Hop V2X Communication	Supat Roongpraiwan et.al.	2503.03590	null
2025-03-04	Prompting Generative AI with Interaction-Augmented Instructions	Leixian Shen et.al.	2503.02874	null
2025-03-04	Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework	Ziang Zhou et.al.	2503.02863	null
2025-03-04	Evaluation of Architectural Synthesis Using Generative AI	Jingfei Huang et.al.	2503.02861	null
2025-03-04	A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness	Nathan Drenkow et.al.	2503.02797	null
2025-03-04	Quantitative Resilience Modeling for Autonomous Cyber Defense	Xavier Cadet et.al.	2503.02780	null
2025-03-04	Prime Convolutional Model: Breaking the Ground for Theoretical Explainability	Francesco Panelli et.al.	2503.02773	null
2025-03-04	From Metaphor to Mechanism: How LLMs Decode Traditional Chinese Medicine Symbolic Language for Modern Clinical Relevance	Jiacheng Tang et.al.	2503.02760	null
2025-03-04	BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression	Daniil Larionov et.al.	2503.02756	null
2025-03-04	Evaluating Knowledge Generation and Self-Refinement Strategies for LLM-based Column Type Annotation	Keti Korini et.al.	2503.02718	link
2025-03-04	FlowPlan: Zero-Shot Task Planning with LLM Flow Engineering for Robotic Instruction Following	Zijun Lin et.al.	2503.02698	null
2025-02-28	Persuasion Should be Double-Blind: A Multi-Domain Dialogue Dataset With Faithfulness Based on Causal Theory of Mind	Dingyi Zhang et.al.	2502.21297	null
2025-02-28	Contextualizing biological perturbation experiments through language	Menghua Wu et.al.	2502.21290	link
2025-02-28	Adaptive Keyframe Sampling for Long Video Understanding	Xi Tang et.al.	2502.21271	null
2025-02-28	RuCCoD: Towards Automated ICD Coding in Russian	Aleksandr Nesterov et.al.	2502.21263	link
2025-02-28	PET Image Denoising via Text-Guided Diffusion: Integrating Anatomical Priors through Text Prompts	Boxiao Yu et.al.	2502.21260	null
2025-02-28	Towards Developing Ethical Reasoners: Integrating Probabilistic Reasoning and Decision-Making for Complex AI Systems	Nijesh Upreti et.al.	2502.21250	null
2025-02-28	Brickify: Enabling Expressive Design Intent Specification through Direct Manipulation on Design Tokens	Xinyu Shi et.al.	2502.21219	null
2025-02-28	Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought	Jianhao Huang et.al.	2502.21212	null
2025-02-28	CuPID: Leveraging Masked Single-Lead ECG Modelling for Enhancing the Representations	Adrian Atienza et.al.	2502.21127	null
2025-02-28	SEE: See Everything Every Time – Adaptive Brightness Adjustment for Broad Light Range Images via Events	Yunfan Lu et.al.	2502.21120	null
2025-02-27	Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation	Sucheng Ren et.al.	2502.20388	link
2025-02-27	Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis	Jeffrey Yang Fan Chiang et.al.	2502.20383	null
2025-02-27	Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers	Shalev Lifshitz et.al.	2502.20379	null
2025-02-27	Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization	Ryan C. Barron et.al.	2502.20364	link
2025-02-27	Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs	Kuan Lok Zhou et.al.	2502.20356	null
2025-02-27	On Adversarial Attacks In Acoustic Drone Localization	Tamir Shor et.al.	2502.20325	null
2025-02-27	ACCORD: Application Context-aware Cross-layer Optimization and Resource Design for 5G/NextG Machine-centric Applications	Azuka Chiejina et.al.	2502.20320	null
2025-02-27	LangProBe: a Language Programs Benchmark	Shangyin Tan et.al.	2502.20315	null
2025-02-27	Mobius: Text to Seamless Looping Video Generation via Latent Shift	Xiuli Bi et.al.	2502.20307	link
2025-02-27	Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription	Benjamin Gutteridge et.al.	2502.20295	link
2025-02-26	Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models	Lucy Xiaoyang Shi et.al.	2502.19417	null
2025-02-26	Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing	Akshat Gupta et.al.	2502.19416	null
2025-02-26	The Mighty ToRR: A Benchmark for Table Reasoning and Robustness	Shir Ashury-Tahan et.al.	2502.19412	link
2025-02-26	Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices	Xinru Wang et.al.	2502.19410	null
2025-02-26	DataMan: Data Manager for Pre-training Large Language Models	Ru Peng et.al.	2502.19363	null
2025-02-26	Optimal COVID-19 vaccine prioritization by age depends critically on inter-group contacts and vaccination rates	Iker Atienza-Diez et.al.	2502.19292	null
2025-02-26	CritiQ: Mining Data Quality Criteria from Human Preferences	Honglin Guo et.al.	2502.19279	null
2025-02-26	Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in Pre-trained Vision-Language Models	Jiawei Kong et.al.	2502.19269	null
2025-02-26	Enhancing Gradient-based Discrete Sampling via Parallel Tempering	Luxu Liang et.al.	2502.19240	null
2025-02-26	AI-Powered Bayesian Inference	Veronika Ročková et.al.	2502.19231	null
2025-02-25	K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs	Ziheng Ouyang et.al.	2502.18461	null
2025-02-25	Evaluating the Effectiveness of Small Language Models in Detecting Refactoring Bugs	Rohit Gheyi et.al.	2502.18454	null
2025-02-25	MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning	Chanwoo Park et.al.	2502.18439	null
2025-02-25	Rank1: Test-Time Compute for Reranking in Information Retrieval	Orion Weller et.al.	2502.18418	link
2025-02-25	MedKAN: An Advanced Kolmogorov-Arnold Network for Medical Image Classification	Zhuoqin Yang et.al.	2502.18416	null
2025-02-25	ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation	Yifan Pu et.al.	2502.18364	null
2025-02-25	GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music	Xinran Liu et.al.	2502.18309	null
2025-02-25	LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation	Pengzhi Li et.al.	2502.18302	null
2025-02-25	Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training	Botao Ye et.al.	2502.18219	null
2025-02-25	FLARE: A Framework for Stellar Flare Forecasting using Stellar Physical Properties and Historical Records	Bingke Zhu et.al.	2502.18218	null
2025-02-24	Stronger Neyman Regret Guarantees for Adaptive Experimental Design	Georgy Noarov et.al.	2502.17427	link
2025-02-24	Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs	Jan Betley et.al.	2502.17424	link
2025-02-24	Function-Space Learning Rates	Edward Milsom et.al.	2502.17405	link
2025-02-24	What is a Good Question? Utility Estimation with LLM-based Simulations	Dong-Ho Lee et.al.	2502.17383	null
2025-02-24	A Closer Look at TabPFN v2: Strength, Limitation, and Extension	Han-Jia Ye et.al.	2502.17361	null
2025-02-24	Goal-Oriented Middleware Filtering at Transport Layer Based on Value of Updates	Polina Kutsevol et.al.	2502.17350	null
2025-02-24	Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents	Prafulla Kumar Choubey et.al.	2502.17321	null
2025-02-24	A novel approach to navigate the taxonomic hierarchy to address the Open-World Scenarios in Medicinal Plant Classification	Soumen Sinha et.al.	2502.17289	null
2025-02-24	Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing	Yi-Kai Zhang et.al.	2502.17282	link
2025-02-24	Extracting domain-specific terms using contextual word embeddings	Andraž Repar et.al.	2502.17278	null
2025-02-21	ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval	Guanqi Zhan et.al.	2502.15682	null
2025-02-21	AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind	Zhining Zhang et.al.	2502.15676	link
2025-02-21	Empowering LLMs with Logical Reasoning: A Comprehensive Survey	Fengxiang Cheng et.al.	2502.15652	null
2025-02-21	MemoryPods: Enhancing Asynchronous Communication in Extended Reality	Akos Nagy et.al.	2502.15622	null
2025-02-21	Extraction multi-étiquettes de relations en utilisant des couches de Transformer	Ngoc Luyen Le et.al.	2502.15619	null
2025-02-21	Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author’s Style	Xueran Han et.al.	2502.15616	null
2025-02-21	Ontological models cannot adequately represent state update for sequential measurement of incompatible observables	Alisson Tezzin et.al.	2502.15615	null
2025-02-21	Chats-Grid: An Iterative Retrieval Q&A Optimization Scheme Leveraging Large Model and Retrieval Enhancement Generation in smart grid	Yunfeng Li et.al.	2502.15583	null
2025-02-21	Context-Aware Doubly-Robust Semi-Supervised Learning	Clement Ruah et.al.	2502.15577	null
2025-02-21	A Cautionary Tale About “Neutrally” Informative AI Tools Ahead of the 2025 Federal Elections in Germany	Ina Dormuth et.al.	2502.15568	null
2025-02-20	Prompt-to-Leaderboard	Evan Frick et.al.	2502.14855	link
2025-02-20	Red-Teaming LLM Multi-Agent Systems via Communication Attacks	Pengfei He et.al.	2502.14847	null
2025-02-20	Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation	Yue Yang et.al.	2502.14846	null
2025-02-20	Dynamic Concepts Personalization from Single Videos	Rameen Abdal et.al.	2502.14844	null
2025-02-20	Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps	Martin Tutek et.al.	2502.14829	link
2025-02-20	eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables	Luis Antonio Gutiérrez Guanilo et.al.	2502.14820	null
2025-02-20	Dynamic Low-Rank Sparse Adaptation for Large Language Models	Weizhong Huang et.al.	2502.14816	link
2025-02-20	Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration	Pengxiang Ding et.al.	2502.14795	null
2025-02-20	Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning	Tian Xie et.al.	2502.14768	link
2025-02-20	HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States	Yilei Jiang et.al.	2502.14744	link
2025-02-19	Where’s the Bug? Attention Probing for Scalable Fault Localization	Adam Stein et.al.	2502.13966	null
2025-02-19	RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision	Guangzhi Xiong et.al.	2502.13957	null
2025-02-19	Neurosymbolic artificial intelligence via large language models and coherence-driven inference	Steve Huntsman et.al.	2502.13953	null
2025-02-19	A Chain-of-Thought Subspace Meta-Learning for Few-shot Image Captioning with Large Vision and Language Models	Hao Huang et.al.	2502.13942	null
2025-02-19	Citation proximus: the role of social and semantic ties in citing behaviour	Diego Kozlowski et.al.	2502.13934	null
2025-02-19	Beyond Single Frames: Can LMMs Comprehend Temporal and Contextual Narratives in Image Sequences?	Xiaochen Wang et.al.	2502.13925	null
2025-02-19	Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis	Jiahao Gai et.al.	2502.13921	null
2025-02-19	Exploring Personalized Health Support through Data-Driven, Theory-Guided LLMs: A Case Study in Sleep Health	Xingbo Wang et.al.	2502.13920	link
2025-02-19	Judging the Judges: A Collection of LLM-Generated Relevance Judgements	Hossein A. Rahmani et.al.	2502.13908	link
2025-02-19	DataSciBench: An LLM Agent Benchmark for Data Science	Dan Zhang et.al.	2502.13897	link
2025-02-18	UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models	Huawei Lin et.al.	2502.13141	link
2025-02-18	Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions	Taedong Yun et.al.	2502.13135	null
2025-02-18	STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models	Narun Raman et.al.	2502.13119	null
2025-02-18	Near-Optimal Private Learning in Linear Contextual Bandits	Fan Chen et.al.	2502.13115	null
2025-02-18	KAPPA: A Generic Patent Analysis Framework with Keyphrase-Based Portraits	Xin Xia et.al.	2502.13076	null
2025-02-18	Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction	Nils Constantin Hellwig et.al.	2502.13044	null
2025-02-18	HPSS: Heuristic Prompting Strategy Search for LLM Evaluators	Bosi Wen et.al.	2502.13031	null
2025-02-18	Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks	Markus J. Buehler et.al.	2502.13025	link
2025-02-18	Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation	Sha Li et.al.	2502.13019	null
2025-02-18	LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation	Junchen Fu et.al.	2502.12945	null
2025-02-17	Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA	Patryk Marszałek et.al.	2502.12122	link
2025-02-17	A-MEM: Agentic Memory for LLM Agents	Wujiang Xu et.al.	2502.12110	link
2025-02-17	VLM$^2$-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues	Jianshu Zhang et.al.	2502.12084	null
2025-02-17	Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation	Zhongyi Qiu et.al.	2502.12073	null
2025-02-17	Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions	Lan Zhang et.al.	2502.12065	link
2025-02-17	Designing Role Vectors to Improve LLM Inference Behaviour	Daniele Potertì et.al.	2502.12055	null
2025-02-17	Robotic CBCT Meets Robotic Ultrasound	Feng Li et.al.	2502.12019	null
2025-02-17	Learning Generalizable Prompt for CLIP with Class Similarity Knowledge	Sehun Jung et.al.	2502.11969	null
2025-02-17	VAQUUM: Are Vague Quantifiers Grounded in Visual Data?	Hugh Mee Wong et.al.	2502.11874	null
2025-02-17	Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu	Renhao Pei et.al.	2502.11862	link
2025-02-14	Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction	WonJin Yoon et.al.	2502.10388	null
2025-02-14	Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models	Jiexin Ding et.al.	2502.10378	null
2025-02-14	Adversarial Mixup Unlearning	Zhuoyi Peng et.al.	2502.10288	null
2025-02-14	Are Large Language Models the future crowd workers of Linguistics?	Iris Ferrazzo et.al.	2502.10266	null
2025-02-14	VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models	Gokul Karthik Kumar et.al.	2502.10250	null
2025-02-14	Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model	Guoqing Ma et.al.	2502.10248	link
2025-02-14	Combinatorial Reinforcement Learning with Preference Feedback	Joongkyu Lee et.al.	2502.10158	null
2025-02-14	NeuroXVocal: Detection and Explanation of Alzheimer’s Disease through Non-invasive Analysis of Picture-prompted Speech	Nikolaos Ntampakis et.al.	2502.10108	null
2025-02-14	MTLM: an Innovative Language Model Training Paradigm for ASR	Qingliang Meng et.al.	2502.10058	null
2025-02-14	ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments	Juyeong Hwang et.al.	2502.10046	null
2025-02-13	MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency	Dongzhi Jiang et.al.	2502.09621	null
2025-02-13	Designing a Conditional Prior Distribution for Flow-Based Generative Models	Noam Issachar et.al.	2502.09611	null
2025-02-13	CoT-Valve: Length-Compressible Chain-of-Thought Tuning	Xinyin Ma et.al.	2502.09601	link
2025-02-13	GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis	Angelos Zavras et.al.	2502.09598	link
2025-02-13	Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs	Siyan Zhao et.al.	2502.09597	link
2025-02-13	Polymind: Parallel Visual Diagramming with Large Language Models to Support Prewriting Through Microtasks	Qian Wan et.al.	2502.09577	null
2025-02-13	Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering	Mark Beliaev et.al.	2502.09573	null
2025-02-13	MDCrow: Automating Molecular Dynamics Workflows with Large Language Models	Quintina Campbell et.al.	2502.09565	link
2025-02-13	Improve LLM-based Automatic Essay Scoring with Linguistic Features	Zhaoyi Joey Hou et.al.	2502.09497	null
2025-02-13	Objective quantification of mood states using large language models	Jakub Onysk et.al.	2502.09487	null
2025-02-12	Rhythmic sharing: A bio-inspired paradigm for zero-shot adaptation and learning in neural networks	Hoony Kang et.al.	2502.08644	link
2025-02-12	Ultrasound Image Generation using Latent Diffusion Models	Benoit Freiche et.al.	2502.08580	null
2025-02-12	AR Glulam: Accurate Augmented Reality Using Multiple Fiducial Markers for Glulam Fabrication	Alexander Htet Kyaw et.al.	2502.08566	null
2025-02-12	QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval	Wonduk Seo et.al.	2502.08557	null
2025-02-12	LLMs can implicitly learn from mistakes in-context	Lisa Alazraki et.al.	2502.08550	null
2025-02-12	LoRa Fine Synchronization with Two-Pass Time and Frequency Offset Estimation	Joachim Tapparel et.al.	2502.08485	null
2025-02-12	Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning	Qifan Yu et.al.	2502.08482	null
2025-02-12	Towards Prompt Generalization: Grammar-aware Cross-Prompt Automated Essay Scoring	Heejin Do et.al.	2502.08450	null
2025-02-12	A Semantic Parsing Algorithm to Solve Linear Ordering Problems	Maha Alkhairy et.al.	2502.08415	null
2025-02-12	IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance	Paul Röttger et.al.	2502.08395	null
2025-02-11	Auditing Prompt Caching in Language Model APIs	Chenchen Gu et.al.	2502.07776	link
2025-02-11	Great Power Brings Great Responsibility: Personalizing Conversational AI for Diverse Problem-Solvers	Italo Santos et.al.	2502.07763	null
2025-02-11	An Advanced NLP Framework for Automated Medical Diagnosis with DeBERTa and Dynamic Contextual Positional Gating	Mohammad Ali Labbaf Khaniki et.al.	2502.07755	null
2025-02-11	WHODUNIT: Evaluation benchmark for culprit detection in mystery stories	Kshitij Gupta et.al.	2502.07747	link
2025-02-11	HRP: High-Rank Preheating for Superior LoRA Initialization	Yuzhu Chen et.al.	2502.07739	null
2025-02-11	Pluto: Authoring Semantically Aligned Text and Charts for Data-Driven Communication	Arjun Srinivasan et.al.	2502.07725	null
2025-02-11	RenderBox: Expressive Performance Rendering with Text Control	Huan Zhang et.al.	2502.07711	null
2025-02-11	Methodology for Identifying Social Groups within a Transactional Graph	Maxence Morin et.al.	2502.07694	null
2025-02-11	Are Princelings Truly Busted? Evaluating Transaction Discounts in China’s Land Market	Julia Manso et.al.	2502.07692	null
2025-02-11	exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem	Sajad Ebrahimi et.al.	2502.07683	link
2025-02-10	Rationalization Models for Text-to-SQL	Gaetano Rossiello et.al.	2502.06759	null
2025-02-10	SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement	Yuqi Lin et.al.	2502.06756	link
2025-02-10	Discovery of skill switching criteria for learning agile quadruped locomotion	Wanming Yu et.al.	2502.06676	null
2025-02-10	Boosting Self-Efficacy and Performance of Large Language Models via Verbal Efficacy Stimulations	Rui Chen et.al.	2502.06669	null
2025-02-10	In-Context Learning (and Unlearning) of Length Biases	Stephanie Schoch et.al.	2502.06653	null
2025-02-10	Estimation of Food Intake Quantity Using Inertial Signals from Smartwatches	Ioannis Levi et.al.	2502.06649	null
2025-02-10	Quasi-stationary distributions for subcritical population models	Pablo Groisman et.al.	2502.06638	null
2025-02-10	Unleashing the Potential of Pre-Trained Diffusion Models for Generalizable Person Re-Identification	Jiachen Li et.al.	2502.06619	link
2025-02-10	A Large-scale AI-generated Image Inpainting Benchmark	Paschalis Giakoumoglou et.al.	2502.06593	null
2025-02-10	Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training	Yuchen Zhuang et.al.	2502.06589	null
2025-02-07	FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation	Shilong Zhang et.al.	2502.05179	link
2025-02-07	MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison	Kaijie Zhu et.al.	2502.05174	link
2025-02-07	In-context denoising with one-layer transformers: connections between attention and associative memory retrieval	Matthew Smart et.al.	2502.05164	null
2025-02-07	CodeSCM: Causal Analysis for Multi-Modal Code Generation	Mukur Gupta et.al.	2502.05150	link
2025-02-07	From Restless to Contextual: A Thresholding Bandit Approach to Improve Finite-horizon Performance	Jiamin Xu et.al.	2502.05145	link
2025-02-07	Segment Geometry Optimization and Prototype Studies of a Multi-Coincidence GAGG Solar Neutrino Detector	Brooks Hartsock et.al.	2502.05095	null
2025-02-07	Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs	Thierry Bossy et.al.	2502.05087	link
2025-02-07	ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework	Xiaoyu Deng et.al.	2502.05084	null
2025-02-07	Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures	Tushar Pandey et.al.	2502.05078	link
2025-02-07	Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images	Aditya Kumar et.al.	2502.05066	link
2025-02-06	ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features	Alec Helbling et.al.	2502.04320	link
2025-02-06	ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters	Kamer Ali Yuksel et.al.	2502.04315	link
2025-02-06	DexterityGen: Foundation Controller for Unprecedented Dexterity	Zhao-Heng Yin et.al.	2502.04307	null
2025-02-06	Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization	Yuanye Liu et.al.	2502.04295	link
2025-02-06	GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation	Weihang Li et.al.	2502.04293	null
2025-02-06	Cognitive AI framework: advances in the simulation of human thought	Rommel Salas-Guerra et.al.	2502.04259	null
2025-02-06	MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion	Xintong Hao et.al.	2502.04235	null
2025-02-06	Sports and Women’s Sports: Gender Bias in Text Generation with Olympic Data	Laura Biester et.al.	2502.04218	null
2025-02-06	“Short-length” Adversarial Training Helps LLMs Defend “Long-length” Jailbreak Attacks: Theoretical and Empirical Evidence	Shaopeng Fu et.al.	2502.04204	link
2025-02-06	Lexical Substitution is not Synonym Substitution: On the Importance of Producing Contextually Relevant Word Substitutes	Juraj Vladika et.al.	2502.04173	null
2025-02-05	Contextuality with Pauli observables in cycle scenarios	Raman Choudhary et.al.	2502.03451	null
2025-02-05	A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs)	Yiye Chen et.al.	2502.03450	null
2025-02-05	Can Text-to-Image Generative Models Accurately Depict Age? A Comparative Study on Synthetic Portrait Generation and Age Estimation	Alexey A. Novikov et.al.	2502.03420	null
2025-02-05	Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts	Nikta Gohari Sadr et.al.	2502.03418	null
2025-02-05	Energy-Efficient Flying LoRa Gateways: A Multi-Agent Reinforcement Learning Approach	Abdullahi Isa Ahmed et.al.	2502.03377	null
2025-02-05	Interactive Visualization Recommendation with Hier-SUCB	Songwen Hu et.al.	2502.03375	link
2025-02-05	Controllable GUI Exploration	Aryan Garg et.al.	2502.03330	null
2025-02-05	ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model	Qiguang Chen et.al.	2502.03325	null
2025-02-05	ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models	Ying Zhang et.al.	2502.03266	link
2025-02-05	MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent	Xinyao Liao et.al.	2502.03207	null
2025-02-04	Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling	Xiaowen Qiu et.al.	2502.02590	null
2025-02-04	Contextuality of Quantum Error-Correcting Codes	Derek Khu et.al.	2502.02553	null
2025-02-04	OVERTHINKING: Slowdown Attacks on Reasoning LLMs	Abhinav Kumar et.al.	2502.02542	link
2025-02-04	Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies	Han Zhou et.al.	2502.02533	null
2025-02-04	Catoni Contextual Bandits are Robust to Heavy-tailed Rewards	Chenlu Ye et.al.	2502.02486	null
2025-02-04	An extended Wigner’s friend no-go theorem inspired by generalized contextuality	Laurens Walleghem et.al.	2502.02461	null
2025-02-04	IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning	Quan Zhang et.al.	2502.02454	null
2025-02-04	Personalization Toolkit: Training Free Personalization of Large Vision Language Models	Soroush Seifi et.al.	2502.02452	null
2025-02-04	LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models	Jiangong Chen et.al.	2502.02441	link
2025-02-04	FewTopNER: Integrating Few-Shot Learning with Topic Modeling and Named Entity Recognition in a Multilingual Framework	Ibrahim Bouabdallaoui et.al.	2502.02391	link
2025-01-31	Low-Rank Adapting Models for Sparse Autoencoders	Matthew Chen et.al.	2501.19406	link
2025-01-31	Vintix: Action Model via In-Context Reinforcement Learning	Andrey Polubarov et.al.	2501.19400	link
2025-01-31	Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models	Wenzhi Fang et.al.	2501.19389	link
2025-01-31	The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking	Yuchun Miao et.al.	2501.19358	null
2025-01-31	LLM-based Affective Text Generation Quality Based on Different Quantization Values	Yarik Menchaca Resendiz et.al.	2501.19317	null
2025-01-31	Reverse Probing: Evaluating Knowledge Transfer via Finetuned Task Embeddings for Coreference Resolution	Tatiana Anikina et.al.	2501.19316	null
2025-01-31	Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes	Zhiyao Xu et.al.	2501.19298	null
2025-01-31	Analysis of LLMs vs Human Experts in Requirements Engineering	Cory Hymel et.al.	2501.19297	null
2025-01-31	Differentially Private In-context Learning via Sampling Few-shot Mixed with Zero-shot Outputs	James Flemings et.al.	2501.19287	null
2025-01-31	Pheromone-based Learning of Optimal Reasoning Paths	Anirudh Chari et.al.	2501.19278	null
2025-01-30	R.I.P.: Better Models by Survival of the Fittest Prompts	Ping Yu et.al.	2501.18578	null
2025-01-30	BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos	Lehao Lin et.al.	2501.18565	null
2025-01-30	Semantic Web and Creative AI – A Technical Report from ISWS 2023	Raia Abu Ahmad et.al.	2501.18542	null
2025-01-30	Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges	Manveer Singh Tamber et.al.	2501.18536	link
2025-01-30	CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction	Peter J. Bentley et.al.	2501.18504	null
2025-01-30	HSRMamba: Contextual Spatial-Spectral State Space Model for Single Hyperspectral Super-Resolution	Shi Chen et.al.	2501.18500	link
2025-01-30	CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization	Yanxia Deng et.al.	2501.18475	null
2025-01-30	Tuning Vision Foundation Model via Test-Time Prompt-Guided Training for VFSS Segmentations	Chengxi Zeng et.al.	2501.18474	null
2025-01-30	ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation	Minghua He et.al.	2501.18460	null
2025-01-30	o3-mini vs DeepSeek-R1: Which One is Safer?	Aitor Arrieta et.al.	2501.18438	link
2025-01-29	Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs’ Domain-Specific Insight Learning?	Pouya Pezeshkpour et.al.	2501.17840	link
2025-01-29	U2A: Unified Unimodal Adaptation for Robust and Efficient Multimodal Learning	Md Kaykobad Reza et.al.	2501.17823	null
2025-01-29	Leveraging Multimodal LLM for Inspirational User Interface Search	Seokhyeon Park et.al.	2501.17799	link
2025-01-29	AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing	Peter Pak et.al.	2501.17784	null
2025-01-29	Unraveling Log4Shell: Analyzing the Impact and Response to the Log4j Vulnerabil	John Doll et.al.	2501.17760	null
2025-01-29	Early External Safety Testing of OpenAI’s o3-mini: Insights from the Pre-Deployment Evaluation	Aitor Arrieta et.al.	2501.17749	null
2025-01-29	VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback	Sayeh Gholipour Picha et.al.	2501.17726	null
2025-01-29	RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts	Eujeong Choi et.al.	2501.17715	link
2025-01-29	In-Context Meta LoRA Generation	Yihua Shao et.al.	2501.17635	null
2025-01-29	Uncertainty Quantification and Decomposition for LLM-based Recommendation	Wonbin Kweon et.al.	2501.17630	link
2025-01-28	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	Nikolai Kalischek et.al.	2501.17162	null
2025-01-28	AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders	Zhengxuan Wu et.al.	2501.17148	link
2025-01-28	FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data	Deren Lei et.al.	2501.17144	link
2025-01-28	ASTRAL: Automated Safety Testing of Large Language Models	Miriam Ugarte et.al.	2501.17132	null
2025-01-28	Scenario Understanding of Traffic Scenes Through Large Visual Language Models	Rivera Esteban et.al.	2501.17131	null
2025-01-28	COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models	Tobias Materzok et.al.	2501.17104	null
2025-01-28	Text-to-Image Generation for Vocabulary Learning Using the Keyword Method	Nuwan T. Attygalle et.al.	2501.17099	null
2025-01-28	Context is Key in Agent Security	Lillian Tsai et.al.	2501.17070	null
2025-01-28	Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding	Akash Kumar et.al.	2501.17053	null
2025-01-28	Large Language Models for Code Generation: The Practitioners Perspective	Zeeshan Rasheed et.al.	2501.16998	link
2025-01-27	RelightVid: Temporal-Consistent Diffusion Model for Video Relighting	Ye Fang et.al.	2501.16330	null
2025-01-27	Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology	Meiyun Cao et.al.	2501.16309	null
2025-01-27	RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval	Long Nguyen et.al.	2501.16303	null
2025-01-27	CLISC: Bridging CLIP and SAM by enhanced CAM for unsupervised brain tumor segmentation	Xiaochuan Ma et.al.	2501.16246	null
2025-01-27	Language-Based Bayesian Optimization Research Assistant (BORA)	Abdoulatif Cissé et.al.	2501.16224	null
2025-01-27	Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models	Huayu Li et.al.	2501.16215	link
2025-01-27	Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs	Antony Bartlett et.al.	2501.16191	null
2025-01-27	Can summarization approximate simplification? A gold standard comparison	Giacomo Magnifico et.al.	2501.16181	null
2025-01-27	BAG: Body-Aligned 3D Wearable Asset Generation	Zhongjin Luo et.al.	2501.16177	null
2025-01-27	Will Systems of LLM Agents Cooperate: An Investigation into a Social Dilemma	Richard Willis et.al.	2501.16173	link
2025-01-24	HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation	Xin Zhou et.al.	2501.14729	link
2025-01-24	Do LLMs Provide Consistent Answers to Health-Related Questions across Languages?	Ipek Baris Schlicht et.al.	2501.14719	null
2025-01-24	Gland Segmentation Using SAM With Cancer Grade as a Prompt	Yijie Zhu et.al.	2501.14718	null
2025-01-24	Funzac at CoMeDi Shared Task: Modeling Annotator Disagreement from Word-In-Context Perspectives	Olufunke O. Sarumi et.al.	2501.14617	link
2025-01-24	Calibrating Wireless AI via Meta-Learned Context-Dependent Conformal Prediction	Seonghoon Yoo et.al.	2501.14566	null
2025-01-24	Next-Generation Wireless: Tracking the Evolutionary Path of 6G Mobile Communication	Ekram Hossain et.al.	2501.14552	null
2025-01-24	VERUS-LM: a Versatile Framework for Combining LLMs with Symbolic Reasoning	Benjamin Callewaert et.al.	2501.14540	null
2025-01-24	Automated Assignment Grading with Large Language Models: Insights From a Bioinformatics Course	Pavlin G. Poličar et.al.	2501.14499	null
2025-01-24	Evaluating and Improving Graph to Text Generation with Large Language Models	Jie He et.al.	2501.14497	link
2025-01-24	Boundary Value Test Input Generation Using Prompt Engineering with LLMs: Fault Detection and Coverage Analysis	Xiujing Guo et.al.	2501.14465	null
2025-01-23	GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing	Akashah Shabbir et.al.	2501.13925	link
2025-01-23	The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities	Chan-Jan Hsu et.al.	2501.13921	link
2025-01-23	IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models	Jiayi Lei et.al.	2501.13920	null
2025-01-23	Improving Video Generation with Human Feedback	Jie Liu et.al.	2501.13918	null
2025-01-23	Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models	Linh Tran et.al.	2501.13904	null
2025-01-23	Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning	Zuyao You et.al.	2501.13893	link
2025-01-23	Generating Realistic Forehead-Creases for User Verification via Conditioned Piecewise Polynomial Curves	Abhishek Tandon et.al.	2501.13889	link
2025-01-23	A RAG-Based Institutional Assistant	Gustavo Kuratomi et.al.	2501.13880	null
2025-01-23	Eye Gaze as a Signal for Conveying User Attention in Contextual AI Systems	Ethan Wilson et.al.	2501.13878	null
2025-01-23	Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning	Shiyu Zhang et.al.	2501.13859	null
2025-01-22	Constructive characterisations of the must-preorder for asynchrony	Giovanni Bernardi et.al.	2501.13002	link
2025-01-22	Can supermassive stars form in protogalaxies due to internal Lyman-Werner feedback?	James Sullivan et.al.	2501.12986	null
2025-01-22	LLM4WM: Adapting LLM for Wireless Multi-Tasking	Xuanyu Liu et.al.	2501.12983	null
2025-01-22	UniUIR: Considering Underwater Image Restoration as An All-in-One Learner	Xu Zhang et.al.	2501.12981	null
2025-01-22	OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models	Chongren Sun et.al.	2501.12975	link
2025-01-22	Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference	Weizhi Fei et.al.	2501.12959	null
2025-01-22	PreciseCam: Precise Camera Control for Text-to-Image Generation	Edurne Bernal-Berdun et.al.	2501.12910	null
2025-01-22	The impact of hyperons on neutron star mergers: gravitational waves, mass ejection and black hole formation	Hristijan Kochankovski et.al.	2501.12905	null
2025-01-22	Architectural Fusion Through Contextual Partitioning in Large Language Models: A Novel Approach to Parameterized Knowledge Integration	Offa Kingsleigh et.al.	2501.12901	null
2025-01-22	HierPromptLM: A Pure PLM-based Framework for Representation Learning on Heterogeneous Text-rich Networks	Qiuyu Zhu et.al.	2501.12857	null
2025-01-21	Towards Affordance-Aware Articulation Synthesis for Rigged Objects	Yu-Chu Yu et.al.	2501.12393	null
2025-01-21	Is Long Context All You Need? Leveraging LLM’s Extended Context for NL2SQL	Yeounoh Chung et.al.	2501.12372	link
2025-01-21	FuocChuVIP123 at CoMeDi Shared Task: Disagreement Ranking with XLM-Roberta Sentence Embeddings and Deep Neural Regression	Phuoc Duong Huy Chu et.al.	2501.12336	null
2025-01-21	Decoherence of Schrödinger cat states in light of wave/particle duality	Th. K. Mavrogordatos et.al.	2501.12328	null
2025-01-21	UI-TARS: Pioneering Automated GUI Interaction with Native Agents	Yujia Qin et.al.	2501.12326	link
2025-01-21	CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification	Cristiano Patrício et.al.	2501.12266	null
2025-01-21	mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework	Bingyi Liu et.al.	2501.12263	null
2025-01-21	HAC++: Towards 100X Compression of 3D Gaussian Splatting	Yihang Chen et.al.	2501.12255	link
2025-01-21	CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning	Yuanheng Fang et.al.	2501.12226	null
2025-01-21	You Can’t Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense	Wuyuao Mai et.al.	2501.12210	null
2025-01-17	FaceXBench: Evaluating Multimodal LLMs on Face Understanding	Kartik Narayan et.al.	2501.10360	link
2025-01-17	Natural Language Processing of Privacy Policies: A Survey	Andrick Adhikari et.al.	2501.10319	null
2025-01-17	PaSa: An LLM Agent for Comprehensive Academic Paper Search	Yichen He et.al.	2501.10120	link
2025-01-17	How Do Programming Students Use Generative AI?	Christian Rahe et.al.	2501.10091	null
2025-01-17	CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment	Yating Liu et.al.	2501.10071	link
2025-01-17	FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization	Zhaopeng Gu et.al.	2501.10067	link
2025-01-17	OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning	Jinyuan Feng et.al.	2501.10062	null
2025-01-17	MSTS: A Multimodal Safety Test Suite for Vision-Language Models	Paul Röttger et.al.	2501.10057	link
2025-01-17	Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions	Zhijie Tan et.al.	2501.10011	null
2025-01-17	RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation	Yuefan Cao et.al.	2501.09982	null
2025-01-16	Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues	Youngjoon Jang et.al.	2501.09754	null
2025-01-16	Coming full circle – A unified framework for Kochen-Specker contextuality	Markus Frembs et.al.	2501.09750	null
2025-01-16	Suggesting Code Edits in Interactive Machine Learning Notebooks Using Large Language Models	Bihui Jin et.al.	2501.09745	null
2025-01-16	Comparative Insights from 12 Machine Learning Models in Extracting Economic Ideology from Political Text	Jihed Ncib et.al.	2501.09719	null
2025-01-16	CyberMentor: AI Powered Learning Tool Platform to Address Diverse Student Needs in Cybersecurity Education	Tianyu Wang et.al.	2501.09709	link
2025-01-16	Practical Continual Forgetting for Pre-trained Vision Models	Hongbo Zhao et.al.	2501.09705	link
2025-01-16	Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key	Zhihe Yang et.al.	2501.09695	link
2025-01-16	Quantum Contextual Hypergraphs, Operators, Inequalities, and Applications in Higher Dimensions	Mladen Pavicic et.al.	2501.09637	null
2025-01-16	LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading	Kuan-Ming Liu et.al.	2501.09636	null
2025-01-16	Constraints on Cosmic Rays Acceleration in Bright Gamma-ray Bursts with Observations of Fermi	Xing-Fu Zhang et.al.	2501.09594	null
2025-01-15	Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion	Jingyuan Chen et.al.	2501.09019	null
2025-01-15	Prompt gravitational-wave mergers aided by gas in Active Galactic Nuclei: The hydrodynamics of binary-single black hole scatterings	Connar Rowan et.al.	2501.09017	null
2025-01-15	How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias	Tosin Fadahunsi et.al.	2501.09014	link
2025-01-15	Bayesian analysis of analog gravity systems with the Rezzolla-Zhidenko metric	Saulo Albuquerque et.al.	2501.09000	null
2025-01-15	Analyzing the Ethical Logic of Six Large Language Models	W. Russell Neuman et.al.	2501.08951	null
2025-01-15	Disentangling Exploration of Large Language Models by Optimal Exploitation	Tim Grams et.al.	2501.08925	null
2025-01-15	Feature-based One-For-All: A Universal Framework for Heterogeneous Knowledge Distillation	Jhe-Hao Lin et.al.	2501.08885	null
2025-01-15	Exploring Task-Level Optimal Prompts for Visual In-Context Learning	Yan Zhu et.al.	2501.08841	null
2025-01-15	ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind	Kazutoshi Shinoda et.al.	2501.08838	link
2025-01-15	IDEA: Image Description Enhanced CLIP-Adapter	Zhipeng Ye et.al.	2501.08816	link
2025-01-14	DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models	Hyeonwoo Kim et.al.	2501.08333	null
2025-01-14	Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks	Miran Heo et.al.	2501.08326	null
2025-01-14	ADAM-1: AI and Bioinformatics for Alzheimer’s Detection and Microbiome-Clinical Data Integrations	Ziyuan Huang et.al.	2501.08324	null
2025-01-14	HALoGEN: Fantastic LLM Hallucinations and Where to Find Them	Abhilasha Ravichander et.al.	2501.08292	null
2025-01-14	SmartEraser: Remove Anything from Images using Masked-Region Guidance	Longtao Jiang et.al.	2501.08279	null
2025-01-14	Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing	Pulkit Arora et.al.	2501.08276	null
2025-01-14	TriMod Fusion for Multimodal Named Entity Recognition in Social Media	Mosab Alfaqeeh et.al.	2501.08267	null
2025-01-14	Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints	Jonathan Nöther et.al.	2501.08246	null
2025-01-14	ASTRID – An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems	Mohita Chowdhury et.al.	2501.08208	null
2025-01-14	ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving	Zain Ul Abedin et.al.	2501.08203	null
2025-01-13	Imagine while Reasoning in Space: Multimodal Visualization-of-Thought	Chengzu Li et.al.	2501.07542	null
2025-01-13	Investigating Large Language Models in Inferring Personality Traits from User Conversations	Jianfeng Zhu et.al.	2501.07532	null
2025-01-13	IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion	Tharun Anand et.al.	2501.07530	null
2025-01-13	RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment	Difei Gu et.al.	2501.07525	link
2025-01-13	Guided SAM: Label-Efficient Part Segmentation	S. B. van Rooij et.al.	2501.07434	null
2025-01-13	Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection	Xin Yin et.al.	2501.07425	null
2025-01-13	Initial Findings on Sensor based Open Vocabulary Activity Recognition via Text Embedding Inversion	Lala Shakti Swarup Ray et.al.	2501.07408	null
2025-01-13	Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models	Yasiru Ranasinghe et.al.	2501.07396	null
2025-01-13	Enhancing Retrieval-Augmented Generation: A Study of Best Practices	Siran Li et.al.	2501.07391	link
2025-01-13	Approaching ballistic motion in 3D simulations of gamma-ray burst jets in realistic binary neutron star merger environments	Emma Dreas et.al.	2501.07385	null
2025-01-10	Multi-subject Open-set Personalization in Video Generation	Tsai-Shien Chen et.al.	2501.06187	null
2025-01-10	PEACE: Empowering Geologic Map Holistic Understanding with MLLMs	Yangyu Huang et.al.	2501.06184	null
2025-01-10	ScooterLab: A Programmable and Participatory Sensing Research Testbed using Micromobility Vehicles	Ubaidullah Khan et.al.	2501.06177	null
2025-01-10	Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories	Gerd Kortemeyer et.al.	2501.06143	null
2025-01-10	Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI	Yuya Asano et.al.	2501.06129	null
2025-01-10	Explaining Deep Learning-based Anomaly Detection in Energy Consumption Data by Focusing on Contextually Relevant Data	Mohammad Noorchenarboo et.al.	2501.06099	null
2025-01-10	A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection	Tsui Qin Mok et.al.	2501.06038	null
2025-01-10	The all-charm tetraquark and its contribution to two-photon processes	Panagiotis Kalamidas et.al.	2501.06034	null
2025-01-10	How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters	Romina Oji et.al.	2501.06025	link
2025-01-10	BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response	Hongruixuan Chen et.al.	2501.06019	link
2025-01-09	Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark	Yunzhuo Hao et.al.	2501.05444	link
2025-01-09	TimeDP: Learning to Generate Multi-Domain Time Series with Domain Prompts	Yu-Hao Huang et.al.	2501.05403	link
2025-01-09	FairCode: Evaluating Social Bias of LLMs in Code Generation	Yongkang Du et.al.	2501.05396	link
2025-01-09	CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models	Junha Park et.al.	2501.05359	null
2025-01-09	Continuity in Potential Infinite Models	Matthias Eberl et.al.	2501.05276	null
2025-01-09	CallNavi: A Study and Challenge on Function Calling Routing and Invocation in Large Language Models	Yewei Song et.al.	2501.05255	null
2025-01-09	Online Prompt and Solver Selection for Program Synthesis	Yixuan Li et.al.	2501.05247	null
2025-01-09	Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection	Pei-Kang Lee et.al.	2501.05228	null
2025-01-09	FaceMe: Robust Blind Face Restoration with Personal Identification	Siyu Liu et.al.	2501.05177	null
2025-01-09	Deep Assessment of Code Review Generation Approaches: Beyond Lexical Similarity	Yanjie Jiang et.al.	2501.05176	null
2025-01-08	Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding	Joshua Jones et.al.	2501.04693	null
2025-01-08	Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling	Nannan Li et.al.	2501.04666	null
2025-01-08	External quantum fluctuations select measurement contexts	Jonte R. Hance et.al.	2501.04664	null
2025-01-08	Assessing Language Comprehension in Large Language Models Using Construction Grammar	Wesley Scivetti et.al.	2501.04661	null
2025-01-08	FleSpeech: Flexibly Controllable Speech Generation with Various Prompts	Hanzhao Li et.al.	2501.04644	null
2025-01-08	“Can you be my mum?”: Manipulating Social Robots in the Large Language Models Era	Giulio Antonio Abbo et.al.	2501.04633	null
2025-01-08	MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation	Daniele Molino et.al.	2501.04614	null
2025-01-08	Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion	Yangfan He et.al.	2501.04606	link
2025-01-08	Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models	Miaoyang He et.al.	2501.04582	null
2025-01-08	The Impostor is Among Us: Can Large Language Models Capture the Complexity of Human Personas?	Christopher Lazik et.al.	2501.04543	null
2025-01-07	WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings	Haochen Song et.al.	2501.03999	null
2025-01-07	NeuralSVG: An Implicit Representation for Text-to-Vector Generation	Sagi Polaczek et.al.	2501.03992	null
2025-01-07	Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles	Yuxi Xia et.al.	2501.03991	null
2025-01-07	Semantically Cohesive Word Grouping in Indian Languages	N J Karthika et.al.	2501.03988	null
2025-01-07	VLM-driven Behavior Tree for Context-aware Task Planning	Naoki Wake et.al.	2501.03968	link
2025-01-07	Vision Language Models as Values Detectors	Giulio Antonio Abbo et.al.	2501.03957	null
2025-01-07	Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection	Pablo Miralles-González et.al.	2501.03940	null
2025-01-07	Truthful mechanisms for linear bandit games with private contexts	Yiting Hu et.al.	2501.03865	null
2025-01-07	Progressive Document-level Text Simplification via Large Language Models	Dengzhao Fang et.al.	2501.03857	null
2025-01-07	Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control	Zekai Gu et.al.	2501.03847	link
2025-01-06	Rate-My-LoRA: Efficient and Adaptive Federated Model Tuning for Cardiac MRI Segmentation	Xiaoxiao He et.al.	2501.03223	null
2025-01-06	Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction	Rui Qian et.al.	2501.03218	link
2025-01-06	The FACTS Grounding Leaderboard: Benchmarking LLMs’ Ability to Ground Responses to Long-Form Input	Alon Jacovi et.al.	2501.03200	null
2025-01-06	Visualizing quantum entanglement in Bose-Einstein condensates without state vectors	Russell B. Thompson et.al.	2501.03199	null
2025-01-06	Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text	Ali Al-Lawati et.al.	2501.03166	link
2025-01-06	The Scaling Law for LoRA Base on Mutual Information Upper Bound	Jing Zhang et.al.	2501.03152	null
2025-01-06	VicSim: Enhancing Victim Simulation with Emotional and Linguistic Fidelity	Yerong Li et.al.	2501.03139	null
2025-01-06	PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models	Mingyang Song et.al.	2501.03124	link
2025-01-06	LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases	Dylan Bouchard et.al.	2501.03112	link
2025-01-06	Physics, Environment and Environmental Education; Perceptions from trainee Natural Science teachers	Daniel Alejandro Valderrama et.al.	2501.03090	null
2025-01-03	Metadata Conditioning Accelerates Language Model Pre-training	Tianyu Gao et.al.	2501.01956	link
2025-01-03	Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) for Passive Sonar Classification	Jarin Ritu et.al.	2501.01921	link
2025-01-03	Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions	Rachneet Sachdeva et.al.	2501.01872	link
2025-01-03	A review of long lasting activities of the central engine of gamma-ray bursts	Bruce Gendre et.al.	2501.01857	null
2025-01-03	MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning	Pu Yang et.al.	2501.01834	null
2025-01-03	Time Series Language Model for Descriptive Caption Generation	Mohamed Trabelsi et.al.	2501.01832	null
2025-01-03	Ingredients: Blending Custom Photos with Video Diffusion Transformers	Zhengcong Fei et.al.	2501.01790	link
2025-01-03	SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation	Mingjie Li et.al.	2501.01765	null
2025-01-03	How Toxic Can You Get? Search-based Toxicity Testing for Large Language Models	Simone Corbo et.al.	2501.01741	null
2025-01-03	AR4D: Autoregressive 4D Generation from Monocular Videos	Hanxin Zhu et.al.	2501.01722	null
2025-01-02	GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models	Zhangyang Qi et.al.	2501.01428	link
2025-01-02	Object-level Visual Prompts for Compositional Image Generation	Gaurav Parmar et.al.	2501.01424	null
2025-01-02	Multi-Modal Video Feature Extraction for Popularity Prediction	Haixu Liu et.al.	2501.01422	null
2025-01-02	Nested Attention: Semantic-aware Attention Values for Concept Personalization	Or Patashnik et.al.	2501.01407	null
2025-01-02	StereoMath: An Accessible and Musical Equation Editor	Kenneth Ge et.al.	2501.01404	null
2025-01-02	Training Medical Large Vision-Language Models with Abnormal-Aware Feedback	Yucheng Zhou et.al.	2501.01377	null
2025-01-02	Test-time Controllable Image Generation by Explicit Spatial Constraint Enforcement	Z. Zhang et.al.	2501.01368	null
2025-01-02	ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding	Austin T. Wang et.al.	2501.01366	null
2025-01-02	CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models	Johan Wahréus et.al.	2501.01335	link
2025-01-02	Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension	Yanbo Fang et.al.	2501.01332	null
2024-12-30	Distributed Mixture-of-Agents for Edge Inference with Large Language Models	Purbesh Mitra et.al.	2412.21200	link
2024-12-30	Adversarial Attack and Defense for LoRa Device Identification and Authentication via Deep Learning	Yalin E. Sagduyu et.al.	2412.21164	null
2024-12-30	Unified dimensionality reduction techniques in chronic liver disease detection	Anand Karna et.al.	2412.21156	null
2024-12-30	Exploring and Controlling Diversity in LLM-Agent Conversation	KuanChao Chu et.al.	2412.21102	null
2024-12-30	Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model	Yifei Huang et.al.	2412.21080	link
2024-12-30	Efficient Multi-Task Inferencing with a Shared Backbone and Lightweight Task-Specific Adapters for Automatic Scoring	Ehsan Latif et.al.	2412.21065	null
2024-12-30	Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration	Wanglong Lu et.al.	2412.21042	link
2024-12-30	Automated Robustness Testing for LLM-based NLP Software	Mingxuan Xiao et.al.	2412.21016	link
2024-12-30	Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline	Nicola Messina et.al.	2412.21009	link
2024-12-30	Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering	Junxiao Xue et.al.	2412.20927	null
2024-12-27	Enhancing Whisper’s Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization	Kumud Tripathi et.al.	2412.19785	null
2024-12-27	Hard Photon Triggered Jets in $p$-$p$ and $A$-$A$ Collisions	C. Sirimanna et.al.	2412.19738	null
2024-12-27	Can Large Language Models Adapt to Other Agents In-Context?	Matthew Riemer et.al.	2412.19726	null
2024-12-27	Toward Adaptive Reasoning in Large Language Models with Thought Rollback	Sijia Chen et.al.	2412.19707	link
2024-12-27	Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework	Jiang Liu et.al.	2412.19684	null
2024-12-27	Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP	Zhongxing Xu et.al.	2412.19650	null
2024-12-27	ReNeg: Learning Negative Embedding with Reward Guidance	Xiaomin Li et.al.	2412.19637	link
2024-12-27	RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations	Mingshu Zhao et.al.	2412.19628	link
2024-12-27	Signatures of prediction during natural listening in MEG data?	Sahel Azizpour et.al.	2412.19622	null
2024-12-27	Gradient Weight-normalized Low-rank Projection for Efficient LLM Training	Jia-Hong Huang et.al.	2412.19616	link
2024-12-24	Decentralized Intelligence in GameFi: Embodied AI Agents and the Convergence of DeFi and Virtual Ecosystems	Fernando Jia et.al.	2412.18601	link
2024-12-24	ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation	Hongjie Li et.al.	2412.18600	null
2024-12-24	DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation	Minghong Cai et.al.	2412.18597	link
2024-12-24	Exploring Embedding Priors in Prompt-Tuning for Improved Interpretability and Control	Sergey Sedov et.al.	2412.18582	null
2024-12-24	Distilling Fine-grained Sentiment Understanding from Large Language Models	Yice Zhang et.al.	2412.18552	link
2024-12-24	Token-Budget-Aware LLM Reasoning	Tingxu Han et.al.	2412.18547	link
2024-12-24	Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation	Derong Xu et.al.	2412.18537	link
2024-12-24	Segment-Based Attention Masking for GPTs	Shahar Katz et.al.	2412.18487	link
2024-12-24	Betting vs. Trading: Learning a Linear Decision Policy for Selling Wind Power and Hydrogen	Yannick Heiser et.al.	2412.18479	null
2024-12-24	Is Large Language Model Good at Triple Set Prediction? An Empirical Study	Yuan Yuan et.al.	2412.18443	null
2024-12-23	The Superposition of Diffusion Models Using the Itô Density Estimator	Marta Skreta et.al.	2412.17762	null
2024-12-23	Reasoning to Attend: Try to Understand How Token Works	Rui Qian et.al.	2412.17741	link
2024-12-23	Contextual Backpropagation Loops: Amplifying Deep Reasoning with Iterative Top-Down Feedback	Jacob Fein-Ashley et.al.	2412.17737	link
2024-12-23	Chumor 2.0: Towards Benchmarking Chinese Humor Understanding	Ruiqi He et.al.	2412.17729	link
2024-12-23	Knowledge Editing through Chain-of-Thought	Changyue Wang et.al.	2412.17727	link
2024-12-23	The Cosmological Population of Gamma-Ray Bursts from the Disks of Active Galactic Nuclei	Hoyoung D. Kang et.al.	2412.17714	null
2024-12-23	EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities	Zhe Chen et.al.	2412.17677	link
2024-12-23	Detecting anxiety and depression in dialogues: a multi-label and explainable approach	Francisco de Arriba-Pérez et.al.	2412.17651	null
2024-12-23	DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder	Ente Lin et.al.	2412.17644	null
2024-12-23	LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding	Hao Li et.al.	2412.17635	null
2024-12-20	MotiF: Making Text Count in Image Animation with Motion Focal Loss	Shijie Wang et.al.	2412.16153	null
2024-12-20	A vector logic for extensional formal semantics	Daniel Quigley et.al.	2412.16152	null
2024-12-20	PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics	Daniil Larionov et.al.	2412.16120	null
2024-12-20	Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs	Lynn Greschner et.al.	2412.15993	null
2024-12-20	APIRL: Deep Reinforcement Learning for REST API Fuzzing	Myles Foley et.al.	2412.15991	link
2024-12-20	From General to Specific: Tailoring Large Language Models for Personalized Healthcare	Ruize Shi et.al.	2412.15957	null
2024-12-20	MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection	Andrea Moglia et.al.	2412.15925	link
2024-12-20	On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education	Lorenz Wendlinger et.al.	2412.15902	null
2024-12-20	On Robust Cross Domain Alignment	Anish Chakrabarty et.al.	2412.15861	null
2024-12-20	Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation	Aiwen Jiang et.al.	2412.15845	link
2024-12-19	PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation	Muntasir Wahed et.al.	2412.15209	null
2024-12-19	FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching	Sucheng Ren et.al.	2412.15205	link
2024-12-19	Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying	Federico Castagna et.al.	2412.15177	link
2024-12-19	Rethinking Uncertainty Estimation in Natural Language Generation	Lukas Aichberger et.al.	2412.15176	null
2024-12-19	Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM	Yatai Ji et.al.	2412.15156	link
2024-12-19	AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling	Zihan Liu et.al.	2412.15084	null
2024-12-19	MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance	Hallee E. Wong et.al.	2412.15058	null
2024-12-19	Measuring, Modeling, and Helping People Account for Privacy Risks in Online Self-Disclosures with AI	Isadora Krsek et.al.	2412.15047	null
2024-12-19	LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps	Felix Friedrich et.al.	2412.15035	null
2024-12-19	Large Language Models and Code Security: A Systematic Literature Review	Enna Basic et.al.	2412.15004	null
2024-12-18	FashionComposer: Compositional Fashion Image Generation	Sihui Ji et.al.	2412.14168	null
2024-12-18	Alignment faking in large language models	Ryan Greenblatt et.al.	2412.14093	link
2024-12-18	Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets	Simon Thorne et.al.	2412.14062	null
2024-12-18	Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation	Vera Neplenbroek et.al.	2412.14050	link
2024-12-18	Hansel: Output Length Controlling Framework for Large Language Models	Seoha Song et.al.	2412.14033	null
2024-12-18	Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation	Haotong Lin et.al.	2412.14015	link
2024-12-18	What makes a good metric? Evaluating automatic metrics for text-to-image consistency	Candace Ross et.al.	2412.13989	null
2024-12-18	RAG for Effective Supply Chain Security Questionnaire Automation	Zaynab Batool Reza et.al.	2412.13988	null
2024-12-18	Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation	Eleni Sgouritsa et.al.	2412.13952	null
2024-12-18	CoRa: A Collision-Resistant LoRa Symbol Detector of Low Complexity	José Álamos et.al.	2412.13930	null
2024-12-17	CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models	Gaoyang Zhang et.al.	2412.13195	link
2024-12-17	MotionBridge: Dynamic Video Inbetweening with Flexible Controls	Maham Tanveer et.al.	2412.13190	null
2024-12-17	Move-in-2D: 2D-Conditioned Human Motion Generation	Hsin-Ping Huang et.al.	2412.13185	null
2024-12-17	DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation	Miriam Wanner et.al.	2412.13175	null
2024-12-17	Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study	Bolei Ma et.al.	2412.13169	link
2024-12-17	F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration	Lu Liu et.al.	2412.13155	null
2024-12-17	Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation	Huaijin Pi et.al.	2412.13111	null
2024-12-17	Prompt Augmentation for Self-supervised Text-guided Image Manipulation	Rumeysa Bodur et.al.	2412.13081	null
2024-12-17	Identifying Bias in Deep Neural Networks Using Image Transforms	Sai Teja Erukude et.al.	2412.13079	link
2024-12-17	Harnessing Event Sensory Data for Error Pattern Prediction in Vehicles: A Language Model Approach	Hugo Math et.al.	2412.13041	link
2024-12-16	CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology	Yuxuan Sun et.al.	2412.12077	null
2024-12-16	A LoRA is Worth a Thousand Pictures	Chenxi Liu et.al.	2412.12048	null
2024-12-16	How Private are Language Models in Abstractive Summarization?	Anthony Hughes et.al.	2412.12040	null
2024-12-16	Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection	Ira Ceka et.al.	2412.12039	null
2024-12-16	Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm	Rajat Khanda et.al.	2412.12006	null
2024-12-16	The Open Source Advantage in Large Language Models (LLMs)	Jiya Manchanda et.al.	2412.12004	null
2024-12-16	SAMIC: Segment Anything with In-Context Spatial Prompt Engineering	Savinay Nagendra et.al.	2412.11998	null
2024-12-16	Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support	Devika Venugopalan et.al.	2412.11995	link
2024-12-16	Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives	Sam Relins et.al.	2412.11878	link
2024-12-16	A Distributed Collaborative Retrieval Framework Excelling in All Queries and Corpora based on Zero-shot Rank-Oriented Automatic Evaluation	Tian-Yi Che et.al.	2412.11832	null
2024-12-13	A Grounded Typology of Word Classes	Coleman Haley et.al.	2412.10369	null
2024-12-13	TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies	Ruijie Zheng et.al.	2412.10345	null
2024-12-13	SCBench: A KV Cache-Centric Analysis of Long-Context Methods	Yucheng Li et.al.	2412.10319	null
2024-12-13	My Statistics is Better than Yours	Simon Benhaïem et.al.	2412.10296	null
2024-12-13	Prompt-Guided Mask Proposal for Two-Stage Open-Vocabulary Segmentation	Yu-Jhe Li et.al.	2412.10292	null
2024-12-13	One world, one opinion? The superstar effect in LLM responses	Sofie Goethals et.al.	2412.10281	null
2024-12-13	Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT	Danielle R. Thomas et.al.	2412.10267	link
2024-12-13	Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language Models	Harry J. Davies et.al.	2412.10257	null
2024-12-13	Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts	Hazel Kim et.al.	2412.10246	null
2024-12-13	SPT: Sequence Prompt Transformer for Interactive Image Segmentation	Senlin Cheng et.al.	2412.10224	null
2024-12-12	Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors	Yue Feng et.al.	2412.09625	null
2024-12-12	LoRACLR: Contrastive Adaptation for Customization of Diffusion Models	Enis Simsar et.al.	2412.09622	null
2024-12-12	EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM	Zhuofan Zong et.al.	2412.09618	null
2024-12-12	Context Canvas: Enhancing Text-to-Image Diffusion Models with Knowledge Graph-Based RAG	Kavana Venkatesh et.al.	2412.09614	null
2024-12-12	TimeRefine: Temporal Grounding with Time Refining Video LLM	Xizi Wang et.al.	2412.09601	link
2024-12-12	Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders	Fiona Ryan et.al.	2412.09586	link
2024-12-12	Obfuscated Activations Bypass LLM Latent-Space Defenses	Luke Bailey et.al.	2412.09565	null
2024-12-12	Does Representation Matter? Exploring Intermediate Layers in Large Language Models	Oscar Skean et.al.	2412.09563	null
2024-12-12	SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing	Xueting Li et.al.	2412.09545	null
2024-12-12	Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM	Han Wang et.al.	2412.09530	link
2024-12-11	GPD-1: Generative Pre-training for Driving	Zixun Xie et.al.	2412.08643	link
2024-12-11	Fast Prompt Alignment for Text-to-Image Generation	Khalil Mrini et.al.	2412.08639	link
2024-12-11	DMin: Scalable Training Data Influence Estimation for Diffusion Models	Huawei Lin et.al.	2412.08637	link
2024-12-11	FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models	Vladimir Kulikov et.al.	2412.08629	link
2024-12-11	Der Effizienz- und Intelligenzbegriff in der Lexikographie und kuenstlichen Intelligenz: kann ChatGPT die lexikographische Textsorte nachbilden?	Ivan Arias-Arias et.al.	2412.08599	null
2024-12-11	Leveraging Graph-RAG and Prompt Engineering to Enhance LLM-Based Automated Requirement Traceability and Compliance Checks	Arsalan Masoudifard et.al.	2412.08593	null
2024-12-11	LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations	Zejian Li et.al.	2412.08580	link
2024-12-11	Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation	Hongming Guo et.al.	2412.08577	null
2024-12-11	Can We Generate Visual Programs Without Prompting LLMs?	Michal Shlapentokh-Rothman et.al.	2412.08564	null
2024-12-11	Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations	Hugo Flores García et.al.	2412.08550	null
2024-12-10	From Slow Bidirectional to Fast Causal Video Generators	Tianwei Yin et.al.	2412.07772	null
2024-12-10	Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting	Zetong Yang et.al.	2412.07768	null
2024-12-10	Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds	Xiaoyu Xiang et.al.	2412.07766	null
2024-12-10	PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation	Fatemeh Nazarieh et.al.	2412.07754	null
2024-12-10	Multi-Shot Character Consistency for Text-to-Video Generation	Yuval Atzmon et.al.	2412.07750	null
2024-12-10	LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models	Ziqi Lu et.al.	2412.07746	null
2024-12-10	StyleMaster: Stylize Your Video with Artistic Generation and Translation	Zixuan Ye et.al.	2412.07744	null
2024-12-10	SKIPNet: Spatial Attention Skip Connections for Enhanced Brain Tumor Classification	Khush Mendiratta et.al.	2412.07736	null
2024-12-10	Granite Guardian	Inkit Padhi et.al.	2412.07724	link
2024-12-10	Leveraging Content and Context Cues for Low-Light Image Enhancement	Igor Morawski et.al.	2412.07693	link
2024-12-09	Visual Lexicon: Rich Image Features in Language Space	XuDong Wang et.al.	2412.06774	null
2024-12-09	Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty	Meera Hahn et.al.	2412.06771	link
2024-12-09	Ranking-aware adapter for text-driven image ordering with CLIP	Wei-Hsiang Yu et.al.	2412.06760	link
2024-12-09	JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM	Takuro Fujii et.al.	2412.06738	link
2024-12-09	Revisiting GRB 060218: new insights into low-luminosity gamma-ray bursts from a revised shock breakout model	Christopher M. Irwin et.al.	2412.06736	null
2024-12-09	AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark	Lan Li et.al.	2412.06724	link
2024-12-09	VP-MEL: Visual Prompts Guided Multimodal Entity Linking	Hongze Mi et.al.	2412.06720	null
2024-12-09	Facade: High-Precision Insider Threat Detection Using Deep Contextual Anomaly Detection	Alex Kantchelian et.al.	2412.06700	null
2024-12-09	Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach	Weichao Xu et.al.	2412.06684	null
2024-12-09	Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion	Shuaiting Li et.al.	2412.06661	null
2024-12-06	Sparse autoencoders reveal selective remapping of visual concepts during adaptation	Hyesu Lim et.al.	2412.05276	link
2024-12-06	Mind the Time: Temporally-Controlled Multi-Event Video Generation	Ziyi Wu et.al.	2412.05263	null
2024-12-06	TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft	Qian Long et.al.	2412.05255	link
2024-12-06	From classical techniques to convolution-based models: A review of object detection algorithms	Fnu Neha et.al.	2412.05252	null
2024-12-06	LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds	James Beetham et.al.	2412.05232	null
2024-12-06	Are Frontier Large Language Models Suitable for Q&A in Science Centres?	Jacob Watson et.al.	2412.05200	null
2024-12-06	QueEn: A Large Language Model for Quechua-English Translation	Junhao Chen et.al.	2412.05184	null
2024-12-06	A text-to-tabular approach to generate synthetic patient data using LLMs	Margaux Tornqvist et.al.	2412.05153	link
2024-12-06	LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation	Donald Shenaj et.al.	2412.05148	link
2024-12-06	A Practical Examination of AI-Generated Text Detectors for Large Language Models	Brian Tufts et.al.	2412.05139	null
2024-12-05	Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail	Luca Bartolomei et.al.	2412.04472	link
2024-12-05	PaintScene4D: Consistent 4D Scene Generation from Text Prompts	Vinayak Gupta et.al.	2412.04471	null
2024-12-05	UnZipLoRA: Separating Content and Style from a Single Image	Chang Liu et.al.	2412.04465	null
2024-12-05	Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection	Enshen Zhou et.al.	2412.04455	null
2024-12-05	EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios	Lu Qiu et.al.	2412.04447	null
2024-12-05	GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration	Kaiyi Huang et.al.	2412.04440	null
2024-12-05	Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation	Yuying Ge et.al.	2412.04432	link
2024-12-05	Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion	Jiuhai Chen et.al.	2412.04424	link
2024-12-05	Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation	Xuying Li et.al.	2412.04415	null
2024-12-05	Discriminative Fine-tuning of LVLMs	Yassine Ouali et.al.	2412.04378	null
2024-12-04	Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning	Wujian Peng et.al.	2412.03565	link
2024-12-04	Best-of-N Jailbreaking	John Hughes et.al.	2412.03556	link
2024-12-04	Imagine360: Immersive 360 Video Generation from Perspective Anchor	Jing Tan et.al.	2412.03552	null
2024-12-04	Perception Tokens Enhance Visual Reasoning in Multimodal Language Models	Mahtab Bigverdi et.al.	2412.03548	null
2024-12-04	Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models	Natalie Mackraz et.al.	2412.03537	null
2024-12-04	A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences	Gabriel Lino Garcia et.al.	2412.03531	null
2024-12-04	You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks?	Dominic Lohr et.al.	2412.03516	null
2024-12-04	Gesture Classification in Artworks Using Contextual Image Features	Azhar Hussian et.al.	2412.03456	null
2024-12-04	PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation	Ao Wang et.al.	2412.03409	link
2024-12-04	Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment	Feng He et.al.	2412.03400	null
2024-12-03	Motion Prompting: Controlling Video Generation with Motion Trajectories	Daniel Geng et.al.	2412.02700	null
2024-12-03	Diffusion-based Visual Anagram as Multi-task Learning	Zhiyuan Xu et.al.	2412.02693	link
2024-12-03	SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance	Viet Nguyen et.al.	2412.02687	null
2024-12-03	T-REG: Preference Optimization with Token-Level Reward Regularization	Wenxuan Zhou et.al.	2412.02685	null
2024-12-03	Liquefaction: Privately Liquefying Blockchain Assets	James Austgen et.al.	2412.02634	null
2024-12-03	Time-Reversal Provides Unsupervised Feedback to LLMs	Yerram Varun et.al.	2412.02626	null
2024-12-03	Explainable CTR Prediction via LLM Reasoning	Xiaohan Yu et.al.	2412.02588	null
2024-12-03	Copy-Move Forgery Detection and Question Answering for Remote Sensing Image	Ze Zhang et.al.	2412.02575	link
2024-12-03	Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey	Chenyang Liu et.al.	2412.02573	link
2024-12-03	Unveiling Concept Attribution in Diffusion Models	Quang H. Nguyen et.al.	2412.02542	link
2024-11-29	SIMS: Simulating Human-Scene Interactions with Real World Script Planning	Wenjia Wang et.al.	2411.19921	null
2024-11-29	Handling irresolvable conflicts in the Semantic Web: an RDF-based conflict-tolerant version of the Deontic Traditional Scheme	Livio Robaldo et.al.	2411.19918	link
2024-11-29	Another look at inference after prediction	Jessica Gronsbell et.al.	2411.19908	link
2024-11-29	Cross-Domain Recommendation Meets Large Language Models	Ajay Krishna Vajjala et.al.	2411.19862	link
2024-11-29	Neuroplasticity and Psychedelics: a comprehensive examination of classic and non-classic compounds in pre and clinical models	Claudio Agnorelli et.al.	2411.19840	null
2024-11-29	Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation	Robin D. Pesl et.al.	2411.19804	null
2024-11-29	PerLA: Perceptive 3D Language Assistant	Guofeng Mei et.al.	2411.19774	null
2024-11-29	SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks	Kim-Celine Kahl et.al.	2411.19688	link
2024-11-29	Measurement of the Inclusive Cross Sections of Prompt $J/\psi$ and $\psi(3686)$ Production in $e^{+}e^{-}$ Annihilation from $\sqrt{s}=3.808$ to $4.951$ GeV	BESIII Collaboration et.al.	2411.19642	null
2024-11-29	Unleashing the Transformative Power of Deliberation With Contextual Citizens	Ariane Lambert-Mogiliansky et.al.	2411.19596	null
2024-11-27	Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis	Eva Prakash et.al.	2411.18602	null
2024-11-27	Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning	Omkar Khade et.al.	2411.18571	null
2024-11-27	A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models	Rong Wang et.al.	2411.18564	null
2024-11-27	Bumblebee cosmology: Tests using distance- and time-redshift probes	Xincheng Zhu et.al.	2411.18559	null
2024-11-27	Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models	Minhyeok Lee et.al.	2411.18530	link
2024-11-27	Perturbation Ontology based Graph Attention Networks	Yichen Wang et.al.	2411.18520	null
2024-11-27	Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS	Jinyang Wu et.al.	2411.18478	null
2024-11-28	MM-Path: Multi-modal, Multi-granularity Path Representation Learning – Extended Version	Ronghui Xu et.al.	2411.18428	link
2024-11-27	Short-time existence and uniqueness for some infinite-dimensional Nash systems	Davide Francesco Redaelli et.al.	2411.18356	null
2024-11-27	TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models	Riza Velioglu et.al.	2411.18350	link
2024-11-26	Video-Guided Foley Sound Generation with Multimodal Controls	Ziyang Chen et.al.	2411.17698	null
2024-11-26	Instance-Aware Graph Prompt Learning	Jiazheng Li et.al.	2411.17676	null
2024-11-26	Push the Limit of Multi-modal Emotion Recognition by Prompting LLMs with Receptive-Field-Aware Attention Weighting	Liyun Zhang et.al.	2411.17674	null
2024-11-26	SketchAgent: Language-Driven Sequential Sketch Generation	Yael Vinker et.al.	2411.17673	null
2024-11-26	Synthetic Data Generation with LLM for Improved Depression Prediction	Andrea Kang et.al.	2411.17672	null
2024-11-26	Linguistic Laws Meet Protein Sequences: A Comparative Analysis of Subword Tokenization Methods	Burak Suyunu et.al.	2411.17669	link
2024-11-26	BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings	Abhay Shanbhag et.al.	2411.17661	link
2024-11-26	Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism	Yi-Chien Lin et.al.	2411.17651	link
2024-11-26	SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation	Claudia Cuttano et.al.	2411.17646	link
2024-11-26	Uma proposta para o uso de RPG no Ensino de Física: A Vingança de Newton	Maria Rita Vasconcelos Brandão Souza et.al.	2411.17642	null
2024-11-25	Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective	Jean Marie Tshimula et.al.	2411.16642	null
2024-11-25	Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric	Zhichao Zhang et.al.	2411.16619	null
2024-11-25	MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series	Aaron Wheeler et.al.	2411.16585	link
2024-11-25	RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics	Chan Hee Song et.al.	2411.16537	null
2024-11-25	Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings	Carolin M. Schuster et.al.	2411.16527	link
2024-11-25	Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency	Jerry Yao-Chieh Hu et.al.	2411.16525	null
2024-11-25	Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis	Boming Miao et.al.	2411.16503	null
2024-11-25	Interpreting Language Reward Models via Contrastive Explanations	Junqi Jiang et.al.	2411.16502	null
2024-11-25	Learning by Analogy: Enhancing Few-Shot Prompting for Math Word Problem Solving with Computational Graph-Based Retrieval	Xiaocong Yang et.al.	2411.16454	null
2024-11-25	VQ-SGen: A Vector Quantized Stroke Representation for Sketch Generation	Jiawei Wang et.al.	2411.16446	null
2024-11-22	VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement	Daeun Lee et.al.	2411.15115	null
2024-11-22	AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution	Fengyuan Liu et.al.	2411.15102	link
2024-11-22	Instance-Aware Generalized Referring Expression Segmentation	E-Ro Nguyen et.al.	2411.15087	null
2024-11-22	FloAt: Flow Warping of Self-Attention for Clothing Animation Generation	Swasti Shreya Mishra et.al.	2411.15028	null
2024-11-22	FTA generation using GenAI with an Autonomy sensor Usecase	Sneha Sudhir Shetiya et.al.	2411.15007	null
2024-11-22	ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data	Junhong Shen et.al.	2411.15004	link
2024-11-22	Free Energy Projective Simulation (FEPS): Active inference with interpretability	Joséphine Pazem et.al.	2411.14991	null
2024-11-22	Generative AI may backfire for counterspeech	Dominik Bär et.al.	2411.14986	null
2024-11-22	Exploring Foundation Models Fine-Tuning for Cytology Classification	Manon Dausort et.al.	2411.14975	link
2024-11-22	Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation	Colin Diggs et.al.	2411.14971	null
2024-11-21	Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models	Yuhao Dong et.al.	2411.14432	link
2024-11-21	Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings	Aaron Zheng et.al.	2411.14398	null
2024-11-21	Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation	Yuanhao Cai et.al.	2411.14384	null
2024-11-21	DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding	Tianhe Ren et.al.	2411.14347	link
2024-11-21	UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages	Bethel Melesse Tessema et.al.	2411.14343	link
2024-11-21	Auto-SPICE: Leveraging LLMs for Dataset Creation via Automated SPICE Netlist Extraction from Analog Circuit Diagrams	Jitendra Bhandari et.al.	2411.14299	link
2024-11-21	CAIP: Detecting Router Misconfigurations with Context-Aware Iterative Prompting of LLMs	Xi Jiang et.al.	2411.14283	null
2024-11-21	Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance	Haozhe Zhao et.al.	2411.14279	null
2024-11-21	Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification	Junhua Liu et.al.	2411.14252	null
2024-11-21	Natural Language Reinforcement Learning	Xidong Feng et.al.	2411.14251	link
2024-11-20	Metacognition for Unknown Situations and Environments (MUSE)	Rodolfo Valiente et.al.	2411.13537	null
2024-11-20	VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models	Ziqi Huang et.al.	2411.13503	link
2024-11-20	AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations	Gaurav Verma et.al.	2411.13451	null
2024-11-20	From Prompt Engineering to Prompt Craft	Joseph Lindley et.al.	2411.13422	null
2024-11-20	Theory-independent monitoring of the decoherence of a superconducting qubit with generalized contextuality	Albert Aloy et.al.	2411.13421	link
2024-11-20	Unleashing the Power of Large Language Models for Group POI Recommendations	Jing Long et.al.	2411.13415	null
2024-11-21	Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese	Dat Van-Thanh Nguyen et.al.	2411.13407	null
2024-11-20	Adversarial Diffusion Compression for Real-World Image Super-Resolution	Bin Chen et.al.	2411.13383	link
2024-11-20	I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception	Jiawei Zhang et.al.	2411.13314	null
2024-11-20	Combining Autoregressive and Autoencoder Language Models for Text Classification	João Gonçalves et.al.	2411.13282	link
2024-11-19	ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models	Salma Kharrat et.al.	2411.12736	link
2024-11-19	Neurosymbolic Graph Enrichment for Grounded World Models	Stefano De Giorgis et.al.	2411.12671	null
2024-11-19	SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation	Ron Keuth et.al.	2411.12602	link
2024-11-19	AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction	Yuanbin Man et.al.	2411.12593	null
2024-11-19	Large Language Models for Combinatorial Optimization of Design Structure Matrix	Shuo Jiang et.al.	2411.12571	null
2024-11-19	Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution	Yang Zou et.al.	2411.12530	link
2024-11-19	Human-AI Co-Creativity: Exploring Synergies Across Levels of Creative Collaboration	Jennifer Haase et.al.	2411.12527	null
2024-11-19	3D Reconstruction by Looking: Instantaneous Blind Spot Detector for Indoor SLAM through Mixed Reality	Hanbeom Chang et.al.	2411.12514	null
2024-11-19	Evaluating the Prompt Steerability of Large Language Models	Erik Miehling et.al.	2411.12405	link
2024-11-19	DGSNA: prompt-based Dynamic Generative Scene-based Noise Addition method	Zihao Chen et.al.	2411.12363	null
2024-11-18	Absorbing state dynamics of stochastic gradient descent	Guanming Zhang et.al.	2411.11834	null
2024-11-18	The Lambda Calculus is Quantifiable	Valentin Maestracci et.al.	2411.11809	null
2024-11-18	Novel Application of Neutrinos to Evaluate U.S. Nuclear Weapons Performance	J. R. Distel et.al.	2411.11804	null
2024-11-18	Competing Bandits in Decentralized Large Contextual Matching Markets	Satush Parikh et.al.	2411.11794	null
2024-11-18	LLM-IE: A Python Package for Generative Information Extraction with Large Language Models	Enshuo Hsu et.al.	2411.11779	null
2024-11-18	Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment	Allison Huang et.al.	2411.11731	link
2024-11-18	Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation	Mingchao Qi et.al.	2411.11714	link
2024-11-18	Exploring LLMs for Verifying Technical System Specifications Against Requirements	Lasse M. Reinpold et.al.	2411.11582	null
2024-11-18	Simple But Not Secure: An Empirical Security Analysis of Two-factor Authentication Systems	Zhi Wang et.al.	2411.11551	null
2024-11-18	A Code Knowledge Graph-Enhanced System for LLM-Based Fuzz Driver Generation	Hanxiang Xu et.al.	2411.11532	link
2024-11-15	LLaVA-o1: Let Vision Language Models Reason Step-by-Step	Guowei Xu et.al.	2411.10440	link
2024-11-15	Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations	Jianfeng Chi et.al.	2411.10414	null
2024-11-15	Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation	Markus Karmann et.al.	2411.10411	null
2024-11-15	On the Foundation Model for Cardiac MRI Reconstruction	Chi Zhang et.al.	2411.10403	null
2024-11-15	A Survey of Event Causality Identification: Principles, Taxonomy, Challenges, and Assessment	Zefan Zeng et.al.	2411.10371	null
2024-11-15	Bias Unveiled: Investigating Social Bias in LLM-Generated Code	Lin Ling et.al.	2411.10351	null
2024-11-15	Number it: Temporal Grounding Videos like Flipping Manga	Yongliang Wu et.al.	2411.10332	link
2024-11-15	Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding	Huming Qiu et.al.	2411.10329	null
2024-11-15	Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning	Jingru Yang et.al.	2411.10252	null
2024-11-15	Measuring Non-Adversarial Reproduction of Training Data in Large Language Models	Michael Aerni et.al.	2411.10242	null
2024-11-14	MagicQuill: An Intelligent Interactive Image Editing System	Zichen Liu et.al.	2411.09703	link
2024-11-14	LLM Hallucination Reasoning with Zero-shot Knowledge Test	Seongmin Lee et.al.	2411.09689	null
2024-11-14	Squeezed Attention: Accelerating Long Context Length LLM Inference	Coleman Hooper et.al.	2411.09688	link
2024-11-14	The lowest-radiation environments in the Solar System: new opportunities for underground rare-event searches	Xilin Zhang et.al.	2411.09634	null
2024-11-14	Local deployment of large-scale music AI models on commodity hardware	Xun Zhou et.al.	2411.09625	null
2024-11-14	PTR: Precision-Driven Tool Recommendation for Large Language Models	Hang Gao et.al.	2411.09613	null
2024-11-14	Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration	Yifan Shao et.al.	2411.09604	link
2024-11-14	LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models	Zhengyi Wang et.al.	2411.09595	null
2024-11-14	SimTube: Generating Simulated Video Comments through Multimodal AI and User Personas	Yu-Kai Hung et.al.	2411.09577	null
2024-11-14	Spider: Any-to-Many Multimodal LLM	Jinxiang Lai et.al.	2411.09439	link
2024-11-13	Large Wireless Model (LWM): A Foundation Model for Wireless Channels	Sadjad Alikhani et.al.	2411.08872	link
2024-11-13	The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models	Daniel P. Jeong et.al.	2411.08870	link
2024-11-13	CamemBERT 2.0: A Smarter French Language Model Aged to Perfection	Wissam Antoun et.al.	2411.08868	null
2024-11-13	LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs	Piyush Jha et.al.	2411.08862	null
2024-11-13	Process-aware Human Activity Recognition	Jiawei Zheng et.al.	2411.08814	null
2024-11-13	Logic-based Knowledge Awareness for Autonomous Agents in Continuous Spaces	Arabinda Ghosh et.al.	2411.08754	null
2024-11-13	Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers	Clément Dumas et.al.	2411.08745	link
2024-11-13	New advances in universal approximation with neural networks of minimal width	Dennis Rochau et.al.	2411.08735	null
2024-11-14	Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models	Somanshu Singla et.al.	2411.08733	link
2024-11-13	Polymetis: Large Language Modeling for Multiple Material Domains	Chao Huang et.al.	2411.08728	null
2024-11-12	From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents	Chuyi Kong et.al.	2411.07965	null
2024-11-12	MANTIS: A Mixed-Signal Near-Sensor Convolutional Imager SoC Using Charge-Domain 4b-Weighted 5-to-84-TOPS/W MAC Operations for Feature Extraction and Region-of-Interest Detection	Martin Lefebvre et.al.	2411.07946	null
2024-11-12	CryptoLLM: Unleashing the Power of Prompted LLMs for SmartQnA and Classification of Crypto Posts	Aniket Deroy et.al.	2411.07917	null
2024-11-12	INTRABENCH: Interactive Radiological Benchmark	Constantin Ulrich et.al.	2411.07885	null
2024-11-12	Verbosity $\neq$ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models	Yusen Zhang et.al.	2411.07858	link
2024-11-12	FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training	Philip Zmushko et.al.	2411.07837	link
2024-11-12	Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices	Kilian Pfeiffer et.al.	2411.07826	null
2024-11-12	Federated Low-Rank Adaptation with Differential Privacy over Wireless Networks	Tianqu Kang et.al.	2411.07806	null
2024-11-12	RedCode: Risky Code Execution and Generation Benchmark for Code Agents	Chengquan Guo et.al.	2411.07781	link
2024-11-12	Topological resilience of optical skyrmions in local decoherence	Li-Wen Wang et.al.	2411.07775	null
2024-11-11	Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations	Chaitanya Malaviya et.al.	2411.07237	null
2024-11-11	Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models	Yoad Tewel et.al.	2411.07232	null
2024-11-11	Tasks, Time, and Tools: Quantifying Online Sensemaking Efforts Through a Survey-based Study	Andrew Kuznetsov et.al.	2411.07206	null
2024-11-11	DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID	Nyle Siddiqui et.al.	2411.07205	link
2024-11-11	NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics	David Robinson et.al.	2411.07186	null
2024-11-11	SAMPart3D: Segment Any Part in 3D Objects	Yunhan Yang et.al.	2411.07184	link
2024-11-11	Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis	Taihang Hu et.al.	2411.07132	link
2024-11-11	Fast and Robust Contextual Node Representation Learning over Dynamic Graphs	Xingzhi Guo et.al.	2411.07123	null
2024-11-11	Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation	Ziwei Liu et.al.	2411.07021	null
2024-11-11	Flaring gamma-ray emission coincident with a hyperactive fast radio burst source	Yi Xing et.al.	2411.06996	null
2024-11-08	LLMs as Method Actors: A Model for Prompt Engineering and Architecture	Colin Doyle et.al.	2411.05778	link
2024-11-08	Quantitative Assessment of Intersectional Empathetic Bias and Understanding	Vojtech Formanek et.al.	2411.05777	link
2024-11-08	End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering	Dylan Goetting et.al.	2411.05755	link
2024-11-08	A doublet of cosmological models to challenge the H0 tension in the Pantheon Supernovae Ia catalog	B. De Simone et.al.	2411.05744	null
2024-11-08	Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition	Abhisek Ray et.al.	2411.05692	link
2024-11-08	Tell What You Hear From What You See – Video to Audio Generation Through Text	Xiulong Liu et.al.	2411.05679	link
2024-11-08	Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation	Xiwen Wei et.al.	2411.05663	link
2024-11-08	Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation	Long Truong To et.al.	2411.05641	null
2024-11-08	From Resource Control to Digital Trust with User-Managed Access	Wouter Termont et.al.	2411.05622	null
2024-11-08	Evaluating and Adapting Large Language Models to Represent Folktales in Low-Resource Languages	JA Meaney et.al.	2411.05593	null
2024-11-07	SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models	Muyang Li et.al.	2411.05007	link
2024-11-07	HourVideo: 1-Hour Video-Language Understanding	Keshigeyan Chandrasegaran et.al.	2411.04998	link
2024-11-07	Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives	Hao Sun et.al.	2411.04991	link
2024-11-07	DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion	Wenqiang Sun et.al.	2411.04928	null
2024-11-07	StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration	Panwen Hu et.al.	2411.04925	null
2024-11-07	Structure Matters: Dynamic Policy Gradient	Sara Klein et.al.	2411.04913	null
2024-11-07	In the Era of Prompt Learning with Vision-Language Models	Ankit Jha et.al.	2411.04892	null
2024-11-07	Prompt-Guided Internal States for Hallucination Detection of Large Language Models	Fujie Zhang et.al.	2411.04847	link
2024-11-07	VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models	Ming Cheng et.al.	2411.04825	null
2024-11-07	Learn to Solve Vehicle Routing Problems ASAP: A Neural Optimization Approach for Time-Constrained Vehicle Routing Problems with Finite Vehicle Fleet	Elija Deineko et.al.	2411.04777	null
2024-11-06	Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?	Daniel P. Jeong et.al.	2411.04118	link
2024-11-06	Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset	Alexandre Galashov et.al.	2411.04034	null
2024-11-06	Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages	Aniket Deroy et.al.	2411.04025	null
2024-11-06	Predicting and Publishing Accurate Imbalance Prices Using Monte Carlo Tree Search	Fabio Pavirani et.al.	2411.04011	null
2024-11-06	Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning	Jiawei Yao et.al.	2411.03978	link
2024-11-06	Continuous-Time State Estimation Methods in Robotics: A Survey	William Talbot et.al.	2411.03951	null
2024-11-06	Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks	Felipe Marra et.al.	2411.03948	link
2024-11-06	Can Custom Models Learn In-Context? An Exploration of Hybrid Architecture Performance on In-Context Learning Tasks	Ryan Campbell et.al.	2411.03945	link
2024-11-06	Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models	Minh Duc Bui et.al.	2411.03888	link
2024-11-06	Data Fusion of Synthetic Query Variants With Generative Large Language Models	Timo Breuer et.al.	2411.03881	link
2024-11-05	Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation?	Jingyu Xiao et.al.	2411.03292	link
2024-11-05	Proxy-informed Bayesian transfer learning with unknown sources	Sabina J. Sloman et.al.	2411.03263	null
2024-11-05	DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models	Ying Zhou et.al.	2411.03250	null
2024-11-05	On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models	Tariq Berrada Ifriqi et.al.	2411.03177	null
2024-11-05	From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice	Alicia Guo et.al.	2411.03137	null
2024-11-05	MA^2: A Self-Supervised and Motion Augmenting Autoencoder for Gait-Based Automatic Disease Detection	Yiqun Liu et.al.	2411.03129	null
2024-11-05	“Create a Fear of Missing Out” – ChatGPT Implements Unsolicited Deceptive Designs in Generated Websites Without Warning	Veronika Krauß et.al.	2411.03108	null
2024-11-05	Speech Separation with Pretrained Frontend to Minimize Domain Mismatch	Wupeng Wang et.al.	2411.03085	link
2024-11-05	Growing a Tail: Increasing Output Diversity in Large Language Models	Michal Shur-Ofry et.al.	2411.02989	null
2024-11-05	AtlasSeg: Atlas Prior Guided Dual-U-Net for Cortical Segmentation in Fetal Brain MRI	Haoan Xu et.al.	2411.02867	null
2024-11-04	Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin Script Languages	Hoang Nguyen et.al.	2411.02398	null
2024-11-04	Training-free Regional Prompting for Diffusion Transformers	Anthony Chen et.al.	2411.02395	link
2024-11-04	Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning	Md Rifat Arefin et.al.	2411.02344	link
2024-11-04	Prospects for optical detections from binary neutron star mergers with the next-generation multi-messenger observatories	E. Loffredo et.al.	2411.02342	link
2024-11-04	PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	Ruyang Liu et.al.	2411.02327	link
2024-11-04	An Empirical Study on the Code Refactoring Capability of Large Language Models	Jonathan Cordeiro et.al.	2411.02320	null
2024-11-04	Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast	Marilyn Rego et.al.	2411.02318	null
2024-11-04	Defining and Evaluating Physical Safety for Large Language Models	Yung-Chen Tang et.al.	2411.02317	null
2024-11-04	CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments	Kung-Hsiang Huang et.al.	2411.02305	link
2024-11-04	Combining Induction and Transduction for Abstract Reasoning	Wen-Ding Li et.al.	2411.02272	link
2024-10-31	DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion	Weicai Ye et.al.	2410.24203	link
2024-10-31	Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation	Fu Feng et.al.	2410.24160	null
2024-10-31	Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age	Nouar AlDahoul et.al.	2410.24148	null
2024-10-31	COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes	Muhammad Ali et.al.	2410.24139	link
2024-10-31	Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing	Akash Dhruv et.al.	2410.24119	link
2024-10-31	AIDOVECL: AI-generated Dataset of Outpainted Vehicles for Eye-level Classification and Localization	Amir Kazemi et.al.	2410.24116	null
2024-10-31	In-Context Fine-Tuning for Time-Series Foundation Models	Abhimanyu Das et.al.	2410.24087	null
2024-10-31	Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs	Muhammed Saeed et.al.	2410.24049	null
2024-10-31	Handwriting Recognition in Historical Documents with Multimodal LLM	Lucian Li et.al.	2410.24034	null
2024-10-31	Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks	Yingzhe Peng et.al.	2410.24032	null
2024-10-30	RelationBooth: Towards Relation-Aware Customized Object Generation	Qingyu Shi et.al.	2410.23280	null
2024-10-30	SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation	Yining Hong et.al.	2410.23277	null
2024-10-30	EMMA: End-to-End Multimodal Model for Autonomous Driving	Jyh-Jing Hwang et.al.	2410.23262	null
2024-10-30	Evaluating Cultural and Social Awareness of LLM Web Agents	Haoyi Qiu et.al.	2410.23252	null
2024-10-30	ProTransformer: Robustify Transformers via Plug-and-Play Paradigm	Zhichao Hou et.al.	2410.23182	link
2024-10-30	ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning	Millennium Bismay et.al.	2410.23180	link
2024-10-31	Why Gradient Subspace? Identifying and Mitigating LoRA’s Bottlenecks in Federated Fine-Tuning of Large Language Models	Navyansh Mahla et.al.	2410.23111	null
2024-10-30	PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures	Tianxiang Wu et.al.	2410.23089	null
2024-10-30	BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference	Junqi Zhao et.al.	2410.23079	link
2024-10-30	Toward Understanding In-context vs. In-weight Learning	Bryan Chan et.al.	2410.23042	null
2024-10-29	Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier	Kai Wang et.al.	2410.22317	link
2024-10-29	Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving	Bo Jiang et.al.	2410.22313	link
2024-10-29	Embedding-based classifiers can detect prompt injection attacks	Md. Ahsan Ayub et.al.	2410.22284	link
2024-10-29	Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models	Renzhe Yu et.al.	2410.22282	null
2024-10-29	NCA-Morph: Medical Image Registration with Neural Cellular Automata	Amin Ranem et.al.	2410.22265	link
2024-10-29	FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation	Farima Fatahi Bayat et.al.	2410.22257	null
2024-10-29	ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising	Ashutosh Chaubey et.al.	2410.22233	link
2024-10-29	Synthetic Data Generation with Large Language Models for Personalized Community Question Answering	Marco Braga et.al.	2410.22182	link
2024-10-29	Benchmarking LLM Guardrails in Handling Multilingual Toxicity	Yahan Yang et.al.	2410.22153	null
2024-10-29	AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts	Vishal Kumar et.al.	2410.22143	null
2024-10-28	Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context	Manuel Benavent-Lledo et.al.	2410.21275	link
2024-10-28	Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics	Yaniv Nikankin et.al.	2410.21272	link
2024-10-28	LoRA vs Full Fine-tuning: An Illusion of Equivalence	Reece Shuttleworth et.al.	2410.21228	null
2024-10-28	Exploring contextual modeling with linear complexity for point cloud segmentation	Yong Xien Chng et.al.	2410.21211	null
2024-10-28	Simplest Mechanism Builder Algorithm (SiMBA): An Automated Microkinetic Model Discovery Tool	Miguel Ángel de Carvalho Servia et.al.	2410.21205	link
2024-10-28	CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants	Lize Alberts et.al.	2410.21159	link
2024-10-28	Palisade – Prompt Injection Detection Framework	Sahasra Kokkula et.al.	2410.21146	null
2024-10-28	Do LLMs generate test oracles that capture the actual or the expected program behaviour?	Michael Konstantinou et.al.	2410.21136	null
2024-10-28	KA$^2$ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation	Shangde Gao et.al.	2410.21085	null
2024-10-28	Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring	Honglin Mu et.al.	2410.21083	null
2024-10-25	Model merging with SVD to tie the Knots	George Stoica et.al.	2410.19735	link
2024-10-25	Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks	Yinglun Xu et.al.	2410.19705	null
2024-10-25	Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs	Yifei Zhang et.al.	2410.19694	null
2024-10-25	AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs	Clemencia Siro et.al.	2410.19692	null
2024-10-25	Planning-Aware Diffusion Networks for Enhanced Motion Forecasting in Autonomous Driving	Liu Yunhao et.al.	2410.19639	null
2024-10-25	GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing	Hosam Elgendy et.al.	2410.19552	link
2024-10-25	CloserMusicDB: A Modern Multipurpose Dataset of High Quality Music	Aleksandra Piekarzewicz et.al.	2410.19540	null
2024-10-25	Optimization with First Order Algorithms	Charles Dossal et.al.	2410.19506	null
2024-10-25	Introducing MAPO: Momentum-Aided Gradient Descent Prompt Optimization	Anthony Cui et.al.	2410.19499	null
2024-10-25	A Debate-Driven Experiment on LLM Hallucinations and Accuracy	Ray Li et.al.	2410.19485	null
2024-10-24	Unbounded: A Generative Infinite Game of Character Life Simulation	Jialu Li et.al.	2410.18975	null
2024-10-24	ConceptDrift: Uncovering Biases through the Lens of Foundational Models	Cristian Daniel Păduraru et.al.	2410.18970	null
2024-10-24	Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Zhangheng Li et.al.	2410.18967	null
2024-10-24	On the Crucial Role of Initialization for Matrix Factorization	Bingcong Li et.al.	2410.18965	null
2024-10-24	Learning to Look: Seeking Information for Decision Making via Policy Factorization	Shivin Dass et.al.	2410.18964	null
2024-10-24	Context is Key: A Benchmark for Forecasting with Essential Textual Information	Andrew Robert Williams et.al.	2410.18959	link
2024-10-24	BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning	Yujuan Velvin Fu et.al.	2410.18955	null
2024-10-24	From Blind Solvers to Logical Thinkers: Benchmarking LLMs’ Logical Integrity on Faulty Mathematical Problems	A M Muntasir Rahman et.al.	2410.18921	null
2024-10-25	A Survey on Speech Large Language Models	Jing Peng et.al.	2410.18908	null
2024-10-24	PRISM: A Methodology for Auditing Biases in Large Language Models	Leif Azzopardi et.al.	2410.18906	link
2024-10-23	TP-Eval: Tap Multimodal LLMs’ Potential in Evaluation by Customizing Prompts	Yuxuan Xie et.al.	2410.18071	null
2024-10-23	Disordered charge density waves in the kagome metal FeGe	Hengxin Tan et.al.	2410.18063	null
2024-10-23	CLEAR: Character Unlearning in Textual and Visual Modalities	Alexey Dontsov et.al.	2410.18057	null
2024-10-23	Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases	Anna Glazkova et.al.	2410.18040	null
2024-10-23	MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning	Jingfan Zhang et.al.	2410.18035	null
2024-10-23	Measurements of $\psi(2S)$ and $\chi_{c1}(3872)$ production within fully reconstructed jets	LHCb collaboration et.al.	2410.18018	null
2024-10-23	Scalable Ranked Preference Optimization for Text-to-Image Generation	Shyamgopal Karthik et.al.	2410.18013	null
2024-10-23	Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation	Suho Kang et.al.	2410.18001	link
2024-10-23	An evolutionary game theory approach to modeling behavioral interaction in disclosing infection begins with an outbreak: COVID-19 as an example	Pranav Verma et.al.	2410.17996	null
2024-10-23	Closed-form merging of parameter-efficient modules for Federated Continual Learning	Riccardo Salami et.al.	2410.17961	null
2024-10-22	Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods	Tsachi Blau et.al.	2410.17222	null
2024-10-22	Hierarchical Upper Confidence Bounds for Constrained Online Learning	Ali Baheri et.al.	2410.17216	null
2024-10-22	YOLO-TS: Real-Time Traffic Sign Detection with Enhanced Accuracy Using Optimized Receptive Fields and Anchor-Free Fusion	Junzhou Chen et.al.	2410.17144	null
2024-10-22	PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles	Li Siyan et.al.	2410.17127	link
2024-10-22	Enhancing Answer Attribution for Faithful Text Generation with Large Language Models	Juraj Vladika et.al.	2410.17112	null
2024-10-23	Optimal Design for Reward Modeling in RLHF	Antoine Scheid et.al.	2410.17055	null
2024-10-22	Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups	Charvi Rastogi et.al.	2410.17032	null
2024-10-23	GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks	Shuyang Hou et.al.	2410.17031	null
2024-10-22	SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine	Xiaochen Wang et.al.	2410.17021	null
2024-10-22	LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices	Chuntao Ding et.al.	2410.16954	link
2024-10-21	SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree	Shuangrui Ding et.al.	2410.16268	link
2024-10-21	MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report	Samrajya Thapa et.al.	2410.16239	link
2024-10-21	Building A Coding Assistant via the Retrieval-Augmented Language Model	Xinze Li et.al.	2410.16229	link
2024-10-21	Theoretical Limitations of Ensembles in the Age of Overparameterization	Niclas Dern et.al.	2410.16201	null
2024-10-21	From Tokens to Materials: Leveraging Language Models for Scientific Discovery	Yuwei Wan et.al.	2410.16165	link
2024-10-21	An Explainable Contrastive-based Dilated Convolutional Network with Transformer for Pediatric Pneumonia Detection	Chandravardhan Singh Raghaw et.al.	2410.16143	null
2024-10-21	Beyond 2:4: exploring V:N:M sparsity for efficient transformer inference on GPUs	Kang Zhao et.al.	2410.16135	null
2024-10-21	Do LLMs write like humans? Variation in grammatical and rhetorical styles	Alex Reinhart et.al.	2410.16107	null
2024-10-21	Analysing the Residual Stream of Language Models Under Knowledge Conflicts	Yu Zhao et.al.	2410.16090	null
2024-10-21	Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context	Maggie Mi et.al.	2410.16069	null
2024-10-18	MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps	Xiongtao Zhou et.al.	2410.14668	link
2024-10-18	DiscoGraMS: Enhancing Movie Screen-Play Summarization using Movie Character-Aware Discourse Graph	Maitreya Prafulla Chitale et.al.	2410.14666	null
2024-10-18	GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings	Raghuveer Thirukovalluru et.al.	2410.14635	link
2024-10-18	CELI: Controller-Embedded Language Model Interactions	Jan-Samuel Wagner et.al.	2410.14627	null
2024-10-18	DiSCo Meets LLMs: A Unified Approach for Sparse Retrieval and Contextual Distillation in Conversational Search	Simon Lupart et.al.	2410.14609	link
2024-10-18	Neural Combinatorial Clustered Bandits for Recommendation Systems	Baran Atalar et.al.	2410.14586	null
2024-10-18	Do LLMs “know” internally when they follow instructions?	Juyeon Heo et.al.	2410.14516	link
2024-10-18	CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection	Andrea Appiani et.al.	2410.14509	null
2024-10-18	Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models	Cody Clop et.al.	2410.14479	null
2024-10-18	An abstract structure determines the contextuality degree of observable-based Kochen-Specker proofs	Axel Muller et.al.	2410.14463	null
2024-10-17	Can MLLMs Understand the Deep Implication Behind Chinese Images?	Chenhao Zhang et.al.	2410.13854	link
2024-10-17	AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents	Ke Yang et.al.	2410.13825	null
2024-10-17	ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution	Junhao Gu et.al.	2410.13807	null
2024-10-17	PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment	Zekun Moore Wang et.al.	2410.13785	null
2024-10-17	Aggregation Artifacts in Subjective Tasks Collapse Large Language Models’ Posteriors	Georgios Chochlakis et.al.	2410.13776	null
2024-10-17	Improving Multi-modal Large Language Model through Boosting Vision Capabilities	Yanpeng Sun et.al.	2410.13733	null
2024-10-17	Persistent Pre-Training Poisoning of LLMs	Yiming Zhang et.al.	2410.13722	null
2024-10-17	Jailbreaking LLM-Controlled Robots	Alexander Robey et.al.	2410.13691	null
2024-10-17	Label-free prediction of fluorescence markers in bovine satellite cells using deep learning	Sania Sinha et.al.	2410.13685	null
2024-10-18	Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion	Yijun Liang et.al.	2410.13674	link
2024-10-16	Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media	Ross Deans Kristensen-McLachlan et.al.	2410.12791	null
2024-10-16	Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models	Ce Zhang et.al.	2410.12790	link
2024-10-16	JudgeBench: A Benchmark for Evaluating LLM-based Judges	Sijun Tan et.al.	2410.12784	link
2024-10-16	Context-Scaling versus Task-Scaling in In-Context Learning	Amirhesam Abedsoltan et.al.	2410.12783	null
2024-10-16	SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation	Jaehong Yoon et.al.	2410.12761	null
2024-10-16	How Does Variance Shape the Regret in Contextual Bandits?	Zeyu Jia et.al.	2410.12713	null
2024-10-16	Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization	Xingqi Wang et.al.	2410.12700	link
2024-10-17	Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2	Mohamad Abdi et.al.	2410.12686	null
2024-10-17	Context Matters: Leveraging Contextual Features for Time Series Forecasting	Sameep Chattopadhyay et.al.	2410.12672	null
2024-10-16	CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training	Zhiyuan Ma et.al.	2410.12595	null
2024-10-15	KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities	Hsin-Ping Huang et.al.	2410.11824	null
2024-10-15	SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing	Zhiyuan Zhang et.al.	2410.11815	null
2024-10-15	Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability	Tsz Ting Chung et.al.	2410.11786	null
2024-10-15	On the Training Convergence of Transformers for In-Context Classification	Wei Shen et.al.	2410.11778	null
2024-10-15	SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding	Ying Chen et.al.	2410.11761	null
2024-10-15	Identification and modelling of optically thin inverse Compton scattering in the prompt emission of GRB131014A	Pragyan Pratim Bordoloi et.al.	2410.11753	null
2024-10-15	Personas with Attitudes: Controlling LLMs for Diverse Data Annotation	Leon Fröhling et.al.	2410.11745	link
2024-10-15	RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation	Anton Antonov et.al.	2410.11722	link
2024-10-15	Adaptive Coordinators and Prompts on Heterogeneous Graphs for Cross-Domain Recommendations	Hengyu Zhang et.al.	2410.11719	null
2024-10-15	It’s Just Another Day: Unique Video Captioning by Discriminative Prompting	Toby Perrett et.al.	2410.11702	null
2024-10-14	Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models	Jingzhi Bao et.al.	2410.10821	link
2024-10-14	Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free	Ziyue Li et.al.	2410.10814	link
2024-10-14	Denial-of-Service Poisoning Attacks against Large Language Models	Kuofeng Gao et.al.	2410.10760	link
2024-10-14	Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification	Jan Cegin et.al.	2410.10756	link
2024-10-14	FlexGen: Flexible Multi-View Generation from Text and Image Inputs	Xinli Xu et.al.	2410.10745	null
2024-10-14	SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing	Pengrui Quan et.al.	2410.10741	link
2024-10-14	Large Language Models Are Active Critics in NLG Evaluation	Shuying Xu et.al.	2410.10724	null
2024-10-15	4-LEGS: 4D Language Embedded Gaussian Splatting	Gal Fiebelman et.al.	2410.10719	null
2024-10-14	Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues	Qibing Ren et.al.	2410.10700	link
2024-10-14	Functional Flexibility in Generative AI Interfaces: Text Editing with LLMs through Conversations, Toolbars, and Prompts	Florian Lehmann et.al.	2410.10644	null
2024-10-11	AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation	Zijun Wang et.al.	2410.09040	link
2024-10-11	Mentor-KD: Making Small Language Models Better Multi-step Reasoners	Hojae Lee et.al.	2410.09037	link
2024-10-11	AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents	Maksym Andriushchenko et.al.	2410.09024	null
2024-10-11	Parameter-Efficient Fine-Tuning of State Space Models	Kevin Galim et.al.	2410.09016	link
2024-10-11	The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals	Xiaofeng Wu et.al.	2410.09013	null
2024-10-11	Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models	Hao Li et.al.	2410.09012	link
2024-10-11	Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory	Rebecca M. M. Hicke et.al.	2410.08991	link
2024-10-11	Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements	Jingyu Zhang et.al.	2410.08968	null
2024-10-11	Exploring the Design Space of Cognitive Engagement Techniques with AI-Generated Code for Enhanced Learning	Majeed Kazemitabaar et.al.	2410.08922	null
2024-10-11	Utilizing ChatGPT in a Data Structures and Algorithms Course: A Teaching Assistant’s Perspective	Pooriya Jamie et.al.	2410.08899	null
2024-10-10	LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts	Anh-Quan Cao et.al.	2410.08211	null
2024-10-10	HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation	Shanyan Guan et.al.	2410.08192	null
2024-10-10	SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation	Hang Yin et.al.	2410.08189	null
2024-10-10	Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs	Xiaoyuan Liu et.al.	2410.08145	link
2024-10-10	Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks	Mathis Pink et.al.	2410.08133	null
2024-10-10	Think Beyond Size: Dynamic Prompting for More Effective Reasoning	Kamesh R et.al.	2410.08130	null
2024-10-10	What Makes Large Language Models Reason in (Multi-Turn) Code Generation?	Kunhao Zheng et.al.	2410.08105	null
2024-10-10	Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models	Wenting Tan et.al.	2410.08068	link
2024-10-10	Reversible Decoupling Network for Single Image Reflection Removal	Hao Zhao et.al.	2410.08063	link
2024-10-10	Closing the Loop: Learning to Generate Writing Feedback via Language Model Simulated Student Revisions	Inderjeet Nair et.al.	2410.08058	link
2024-10-09	MM-Ego: Towards Building Egocentric Multimodal LLMs	Hanrong Ye et.al.	2410.07177	null
2024-10-09	One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation	Fabian Paischer et.al.	2410.07170	link
2024-10-09	AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation	Yukang Cao et.al.	2410.07164	null
2024-10-09	InstructG2I: Synthesizing Images from Multimodal Attributed Graphs	Bowen Jin et.al.	2410.07157	link
2024-10-09	VHELM: A Holistic Evaluation of Vision Language Models	Tony Lee et.al.	2410.07112	link
2024-10-09	I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy	Gian Maria Campedelli et.al.	2410.07109	link
2024-10-09	Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context	Sangwon Yu et.al.	2410.07103	null
2024-10-09	Robots in the Middle: Evaluating LLMs in Dispute Resolution	Jinzhe Tan et.al.	2410.07053	null
2024-10-09	PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness	Zekun Wang et.al.	2410.07035	null
2024-10-09	Modeling of the Gamma Ray Burst photospheric emission: Monte Carlo simulation of the GRB prompt emission, numerical results and discussion	Amina Trabelsi et.al.	2410.07005	link
2024-10-07	GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting	Yukang Cao et.al.	2410.05259	null
2024-10-08	TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models	Rabin Adhikari et.al.	2410.05239	link
2024-10-07	Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer	Siyuan Hou et.al.	2410.05151	null
2024-10-08	PAMLR: A Passive-Active Multi-Armed Bandit-Based Solution for LoRa Channel Allocation	Jihoon Yun et.al.	2410.05147	null
2024-10-07	CR-CTC: Consistency regularization on CTC for improved speech recognition	Zengwei Yao et.al.	2410.05101	link
2024-10-07	IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification	Yan He et.al.	2410.05100	null
2024-10-07	Human-in-the-loop Reasoning For Traffic Sign Detection: Collaborative Approach Yolo With Video-llava	Mehdi Azarafza et.al.	2410.05096	null
2024-10-07	HyperINF: Unleashing the HyperPower of the Schulz’s Method for Data Influence Estimation	Xinyu Zhou et.al.	2410.05090	link
2024-10-07	ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery	Ziru Chen et.al.	2410.05080	null
2024-10-07	Large Language Model Based Multi-Objective Optimization for Integrated Sensing and Communications in UAV Networks	Haoyun Li et.al.	2410.05062	null
2024-10-04	Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models	Tinghui Zhu et.al.	2410.03659	link
2024-10-04	Conditional Enzyme Generation Using Protein Language Models with Adapters	Jason Yang et.al.	2410.03634	null
2024-10-04	Searching for type I seesaw mechanism in a two Heavy Neutral Leptons scenario at FCC-ee	Sehar Ajmal et.al.	2410.03615	null
2024-10-04	Understanding Reasoning in Chain-of-Thought from the Hopfieldian View	Lijie Hu et.al.	2410.03595	null
2024-10-04	Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models	Xin Zou et.al.	2410.03577	link
2024-10-04	Individual vaccination as Nash equilibrium in a SIR model with application to the 2009-10 Influenza A(H1N1) epidemic in France	Laetitia Laguzet et.al.	2410.03567	null
2024-10-04	Re-examining Sexism and Misogyny Classification with Annotator Attitudes	Aiqi Jiang et.al.	2410.03543	null
2024-10-04	Collaborative and Efficient Personalization with Mixtures of Adaptors	Abdulla Jasem Almansoori et.al.	2410.03497	null
2024-10-04	Gradient-based Jailbreak Images for Multimodal Fusion Models	Javier Rando et.al.	2410.03489	link
2024-10-04	Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation	Tobias Leemann et.al.	2410.03461	null
2024-10-03	Erasing Conceptual Knowledge from Language Models	Rohit Gandikota et.al.	2410.02760	link
2024-10-03	Loong: Generating Minute-level Long Videos with Autoregressive Language Models	Yuqing Wang et.al.	2410.02757	null
2024-10-03	CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation	Han He et.al.	2410.02748	link
2024-10-03	Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization	Lei Xu et.al.	2410.02741	link
2024-10-03	Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation	Rohin Manvi et.al.	2410.02725	null
2024-10-03	Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization	Ryan C. Barron et.al.	2410.02721	null
2024-10-03	HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly	Howard Yen et.al.	2410.02694	link
2024-10-03	HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router	Lingrui Mei et.al.	2410.02684	link
2024-10-03	DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life	Yu Ying Chiu et.al.	2410.02683	null
2024-10-03	Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models	Shuoyuan Wang et.al.	2410.02681	null
2024-10-02	DreamGarden: A Designer Assistant for Growing Games from a Single Prompt	Sam Earle et.al.	2410.01791	null
2024-10-02	Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models	Shayekh Bin Islam et.al.	2410.01782	link
2024-10-02	Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning	Xingrui Gu et.al.	2410.01739	null
2024-10-02	LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits	Duy Nguyen et.al.	2410.01735	link
2024-10-02	ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation	Rinon Gal et.al.	2410.01731	null
2024-10-02	Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting	Longyu Feng et.al.	2410.01724	null
2024-10-02	Examining the Role of Relationship Alignment in Large Language Models	Kristen M. Altenburger et.al.	2410.01708	null
2024-10-02	FactAlign: Long-form Factuality Alignment of Large Language Models	Chao-Wei Huang et.al.	2410.01691	link
2024-10-02	Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding	Yanming Liu et.al.	2410.01671	null
2024-10-02	Extending Contextual Self-Modulation: Meta-Learning Across Modalities, Task Dimensionalities, and Data Regimes	Roussel Desmond Nzoyem et.al.	2410.01655	link
2024-09-30	LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner	Xiaopan Zhang et.al.	2409.20560	null
2024-09-30	Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection	Yubin Wang et.al.	2409.20558	null
2024-09-30	LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation	Ziyao Zhang et.al.	2409.20550	link
2024-09-30	Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models	Arpan Mukherjee et.al.	2409.20512	null
2024-09-30	COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models	Divyanshu Daiya et.al.	2409.20502	null
2024-09-30	Online Decision Deferral under Budget Constraints	Mirabel Reid et.al.	2409.20489	link
2024-10-01	Instance-adaptive Zero-shot Chain-of-Thought Prompting	Xiaosong Yuan et.al.	2409.20441	null
2024-09-30	World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering	Jiacong Wang et.al.	2409.20424	link
2024-09-30	Superposition of PRS and PDSCH for ISAC System: Spectral Efficiency Enhancement and Range Ambiguity Elimination	Keivan Khosroshahi et.al.	2409.20420	null
2024-09-30	Wait, but Tylenol is Acetaminophen… Investigating and Improving Language Models’ Ability to Resist Requests for Misinformation	Shan Chen et.al.	2409.20385	null
2024-09-27	ProMerge: Prompt and Merge for Unsupervised Instance Segmentation	Dylan Li et.al.	2409.18961	null
2024-09-27	LML: Language Model Learning a Dataset for Data-Augmented Prediction	Praneeth Vadlapati et.al.	2409.18957	link
2024-09-27	Improving Visual Object Tracking through Visual Prompting	Shih-Fang Chen et.al.	2409.18901	link
2024-09-27	IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation	Fan Lin et.al.	2409.18892	link
2024-09-27	LW2G: Learning Whether to Grow for Prompt-based Continual Learning	Qian Feng et.al.	2409.18860	link
2024-09-27	Text2FX: Harnessing CLAP Embeddings for Text-Guided Audio Effects	Annie Chu et.al.	2409.18847	null
2024-09-27	LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis	Hamed Babaei Giglou et.al.	2409.18812	link
2024-09-27	Can AI Enhance its Creativity to Beat Humans ?	Anne-Gaëlle Maltese et.al.	2409.18776	null
2024-09-27	Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations	James Ford et.al.	2409.18764	null
2024-09-27	Interaction Equivalence	Beniamino Accattoli et.al.	2409.18709	null
2024-09-26	EgoLM: Multi-Modal Language Model of Egocentric Motions	Fangzhou Hong et.al.	2409.18127	null
2024-09-26	GSON: A Group-based Social Navigation Framework with Large Multimodal Model	Shangyi Luo et.al.	2409.18084	null
2024-09-26	Infer Human’s Intentions Before Following Natural Language Instructions	Yanming Wan et.al.	2409.18073	link
2024-09-26	Infering Alt-text For UI Icons With Large Language Models During App Development	Sabrina Haque et.al.	2409.18060	null
2024-09-26	MARS: Multi-radio Architecture with Radio Selection using Decision Trees for emerging mesoscale CPS/IoT applications	Jothi Prasanna Shanmuga Sundaram et.al.	2409.18043	null
2024-09-26	DARE: Diverse Visual Question Answering with Robustness Evaluation	Hannah Sterz et.al.	2409.18023	null
2024-09-26	Control Industrial Automation System with Large Language Models	Yuchen Xia et.al.	2409.18009	link
2024-09-26	Enhancing Tourism Recommender Systems for Sustainable City Trips Using Retrieval-Augmented Generation	Ashmi Banerjee et.al.	2409.18003	null
2024-09-26	Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models	Georg Ahnert et.al.	2409.17990	link
2024-09-26	GRB 240529A: A Tale of Two Shocks	Tian-Rui Sun et.al.	2409.17983	null
2024-09-25	Attention Prompting on Image for Large Vision-Language Models	Runpeng Yu et.al.	2409.17143	link
2024-09-25	Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset	Andrew Goldberg et.al.	2409.17126	null
2024-09-26	Characterizing stable regions in the residual stream of LLMs	Jett Janiak et.al.	2409.17113	null
2024-09-25	Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts	Mohammad Sadil Khan et.al.	2409.17106	link
2024-09-25	Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth Estimation	Richard D. Paul et.al.	2409.17085	null
2024-09-25	Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors	Aiping Zhang et.al.	2409.17058	link
2024-09-25	GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design	Phillip Mueller et.al.	2409.17045	null
2024-09-25	Counterfactual Token Generation in Large Language Models	Ivi Chatzi et.al.	2409.17027	link
2024-09-25	AXCEL: Automated eXplainable Consistency Evaluation using LLMs	P Aditya Sreekar et.al.	2409.16984	null
2024-09-25	DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling	Kyuheon Jung et.al.	2409.16949	link
2024-09-24	Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation	Yong Xien Chng et.al.	2409.16278	null
2024-09-24	Second Order Bounds for Contextual Bandits with Function Approximation	Aldo Pacchiano et.al.	2409.16197	null
2024-09-24	Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation	Xiaohong Liu et.al.	2409.16183	null
2024-09-24	Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering	Ziyu Zhao et.al.	2409.16167	null
2024-09-24	Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework	Lu Chen et.al.	2409.16146	link
2024-09-24	HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection	Yuqi Ma et.al.	2409.16136	null
2024-09-24	Evaluation of state-of-the-art ASR Models in Child-Adult Interactions	Aditya Ashvin et.al.	2409.16135	null
2024-09-24	MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents	Ming Zhu et.al.	2409.16120	link
2024-09-24	Exploring Hint Generation Approaches in Open-Domain Question Answering	Jamshid Mozafari et.al.	2409.16096	link
2024-09-24	MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models	Wenhao Yu et.al.	2409.16030	null
2024-09-18	To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning	Zayne Sprague et.al.	2409.12183	link
2024-09-18	Investigating the effects of precise mass measurements of Ru and Pd isotopes on machine learning mass modeling	W. S. Porter et.al.	2409.12141	null
2024-09-18	MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion	Kalakonda Sai Shashank et.al.	2409.12140	link
2024-09-18	Self-similar solutions of oscillatory reconnection: parameter study of magnetic field strength and background temperature	Luiz A. C. A. Schiavo et.al.	2409.12130	null
2024-09-18	Fully charmed tetraquark production at the LHC experiments	Ilia Belov et.al.	2409.12070	null
2024-09-18	Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking	Ningyuan Xi et.al.	2409.12059	null
2024-09-19	Using Large Language Models to Generate Clinical Trial Tables and Figures	Yumeng Yang et.al.	2409.12046	null
2024-09-18	Mixture of Prompt Learning for Vision Language Models	Yu Du et.al.	2409.12011	null
2024-09-18	Ramp reversal memory in bulk crystals of 1T-TaS2	Avital Fried et.al.	2409.11977	null
2024-09-18	Sampling Latent Material-Property Information From LLM-Derived Embedding Representations	Luke P. J. Gilligan et.al.	2409.11971	null
2024-09-17	LPT++: Efficient Training on Mixture of Long-tailed Experts	Bowen Dong et.al.	2409.11323	null
2024-09-17	MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping	Amirreza Fateh et.al.	2409.11316	link
2024-09-17	Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models	Divij Gupta et.al.	2409.11302	null
2024-09-17	TISIS : Trajectory Indexing for SImilarity Search	Sara Jarrad et.al.	2409.11301	null
2024-09-18	Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling	Xinyue Fang et.al.	2409.11283	null
2024-09-17	Machine Learning and Theory Ladenness – A Phenomenological Account	Alberto Termine et.al.	2409.11277	null
2024-09-18	The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives	Samee Arif et.al.	2409.11261	link
2024-09-17	Norm of Mean Contextualized Embeddings Determines their Variance	Hiroaki Yamagiwa et.al.	2409.11253	link
2024-09-17	Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse	Maojia Song et.al.	2409.11242	link
2024-09-17	Multimodal Attention-Enhanced Feature Fusion-based Weekly Supervised Anomaly Violence Detection	Yuta Kaneko et.al.	2409.11223	null
2024-09-16	Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models	Momoko Shiraishi et.al.	2409.10506	null
2024-09-16	Do Pre-trained Vision-Language Models Encode Object States?	Kaleb Newman et.al.	2409.10488	link
2024-09-16	Addressing misspecification in contextual optimization	Omar Bennouna et.al.	2409.10479	null
2024-09-16	A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration	Zhang Zheng et.al.	2409.10403	null
2024-09-16	Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation	Hanbo Bi et.al.	2409.10389	null
2024-09-16	On Synthetic Texture Datasets: Challenges, Creation, and Curation	Blaine Hoak et.al.	2409.10297	null
2024-09-16	From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs	Navya Jain et.al.	2409.10245	null
2024-09-16	Robust Bird’s Eye View Segmentation by Adapting DINOv2	Merve Rabia Barın et.al.	2409.10228	null
2024-09-16	Exploring Quantum Contextuality with the Quantum Moebius-Escher-Penrose hypergraph	Mirko Navara et.al.	2409.10179	null
2024-09-17	jina-embeddings-v3: Multilingual Embeddings With Task LoRA	Saba Sturua et.al.	2409.10173	null
2024-09-13	Contri(e)ve: Context + Retrieve for Scholarly Question Answering	Kanchan Shivashankar et.al.	2409.09010	null
2024-09-13	SynSUM – Synthetic Benchmark with Structured and Unstructured Medical Records	Paloma Rabaey et.al.	2409.08936	link
2024-09-13	LLM-based Weak Supervision Framework for Query Intent Classification in Video Search	Farnoosh Javadi et.al.	2409.08931	null
2024-09-13	Farmer.Chat: Scaling AI-Powered Agricultural Services for Smallholder Farmers	Namita Singh et.al.	2409.08916	null
2024-09-13	Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing	Minh-Duc Vu et.al.	2409.08885	null
2024-09-13	Data Efficient Child-Adult Speaker Diarization with Simulated Conversations	Anfeng Xu et.al.	2409.08881	link
2024-09-13	InstantDrag: Improving Interactivity in Drag-based Image Editing	Joonghyuk Shin et.al.	2409.08857	null
2024-09-13	A RAG Approach for Generating Competency Questions in Ontology Engineering	Xueli Pan et.al.	2409.08820	null
2024-09-13	Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR	Mingyu Cui et.al.	2409.08797	link
2024-09-13	LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment	Huan Zhang et.al.	2409.08795	link
2024-09-12	Click2Mask: Local Editing with Dynamic Mask Generation	Omer Regev et.al.	2409.08272	link
2024-09-12	Improving Text-guided Object Inpainting with Semantic Pre-inpainting	Yifu Chen et.al.	2409.08260	link
2024-09-12	Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding	Hongyu Li et.al.	2409.08251	null
2024-09-12	OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering	Jiahao Nick Li et.al.	2409.08250	null
2024-09-12	TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder	NaHyeon Park et.al.	2409.08248	link
2024-09-12	LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems	Hakan T. Otal et.al.	2409.08234	link
2024-09-12	Exploring Use and Perceptions of Generative AI Art Tools by Blind Artists	Gayatri Raman et.al.	2409.08226	null
2024-09-12	AudioBERT: Audio Knowledge Augmented Language Model	Hyunjong Ok et.al.	2409.08199	link
2024-09-12	Fine-tuning Large Language Models for Entity Matching	Aaron Steiner et.al.	2409.08185	link
2024-09-12	On the Role of Context in Reading Time Prediction	Andreas Opedal et.al.	2409.08160	link
2024-09-11	Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation	Gavin Butts et.al.	2409.07424	null
2024-09-11	AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge	Han Wang et.al.	2409.07394	link
2024-09-11	Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code	Khiem Ton et.al.	2409.07368	null
2024-09-11	Enhancing Sequential Music Recommendation with Negative Feedback-informed Contrastive Learning	Pavan Seshadri et.al.	2409.07367	null
2024-09-11	PaveSAM Segment Anything for Pavement Distress	Neema Jakisa Owor et.al.	2409.07295	null
2024-09-12	Alignment of Diffusion Models: Fundamentals, Challenges, and Future	Buhua Liu et.al.	2409.07253	link
2024-09-11	Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning	Yingling Lu et.al.	2409.07238	link
2024-09-12	3DGCQA: A Quality Assessment Database for 3D AI-Generated Contents	Yingjie Zhou et.al.	2409.07236	link
2024-09-11	Swin-LiteMedSAM: A Lightweight Box-Based Segment Anything Model for Large-Scale Medical Image Datasets	Ruochen Gao et.al.	2409.07172	link
2024-09-11	Leveraging Unstructured Text Data for Federated Instruction Tuning of Large Language Models	Rui Ye et.al.	2409.07136	null
2024-09-10	E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning	Zihan Liao et.al.	2409.06679	null
2024-09-10	SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation	Teng Hu et.al.	2409.06633	null
2024-09-10	One-Shot Imitation under Mismatched Execution	Kushal Kedia et.al.	2409.06615	null
2024-09-10	Simulation-based Scenario Generation for Robust Hybrid AI for Autonomy	Hambisa Keno et.al.	2409.06608	null
2024-09-10	Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System	Leilei Lin et.al.	2409.06568	link
2024-09-10	ChatGPT’s Potential in Cryptography Misuse Detection: A Comparative Analysis with Static Analysis Tools	Ehsan Firouzi et.al.	2409.06561	null
2024-09-10	An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition	Yi-Cheng Wang et.al.	2409.06468	null
2024-09-10	Continual Domain Incremental Learning for Privacy-aware Digital Pathology	Pratibha Kumari et.al.	2409.06455	null
2024-09-10	Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles	Qiujing Lu et.al.	2409.06450	null
2024-09-10	HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data	Hossein Hajipour et.al.	2409.06446	link
2024-09-09	Promptable Closed-loop Traffic Simulation	Shuhan Tan et.al.	2409.05863	null
2024-09-09	Recognizing molecular chirality via twisted 2D materials	Lorenzo Cavicchi et.al.	2409.05839	null
2024-09-09	Are Large Language Models a Threat to Programming Platforms? An Exploratory Study	Md Mustakim Billah et.al.	2409.05824	null
2024-09-09	Leveraging Object Priors for Point Tracking	Bikram Boote et.al.	2409.05786	link
2024-09-09	A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System	B. Sankar et.al.	2409.05747	null
2024-09-09	What Did My Car Say? Autonomous Vehicle Explanation Errors, Context, and Personal Traits Impact Comfort, Reliance, Satisfaction, and Driving Confidence	Robert Kaufman et.al.	2409.05731	null
2024-09-09	Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling	Sara Ferro et.al.	2409.05699	null
2024-09-09	SX-Stitch: An Efficient VMS-UNet Based Framework for Intraoperative Scoliosis X-Ray Image Stitching	Yi Li et.al.	2409.05681	null
2024-09-09	Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models	Aakash Sen Sharma et.al.	2409.05668	null
2024-09-09	DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification	Junzhou Chen et.al.	2409.05587	null
2024-09-06	Question-Answering Dense Video Events	Hangyu Qin et.al.	2409.04388	link
2024-09-06	J/$\psi$-hadron correlations at midrapidity in pp collisions at $\sqrt{s}$ = 13 TeV	ALICE Collaboration et.al.	2409.04364	null
2024-09-06	Connectivity-Inspired Network for Context-Aware Recognition	Gianluca Carloni et.al.	2409.04360	link
2024-09-06	First studies on cascaded dual-phase liquid hole-multipliers in xenon	G. Martinez-Lema et.al.	2409.04338	null
2024-09-06	Active learning for regression in engineering populations: A risk-informed approach	Daniel R. Clarkson et.al.	2409.04328	null
2024-09-06	Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs	Aliakbar Nafar et.al.	2409.04318	link
2024-09-06	FS-MedSAM2: Exploring the Potential of SAM2 for Few-Shot Medical Image Segmentation without Fine-tuning	Yunhao Bai et.al.	2409.04298	link
2024-09-06	Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets	Desiree Heim et.al.	2409.04286	null
2024-09-06	An overview of domain-specific foundation model: key technologies, applications and challenges	Haolong Chen et.al.	2409.04267	null
2024-09-06	FPT Algorithms using Minimal Parameters for a Generalized Version of Maximin Shares	Klaus Jansen et.al.	2409.04225	null
2024-09-05	LLM-CI: Assessing Contextual Integrity Norms in Language Models	Yan Shvartzshnaider et.al.	2409.03735	null
2024-09-06	RAG based Question-Answering for Contextual Response Prediction System	Sriram Veturi et.al.	2409.03708	null
2024-09-06	LLM-based multi-agent poetry generation in non-cooperative environments	Ran Zhang et.al.	2409.03659	link
2024-09-05	Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers	Amit Ben Artzy et.al.	2409.03621	link
2024-09-05	Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration	Pei Wang et.al.	2409.03455	null
2024-09-05	Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities	Wei Lu et.al.	2409.03444	link
2024-09-05	Leveraging Large Language Models through Natural Language Processing to provide interpretable Machine Learning predictions of mental deterioration in real time	Francisco de Arriba-Pérez et.al.	2409.03375	null
2024-09-05	TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation	Shahzaib Iqbal et.al.	2409.03367	null
2024-09-05	Sketch: A Toolkit for Streamlining LLM Operations	Xin Jiang et.al.	2409.03346	null
2024-09-05	N-gram Prediction and Word Difference Representations for Language Modeling	DongNyeong Heo et.al.	2409.03295	null
2024-09-04	HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts	Xinyu Liu et.al.	2409.02919	link
2024-09-04	Building a Scalable, Effective, and Steerable Search and Ranking Platform	Marjan Celikik et.al.	2409.02856	null
2024-09-04	Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model	Tornike Karchkhadze et.al.	2409.02845	null
2024-09-04	MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark	Xiang Yue et.al.	2409.02813	null
2024-09-04	Non-Orthogonal Multiple-Access Strategies for Direct-to-Satellite IoT Networks	Felipe Augusto Tondo et.al.	2409.02748	null
2024-09-04	Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection	Kaiqing Lin et.al.	2409.02664	null
2024-09-04	PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation	Jun Ling et.al.	2409.02657	null
2024-09-04	Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects	Kyungmin Jo et.al.	2409.02653	null
2024-09-04	Mamba as a motion encoder for robotic imitation learning	Toshiaki Tsuji et.al.	2409.02636	null
2024-09-04	PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation	Aneta Pawelec et.al.	2409.02617	null
2024-08-30	DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model	Mona Sheikh Zeinoddin et.al.	2408.17433	link
2024-08-30	CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models	Jonathan Bourne et.al.	2408.17428	link
2024-09-03	Open-vocabulary Temporal Action Localization using VLMs	Naoki Wake et.al.	2408.17422	null
2024-08-30	MoRe Fine-Tuning with 10x Fewer Parameters	Wenxuan Tan et.al.	2408.17383	link
2024-08-30	Efficient Multi-task Prompt Tuning for Recommendation	Ting Bai et.al.	2408.17214	null
2024-08-30	NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar	Runwei Guan et.al.	2408.17207	null
2024-08-30	Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study	Shubham Agarwal et.al.	2408.17181	null
2024-08-30	Wireless Integrated Authenticated Communication System (WIA-Comm)	Amith N Bharadwaj et.al.	2408.17112	null
2024-08-30	Understanding the User: An Intent-Based Ranking Dataset	Abhijit Anand et.al.	2408.17103	null
2024-08-30	Reasoning AI Performance Degradation in 6G Networks with Large Language Models	Liming Huang et.al.	2408.17097	null
2024-08-29	PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning	Noor Hussein et.al.	2408.16769	link
2024-08-29	SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners	Ziyu Guo et.al.	2408.16768	link
2024-08-29	ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model	Fangfu Liu et.al.	2408.16767	null
2024-08-29	An algebraic characterisation of Kochen-Specker contextuality	Markus Frembs et.al.	2408.16764	null
2024-08-29	Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge	Beidi Dong et.al.	2408.16749	null
2024-08-29	GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models	Moreno D’Incà et.al.	2408.16700	link
2024-08-29	Iterative Graph Alignment	Fangyuan Yu et.al.	2408.16667	link
2024-08-29	LLMs generate structurally realistic social networks but overestimate political homophily	Serina Chang et.al.	2408.16629	link
2024-08-29	WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling	Shengpeng Ji et.al.	2408.16532	link
2024-08-29	UAV-Based Human Body Detector Selection and Fusion for Geolocated Saliency Map Generation	Piotr Rudol et.al.	2408.16501	null
2024-08-29	Spatio-Temporal Context Prompting for Zero-Shot Action Detection	Wei-Jhe Huang et.al.	2408.15996	null
2024-08-28	TEDRA: Text-based Editing of Dynamic and Photoreal Actors	Basavaraj Sunagad et.al.	2408.15995	null
2024-08-28	Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration	Xu Zhang et.al.	2408.15994	null
2024-08-28	In-Context Imitation Learning via Next-Token Prediction	Letian Fu et.al.	2408.15980	link
2024-08-28	Fall Detection for Smart Living using YOLOv5	Gracile Astlin Pereira et.al.	2408.15955	null
2024-08-28	Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games	Nicholas R. Waytowich et.al.	2408.15950	null
2024-08-28	Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models	Yuncheng Yang et.al.	2408.15915	link
2024-08-28	CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization	Feize Wu et.al.	2408.15914	null
2024-08-28	Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models	Sebastian Vallejo Vera et.al.	2408.15895	null
2024-08-28	Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation	Shaofei Huang et.al.	2408.15876	link
2024-08-27	SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images	Zafer Yildiz et.al.	2408.15224	link
2024-08-27	LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet	Nathaniel Li et.al.	2408.15221	null
2024-08-27	Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation	Jian Hu et.al.	2408.15205	link
2024-08-27	On the parameterized complexity of computing good edge-labelings	Davi de Andrade et.al.	2408.15181	null
2024-08-27	A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships	Gracile Astlin Pereira et.al.	2408.15178	null
2024-08-27	X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation	Hanjia Lyu et.al.	2408.15172	null
2024-08-28	Urdu Digital Text Word Optical Character Recognition Using Permuted Auto Regressive Sequence Modeling	Ahmed Mustafa et.al.	2408.15119	null
2024-08-27	CLIP-AGIQA: Boosting the Performance of AI-Generated Image Quality Assessment with CLIP	Zhenchen Tang et.al.	2408.15098	null
2024-08-27	MiWaves Reinforcement Learning Algorithm	Susobhan Ghosh et.al.	2408.15076	link
2024-08-28	Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance	Kunpeng Wang et.al.	2408.15063	link
2024-08-27	Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models	Aradhye Agarwal et.al.	2408.14470	link
2024-08-26	Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study	Liuchang Xu Shuo Zhao et.al.	2408.14438	null
2024-08-26	Social perception of faces in a vision-language model	Carina I. Hausladen et.al.	2408.14435	link
2024-08-26	Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications	Luyue Xu et.al.	2408.14432	null
2024-08-26	Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning	Sakhinana Sagar Srinivas et.al.	2408.14387	null
2024-08-26	ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty	Xindi Wu et.al.	2408.14339	null
2024-08-26	Claim Verification in the Age of Large Language Models: A Survey	Alphaeus Dmonte et.al.	2408.14317	null
2024-08-27	Text3DAug – Prompted Instance Augmentation for LiDAR Perception	Laurenz Reichardt et.al.	2408.14253	link
2024-08-27	SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher	Trung Dao et.al.	2408.14176	link
2024-08-26	Contrastive Learning Subspace for Text Clustering	Qian Yong et.al.	2408.14119	null
2024-08-23	Domain-specific long text classification from sparse relevant information	Célia D’Cruz et.al.	2408.13253	null
2024-08-23	LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation	Shuai Yang et.al.	2408.13252	null
2024-08-23	CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities	Tao Wu et.al.	2408.13239	link
2024-08-23	Enhancing Few-Shot Transfer Learning with Optimized Multi-Task Prompt Tuning through Modular Prompt Composition	Ahmad Pouramini et.al.	2408.13227	null
2024-08-23	Polarization Measurement of Gamma-ray Bursts with Fermi-GBM: The Case of GRB 180720B	P. Veres et.al.	2408.13199	null
2024-08-23	Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning	Hourui Deng et.al.	2408.13184	null
2024-08-23	Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation	Bonan Li et.al.	2408.13149	null
2024-08-23	SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks	Kai-Wei Chang et.al.	2408.13040	null
2024-08-23	Indoor scene recognition from images under visual corruptions	Willams de Lima Costa et.al.	2408.13029	null
2024-08-23	A Web-Based Solution for Federated Learning with LLM-Based Automation	Chamith Mawela et.al.	2408.13010	null
2024-08-22	Controllable Text Generation for Large Language Models: A Survey	Xun Liang et.al.	2408.12599	link
2024-08-23	Non-Homophilic Graph Pre-Training and Prompt Learning	Xingtong Yu et.al.	2408.12594	link
2024-08-22	Contextual Stochastic Optimization for School Desegregation Policymaking	Hongzhao Guan et.al.	2408.12572	null
2024-08-22	Towards Evaluating and Building Versatile Large Language Models for Medicine	Chaoyi Wu et.al.	2408.12547	link
2024-08-22	Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition	Bozheng Li et.al.	2408.12475	null
2024-08-22	DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems	Jiaju Chen et.al.	2408.12470	link
2024-08-22	FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing	Jue Wang et.al.	2408.12429	link
2024-08-22	Dynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce	Ádám Tibor Czapp et.al.	2408.12392	null
2024-08-22	Orbits of Binary Stars: from Visual Measures to Speckle Interferometry	Andrei Tokovinin et.al.	2408.12376	null
2024-08-23	RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering	Pratyush Kumar et.al.	2408.12369	link
2024-08-21	NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation	Zhenye Lou et.al.	2408.11787	link
2024-08-21	Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards	Omar Erak et.al.	2408.11775	link
2024-08-21	D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models	M. Forlini et.al.	2408.11761	null
2024-08-21	MambaCSR: Dual-Interleaved Scanning for Compressed Image Super-Resolution With SSMs	Yulin Ren et.al.	2408.11758	link
2024-08-21	FocusLLM: Scaling LLM’s Context by Parallel Decoding	Zhenyu Li et.al.	2408.11745	link
2024-08-21	JieHua Paintings Style Feature Extracting Model using Stable Diffusion with ControlNet	Yujia Gu et.al.	2408.11744	null
2024-08-21	CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering	Yuliang Cai et.al.	2408.11742	link
2024-08-22	LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites	Zachariah Sollenberger et.al.	2408.11729	null
2024-08-21	Efficient Detection of Toxic Prompts in Large Language Models	Yi Liu et.al.	2408.11727	null
2024-08-21	Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests	Amirhossein Deljouyi et.al.	2408.11710	link
2024-08-20	Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement	Satoshi Kosugi et.al.	2408.11055	link
2024-08-20	Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks	Nathaniel Pinckney et.al.	2408.11053	link
2024-08-20	Multiple Topology Replica Exchange of Expanded Ensembles (MT-REXEE) for Multidimensional Alchemical Calculations	Anika J. Friedman et.al.	2408.11038	link
2024-08-20	An Overlooked Role of Context-Sensitive Dendrites	Mohsin Raza et.al.	2408.11019	null
2024-08-20	Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter	Farhanul Haque et.al.	2408.10955	null
2024-08-20	The Evolution of Reinforcement Learning in Quantitative Finance	Nikolaos Pippas et.al.	2408.10932	null
2024-08-20	CHECKWHY: Causal Fact Verification via Argument Structure	Jiasheng Si et.al.	2408.10918	link
2024-08-21	BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model	Yeyong Yu et.al.	2408.10903	link
2024-08-20	DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection	Xinqi Su et.al.	2408.10883	link
2024-08-20	Manifold Transform by Recurrent Cortical Circuit Enhances Robust Encoding of Familiar Stimuli	Weifan Wang et.al.	2408.10873	null
2024-08-19	SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models	Anke Tang et.al.	2408.10174	link
2024-08-19	Customizing Language Models with Instance-wise LoRA for Sequential Recommendation	Xiaoyu Kong et.al.	2408.10159	link
2024-08-19	In-Context Learning with Representations: Contextual Generalization of Trained Transformers	Tong Yang et.al.	2408.10147	null
2024-08-19	Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models	Tianyu Zhang et.al.	2408.10124	link
2024-08-19	FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant	Zhengchao Huang et.al.	2408.10072	link
2024-08-19	Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development	Yuncheng Jiang et.al.	2408.10067	null
2024-08-19	Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory	Haoran Li et.al.	2408.10053	null
2024-08-19	Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype	Yadong Lu et.al.	2408.09984	null
2024-08-20	Application of Large Language Models in Automated Question Generation: A Case Study on ChatGLM’s Structured Questions for National Teacher Certification Exams	Ling He et.al.	2408.09982	null
2024-08-19	Contextual Importance and Utility in Python: New Functionality and Insights with the py-ciu Package	Kary Främling et.al.	2408.09957	link
2024-08-19	PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars	Sumanth Prabhu et.al.	2408.08869	null
2024-08-16	Visual Agents as Fast and Slow Thinkers	Guangyan Sun et.al.	2408.08862	link
2024-08-16	Revisiting the propagation of highly-energetic gamma rays in the Galaxy	Gaetano Di Marco et.al.	2408.08818	null
2024-08-16	CIKMar: A Dual-Encoder Approach to Prompt-Based Reranking in Educational Dialogue Systems	Joanito Agili Lopo et.al.	2408.08805	null
2024-08-16	Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification	Abdullah Al Imran et.al.	2408.08803	null
2024-08-16	Neighbor Overlay-Induced Graph Attention Network	Tiqiao Wei et.al.	2408.08788	null
2024-08-16	Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions	Bhuvanashree Murugadoss et.al.	2408.08781	null
2024-08-16	Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions	Chenming Tang et.al.	2408.08780	null
2024-08-16	Watching the Generative AI Hype Bubble Deflate	David Gray Widder et.al.	2408.08778	null
2024-08-16	Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused	Dingwei Chen et.al.	2408.08769	null
2024-08-15	SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training	Gengwei Zhang et.al.	2408.08295	link
2024-08-15	Heavy Labels Out! Dataset Distillation with Label Space Lightening	Ruonan Yu et.al.	2408.08201	null
2024-08-15	“I Try to Represent Myself as I Am”: Self-Presentation Preferences of People with Invisible Disabilities through Embodied Social VR Avatars	Ria J. Gualano et.al.	2408.08193	null
2024-08-16	Beyond Full Label: Single-Point Prompt for Infrared Small Target Label Generation	Shuai Yuan et.al.	2408.08191	link
2024-08-16	FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance	Jiasong Feng et.al.	2408.08189	null
2024-08-15	Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion	Adi Haviv et.al.	2408.08184	null
2024-08-15	EmBARDiment: an Embodied AI Agent for Productivity in XR	Riccardo Bovo et.al.	2408.08158	null
2024-08-15	P/D-Serve: Serving Disaggregated Large Language Model at Scale	Yibo Jin et.al.	2408.08147	null
2024-08-15	MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU	Yan Li et.al.	2408.08144	null
2024-08-15	Decoding Memes: A Comparative Study of Machine Learning Models for Template Identification	Levente Murgás et.al.	2408.08126	link
2024-08-14	Enhanced Detection of Conversational Mental Manipulation Through Advanced Prompting Techniques	Ivory Yang et.al.	2408.07676	null
2024-08-14	See It All: Contextualized Late Aggregation for 3D Dense Captioning	Minjung Kim et.al.	2408.07648	null
2024-08-14	Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach	Shizhou Zhang et.al.	2408.07500	link
2024-08-14	DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency	Xiaojing Zhong et.al.	2408.07481	null
2024-08-14	Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification	Yongcheng Li et.al.	2408.07467	link
2024-08-14	Large Language Models Prompting With Episodic Memory	Dai Do et.al.	2408.07465	null
2024-08-15	BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning	Asif Hanif et.al.	2408.07440	link
2024-08-14	Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator	Federico Nicolas Peccia et.al.	2408.07404	null
2024-08-14	A Quantum-Inspired Analysis of Human Disambiguation Processes	Daphne Wang et.al.	2408.07402	null
2024-08-14	Segment Using Just One Example	Pratik Vora et.al.	2408.07393	null
2024-08-13	Categorical Framework for Typed Extensional and Intensional Models in Formal Semantics	Daniel Quigley et.al.	2408.07058	null
2024-08-13	TableGuard – Securing Structured & Unstructured Data	Anantha Sharma et.al.	2408.07045	null
2024-08-13	Imagen 3	Imagen-Team-Google et.al.	2408.07009	null
2024-08-13	Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models	Chun Jie Chong et.al.	2408.07004	null
2024-08-13	Prompt-Based Segmentation at Multiple Resolutions and Lighting Conditions using Segment Anything Model 2	Osher Rafaeli et.al.	2408.06970	null
2024-08-13	Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas	Louis Kwok et.al.	2408.06929	link
2024-08-13	SceneGPT: A Language Model for 3D Scene Understanding	Shivam Chandhok et.al.	2408.06926	null
2024-08-13	New refinements of Narayana polynomials and Motzkin polynomials	Janet J. W. Dong et.al.	2408.06912	null
2024-08-13	Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives	Zhihu Wang et.al.	2408.06904	null
2024-08-13	Entendre, a Social Bot Detection Tool for Niche, Fringe, and Extreme Social Media	Pranav Venkatesh et.al.	2408.06900	null
2024-08-12	Is it a work or leisure travel? Applying text classification to identify work-related travel on social networks	Lucas Félix et.al.	2408.06341	null
2024-08-12	LOLgorithm: Integrating Semantic,Syntactic and Contextual Elements for Humor Classification	Tanisha Khurana et.al.	2408.06335	null
2024-08-12	Animate, or Inanimate, That is the Question for Large Language Models	Leonardo Ranaldi et.al.	2408.06332	null
2024-08-12	Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let’s Take TravelPlanner as an Example	Yanan Chen et.al.	2408.06318	null
2024-08-12	From SAM to SAM 2: Exploring Improvements in Meta’s Segment Anything Model	Athulya Sundaresan Geetha et.al.	2408.06305	null
2024-08-12	Long-Form Answers to Visual Questions from Blind and Low Vision People	Mina Huh et.al.	2408.06303	null
2024-08-12	Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM	Trisha Das et.al.	2408.06285	null
2024-08-12	Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning	Yingjin Song et.al.	2408.06259	null
2024-08-12	Correlation Weighted Prototype-based Self-Supervised One-Shot Segmentation of Medical Images	Siladittya Manna et.al.	2408.06235	null
2024-08-12	Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting	Halley Young et.al.	2408.06186	null
2024-08-09	Multi-Garment Customized Model Generation	Yichen Liu et.al.	2408.05206	null
2024-08-09	Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners	Michael Vaccaro Jr et.al.	2408.05204	null
2024-08-09	TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning	Yujie Feng et.al.	2408.05200	link
2024-08-09	ECG-FM: An Open Electrocardiogram Foundation Model	Kaden McKeen et.al.	2408.05178	link
2024-08-09	AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset	Pritam Deka et.al.	2408.05149	null
2024-08-09	How Well Do LLMs Identify Cultural Unity in Diversity?	Jialin Li et.al.	2408.05102	link
2024-08-09	Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts	Tingchen Fu et.al.	2408.05094	null
2024-08-09	Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models	Zikai Xie et.al.	2408.05093	link
2024-08-09	Generating novel experimental hypotheses from language models: A case study on cross-dative generalization	Kanishka Misra et.al.	2408.05086	link
2024-08-09	SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation	Da Mu et.al.	2408.05057	null
2024-08-08	SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation	Jieming Yu et.al.	2408.04593	null
2024-08-08	SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals	Haoran Zheng et.al.	2408.04575	null
2024-08-08	Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User’s Casual Sketches	Yongzhi Xu et.al.	2408.04567	null
2024-08-08	Conversational Prompt Engineering	Liat Ein-Dor et.al.	2408.04560	null
2024-08-08	Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models	Yupeng Chang et.al.	2408.04556	link
2024-08-08	Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models	Fabio Pernisi et.al.	2408.04522	null
2024-08-08	Model-Based Transfer Learning for Contextual Reinforcement Learning	Jung-Hoon Cho et.al.	2408.04498	link
2024-08-08	What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant	Jonan Richards et.al.	2408.04477	null
2024-08-09	Achieving Robust Data-driven Contextual Decision Making in a Data Augmentation Way	Zhaoen Li et.al.	2408.04469	null
2024-08-08	Modelling Probabilistic FPC in Guarded Type Theory	Philipp Jan Andries Stassen et.al.	2408.04455	null
2024-08-07	SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature	Vinícius Di Oliveira et.al.	2408.03936	null
2024-08-07	FMiFood: Multi-modal Contrastive Learning for Food Image Classification	Xinyue Pan et.al.	2408.03922	null
2024-08-07	CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases	Xiangyan Liu et.al.	2408.03910	link
2024-08-07	Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models	Shachi H Kumar et.al.	2408.03907	null
2024-08-07	Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond	Beomseok Lee et.al.	2408.03900	link
2024-08-07	BeeManc at the PLABA Track of TAC-2023: Investigating LLMs and Controllable Attributes for Improving Biomedical Text Readability	Zihao Li et.al.	2408.03871	link
2024-08-07	GAIA – A Large Language Model for Advanced Power Dispatch	Yuheng Cheng et.al.	2408.03847	null
2024-08-07	WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models	Prannaya Gupta et.al.	2408.03837	link
2024-08-07	Target Prompting for Information Extraction with Vision Language Model	Dipankar Medhi et.al.	2408.03834	null
2024-08-07	Generative Language Models with Retrieval Augmented Generation for Automated Short Answer Scoring	Zifan Wang et.al.	2408.03811	null
2024-08-06	Training LLMs to Recognize Hedges in Spontaneous Narratives	Amie J. Paige et.al.	2408.03319	link
2024-08-06	Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters	Charlie Snell et.al.	2408.03314	null
2024-08-06	MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation	Xiaofeng Mao et.al.	2408.03312	null
2024-08-06	A search for soft X-ray emission lines in the afterglow spectrum of GRB 221009A	Sergio Campana et.al.	2408.03306	null
2024-08-06	SARA: Singular-Value Based Adaptive Low-Rank Adaption	Jihao Gu et.al.	2408.03290	null
2024-08-06	Biomedical SAM 2: Segment Anything in Biomedical Images and Videos	Zhiling Yan et.al.	2408.03286	link
2024-08-06	Synthesizing Text-to-SQL Data from Weak and Strong LLMs	Jiaxi Yang et.al.	2408.03256	null
2024-08-06	Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons	Yifei Wang et.al.	2408.03247	link
2024-08-06	Making Long-Context Language Models Better Multi-Hop Reasoners	Yanyang Li et.al.	2408.03246	link
2024-08-07	Red Type-1 Quasars after Cosmic Noon and Impact on $L_{\rm UV}$-related Quasar Statistics	Yongjung Kim et.al.	2408.03228	null
2024-08-05	Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?	Mohammad Bahrami Karkevandi et.al.	2408.02651	null
2024-08-05	SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models	Muxi Diao et.al.	2408.02632	null
2024-08-05	Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection	Sajal Aggarwal et.al.	2408.02595	null
2024-08-05	The Role of Functional Muscle Networks in Improving Hand Gesture Perception for Human-Machine Interfaces	Costanza Armanini et.al.	2408.02547	null
2024-08-05	Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph	Zhao Kaichen et.al.	2408.02535	null
2024-08-05	Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation	Aaron Imani et.al.	2408.02502	link
2024-08-05	Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection	Ting Lei et.al.	2408.02484	link
2024-08-05	TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments	Daeun Song et.al.	2408.02454	null
2024-08-05	FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification	Yijin Huang et.al.	2408.02426	link
2024-08-05	Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models	Zi Liang et.al.	2408.02416	link
2024-08-02	Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting	Xiangyu Zhao et.al.	2408.01423	null
2024-08-02	Mission Impossible: A Statistical Perspective on Jailbreaking LLMs	Jingtong Su et.al.	2408.01420	null
2024-08-02	Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs	Yilun Hua et.al.	2408.01417	null
2024-08-02	Conditional LoRA Parameter Generation	Xiaolong Jin et.al.	2408.01415	null
2024-08-02	Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer	Yu Yang et.al.	2408.01402	null
2024-08-02	Transformers are Universal In-context Learners	Takashi Furuya et.al.	2408.01367	null
2024-08-02	MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code	Kaiwen Ning et.al.	2408.01354	link
2024-08-02	Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks	Anders Giovanni Møller et.al.	2408.01346	null
2024-08-02	Synergistic pathways of modulation enable robust task packing within neural dynamics	Giacomo Vedovati et.al.	2408.01316	null
2024-08-02	TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling	Dong Huo et.al.	2408.01291	null
2024-08-01	Segment anything model 2: an application to 2D and 3D medical images	Haoyu Dong et.al.	2408.00756	link
2024-08-01	Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model	Benlin Liu et.al.	2408.00754	null
2024-08-01	Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions	Guangzhi Xiong et.al.	2408.00727	link
2024-08-01	Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM	Xiaofeng Liu et.al.	2408.00706	null
2024-08-01	Can Developers Prompt? A Controlled Experiment for Code Documentation Generation	Hans-Alexander Kruse et.al.	2408.00686	null
2024-08-01	Quantum Order by Disorder: A Key to Understanding the Magnetic Phases of BaCo$_2$(AsO$_4$)$_2$	Sangyun Lee et.al.	2408.00622	null
2024-08-01	Mitigating Multilingual Hallucination in Large Vision-Language Models	Xiaoye Qu et.al.	2408.00550	link
2024-08-01	Illustrating Classic Brazilian Books using a Text-To-Image Diffusion Model	Felipe Mahlow et.al.	2408.00544	null
2024-08-01	Jailbreaking Text-to-Image Models with LLM-Based Agents	Yingkai Dong et.al.	2408.00523	null
2024-08-01	A new approach for encoding code and assisting code understanding	Mengdan Fan et.al.	2408.00521	null
2024-07-31	Vision-Language Model Based Handwriting Verification	Mihir Chauhan et.al.	2407.21788	null
2024-07-31	Tulip Agent – Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries	Felix Ocker et.al.	2407.21778	null
2024-07-31	Ge-based Clinopyroxene series: first principles and experimental local probe study	Ricardo P. Moreira et.al.	2407.21749	null
2024-07-31	A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation	Mothilal Asokan et.al.	2407.21739	null
2024-07-31	Detecting, Explaining, and Mitigating Memorization in Diffusion Models	Yuxin Wen et.al.	2407.21720	link
2024-07-31	Hyper-parameter tuning for text guided image editing	Shiwen Zhang et.al.	2407.21703	link
2024-07-31	Four-loop two-mass tadpoles and the $\rho$ parameter	Samuel Abreu et.al.	2407.21700	null
2024-07-31	Kramers-Kronig relations via Laplace formalism and $L^1$ integrability	Marco Prevedelli et.al.	2407.21694	null
2024-07-31	MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment	Anurag Das et.al.	2407.21654	null
2024-07-31	MSA2Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation	Sina Ghorbani Kolahi et.al.	2407.21640	link
2024-07-30	Add-SD: Rational Generation without Manual Reference	Lingfeng Yang et.al.	2407.21016	link
2024-07-30	CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning	Yuexi Du et.al.	2407.21011	link
2024-07-30	AI-Assisted Generation of Difficult Math Questions	Vedant Shah et.al.	2407.21009	link
2024-07-30	Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection	Jinfa Huang et.al.	2407.21004	link
2024-07-30	From Feature Importance to Natural Language Explanations Using LLMs with RAG	Sule Tekkesinoglu et.al.	2407.20990	link
2024-07-30	UniProcessor: A Text-induced Unified Low-level Image Processor	Huiyu Duan et.al.	2407.20928	link
2024-07-30	SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition	Hao Tan et.al.	2407.20920	null
2024-07-30	Enabling Contextual Soft Moderation on Social Media through Contrastive Textual Deviation	Pujan Paudel et.al.	2407.20910	null
2024-07-30	ThinkRepair: Self-Directed Automated Program Repair	Xin Yin et.al.	2407.20898	link
2024-07-30	Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations	Sarthak Anand et.al.	2407.20856	null
2024-07-29	QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval	Hongming Tan et.al.	2407.20207	null
2024-07-29	Deciphering the Instability of the Black Hole Ringdown Quasinormal Spectrum	A. Ianniccari et.al.	2407.20144	null
2024-07-29	Context-Aware CSI Tracking and Path Loss Prediction Using Machine Learning and Dynamical Systems	Anis Hamadouche et.al.	2407.20123	null
2024-07-29	Generative Diffusion Model Bootstraps Zero-shot Classification of Fetal Ultrasound Images In Underrepresented African Populations	Fangyijie Wang et.al.	2407.20072	link
2024-07-29	Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models	Zhe Li et.al.	2407.20053	null
2024-07-29	Reproducibility Study of “ITI-GEN: Inclusive Text-to-Image Generation”	Daniel Gallo Fernández et.al.	2407.19996	link
2024-07-29	A Study on the Implementation Method of an Agent-Based Advanced RAG System Using Graph	Cheonsu Jeong et.al.	2407.19994	null
2024-07-29	MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion	Chencan Fu et.al.	2407.19976	null
2024-07-29	FedDEO: Description-Enhanced One-Shot Federated Learning with Diffusion Models	Mingzhao Yang et.al.	2407.19953	null
2024-07-29	FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention	Yu Lu et.al.	2407.19918	null
2024-07-26	Small Molecule Optimization with Large Language Models	Philipp Guevorguian et.al.	2407.18897	link
2024-07-26	The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs	Aleix Sant et.al.	2407.18786	null
2024-07-26	TESSILATOR: a one-stop shop for measuring TESS rotation periods	A. S. Binks et.al.	2407.18761	link
2024-07-29	Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery	Yuni Susanti et.al.	2407.18752	link
2024-07-26	Towards Generalized Offensive Language Identification	Alphaeus Dmonte et.al.	2407.18738	null
2024-07-26	Neurosymbolic AI for Enhancing Instructability in Generative AI	Amit Sheth et.al.	2407.18722	null
2024-07-26	Probing exotic long-lived particles from the prompt side using the CONTUR method	Louie Corpe et.al.	2407.18710	null
2024-07-26	Dilated Strip Attention Network for Image Restoration	Fangwei Hao et.al.	2407.18613	null
2024-07-26	Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation	Chaoyi Ai et.al.	2407.18562	null
2024-07-26	A Universal Prompting Strategy for Extracting Process Model Information from Natural Language Text using Large Language Models	Julian Neuberger et.al.	2407.18540	link
2024-07-25	LoRA-Pro: Are Low-Rank Adapters Properly Optimized?	Zhengbo Wang et.al.	2407.18242	link
2024-07-26	Recursive Introspection: Teaching Language Model Agents How to Self-Improve	Yuxiao Qu et.al.	2407.18219	null
2024-07-26	Exploring Scaling Trends in LLM Robustness	Nikolaus Howe et.al.	2407.18213	link
2024-07-26	Enhanced Depth Estimation and 3D Geometry Reconstruction using Bayesian Helmholtz Stereopsis with Belief Propagation	Razieh Azizi et.al.	2407.18195	null
2024-07-25	Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning	Sindhura Kommu et.al.	2407.18181	null
2024-07-25	Efficient Inference of Vision Instruction-Following Models with Elastic Cache	Zuyan Liu et.al.	2407.18121	link
2024-07-25	Keypoint Promptable Re-Identification	Vladimir Somers et.al.	2407.18112	link
2024-07-25	DINOv2 Rocks Geological Image Analysis: Classification, Segmentation, and Interpretability	Florent Brondolo et.al.	2407.18100	link
2024-07-25	C2P: Featuring Large Language Models with Causal Reasoning	Abdolmahdi Bagheri et.al.	2407.18069	null
2024-07-25	I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition	Yannis Vasilakis et.al.	2407.18058	link
2024-07-24	WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries	Wenting Zhao et.al.	2407.17468	null
2024-07-24	Fluent Student-Teacher Redteaming	T. Ben Thompson et.al.	2407.17447	link
2024-07-24	Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?	Michael-Andrei Panaitescu-Liess et.al.	2407.17417	null
2024-07-24	(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork	Tianjin Huang et.al.	2407.17412	null
2024-07-24	PERSONA: A Reproducible Testbed for Pluralistic Alignment	Louis Castricato et.al.	2407.17387	null
2024-07-24	ViPer: Visual Personalization of Generative Models via Individual Preference Learning	Sogand Salehi et.al.	2407.17365	null
2024-07-24	DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation	Qian Feng et.al.	2407.17348	null
2024-07-24	How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations?	Leo Yu-Ho Lo et.al.	2407.17291	null
2024-07-24	A Novel Two-Step Fine-Tuning Pipeline for Cold-Start Active Learning in Text Classification Tasks	Fabiano Belém et.al.	2407.17284	null
2024-07-25	LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model	Wanggong Yang et.al.	2407.17229	null
2024-07-23	Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions	Fabio Tosi et.al.	2407.16698	link
2024-07-23	Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack	Xiaoyue Xu et.al.	2407.16695	link
2024-07-23	Can Large Language Models Automatically Jailbreak GPT-4V?	Yuanwei Wu et.al.	2407.16686	null
2024-07-23	SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation	Pengfei Chen et.al.	2407.16682	null
2024-07-23	RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent	Huiyu Xu et.al.	2407.16667	null
2024-07-23	Lawma: The Power of Specialization for Legal Tasks	Ricardo Dominguez-Olmedo et.al.	2407.16615	null
2024-07-23	Shared Imagination: LLMs Hallucinate Alike	Yilun Zhou et.al.	2407.16604	null
2024-07-23	Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs	Yifan Xia et.al.	2407.16576	null
2024-07-24	Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning	Fang-Duo Tsai et.al.	2407.16564	link
2024-07-23	Patched RTC: evaluating LLMs for diverse software development tasks	Asankhaya Sharma et.al.	2407.16557	link
2024-07-22	AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description	Junyu Xie et.al.	2407.15850	link
2024-07-22	LLMmap: Fingerprinting For Large Language Models	Dario Pasquini et.al.	2407.15847	link
2024-07-22	HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning	Eugene Valassakis et.al.	2407.15844	null
2024-07-22	Artist: Aesthetically Controllable Text-Driven Stylization without Training	Ruixiang Jiang et.al.	2407.15842	link
2024-07-22	Inequalities in Computational Thinking Among Incoming Students in an STEM Chilean University	Felipe González-Pizarro et.al.	2407.15833	null
2024-07-23	Unveiling the Multifaceted GRB 200613A: Prompt Emission Dynamics, Afterglow Evolution, and the Host Galaxy’s Properties	Shao-Yu Fu et.al.	2407.15824	null
2024-07-22	Robust Facial Reactions Generation: An Emotion-Aware Framework with Modality Compensation	Guanyu Hu et.al.	2407.15798	null
2024-07-22	AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection	Yunkang Cao et.al.	2407.15795	link
2024-07-22	CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning	Emanuele Frascaroli et.al.	2407.15793	link
2024-07-22	Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach	Rian Dolphin et.al.	2407.15788	null
2024-07-19	T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation	Kaiyue Sun et.al.	2407.14505	link
2024-07-19	M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models	Seunggeun Chi et.al.	2407.14502	null
2024-07-19	Evaluating the Reliability of Self-Explanations in Large Language Models	Korbinian Randl et.al.	2407.14487	link
2024-07-19	ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities	Peng Xu et.al.	2407.14482	null
2024-07-19	Contrastive Learning with Counterfactual Explanations for Radiology Report Generation	Mingjie Li et.al.	2407.14474	null
2024-07-19	AttentNet: Fully Convolutional 3D Attention for Lung Nodule Detection	Majedaldein Almahasneh et.al.	2407.14464	null
2024-07-19	From Instruction to Insight: Exploring the Functional and Semantic Roles of Text in Interactive Dashboards	Nicole Sultanum et.al.	2407.14451	null
2024-07-19	Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model	Seonghui Min et.al.	2407.14434	null
2024-07-19	Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models	Hyun-Jic Oh et.al.	2407.14426	null
2024-07-19	Improving Retrieval in Sponsored Search by Leveraging Query Context Signals	Akash Kumar Mohankumar et.al.	2407.14346	null
2024-07-18	Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion	Boyang Deng et.al.	2407.13759	null
2024-07-18	LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation	David Schlangen et.al.	2407.13744	null
2024-07-18	HazeCLIP: Towards Language Guided Real-World Image Dehazing	Ruiyi Wang et.al.	2407.13719	link
2024-07-18	CoDefeater: Using LLMs To Find Defeaters in Assurance Cases	Usman Gohar et.al.	2407.13717	link
2024-07-18	Dynamic Pricing in Securities Lending Market: Application in Revenue Optimization for an Agent Lender Portfolio	Jing Xu et.al.	2407.13687	null
2024-07-18	EarthMarker: A Visual Prompt Learning Framework for Region-level and Point-level Remote Sensing Imagery Comprehension	Wei Zhang et.al.	2407.13596	link
2024-07-18	Robust Calibration of Large Vision-Language Adapters	Balamurali Murugesan et.al.	2407.13588	link
2024-07-18	SAM-Driven Weakly Supervised Nodule Segmentation with Uncertainty-Aware Cross Teaching	Xingyue Zhao et.al.	2407.13553	null
2024-07-18	GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding	Changshuo Wang et.al.	2407.13519	link
2024-07-19	Mask2Map: Vectorized HD Map Construction Using Bird’s Eye View Segmentation Masks	Sehwan Choi et.al.	2407.13517	link
2024-07-17	NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model	Zhongqun Zhang et.al.	2407.12727	null
2024-07-17	Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?	Ben Yao et.al.	2407.12725	null
2024-07-17	Promptable Counterfactual Diffusion Model for Unified Brain Tumor Segmentation and Generation with MRIs	Yiqing Shen et.al.	2407.12678	link
2024-07-17	FastSAM-3DSlicer: A 3D-Slicer Extension for 3D Volumetric Segment Anything Model with Uncertainty Quantification	Yiqing Shen et.al.	2407.12658	link
2024-07-17	Zero-shot Text-guided Infinite Image Synthesis with LLM guidance	Soyeong Kwon et.al.	2407.12642	null
2024-07-17	Rethinking the Architecture Design for Efficient Generic Event Boundary Detection	Ziwei Zheng et.al.	2407.12622	link
2024-07-17	Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models	Donggeun Kim et.al.	2407.12616	null
2024-07-17	AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism	William Brannon et.al.	2407.12613	link
2024-07-17	Continuous reasoning for adaptive container image distribution in the cloud-edge continuum	Damiano Azzolini et.al.	2407.12605	link
2024-07-17	VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding	Ofir Abramovich et.al.	2407.12594	link

Usage Instructions

Usage instructions: here