All Stories

  1. Minimizing the pretraining gap: Domain-aligned text-based person retrieval
  2. Event-based Lip Reading with Triplane Fusion Network
  3. Progressive Text-to-3D Generation for Automatic 3D Prototyping
  4. CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution
  5. Introduction to the Special Issue on Deep Multimodal Generation and Retrieval
  6. Domain-Agnostic Neural Oil Painting via Normalization Affine Test-Time Adaptation
  7. UniAD: Integrating Geometric and Semantic Cues for Unified Anomaly Detection
  8. Proceedings of the 3rd International Workshop on UAVs in Multimedia: Capturing the World from a New Perspective
  9. The 3rd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective
  10. MORE'25 Multimedia Object Re-ID: Advancements, Challenges, and Opportunities
  11. From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search
  12. CAMeL: Cross-Modality Adaptive Meta-Learning for Text-Based Person Retrieval
  13. EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis
  14. Scale Up Composed Image Retrieval Learning via Modification Text Generation
  15. Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
  16. Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
  17. The 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective
  18. The 2nd International Workshop on Deep Multi-modal Generation and Retrieval
  19. Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation
  20. Depth-Aware Blind Image Decomposition for Real-World Adverse Weather Recovery
  21. Self-ensembling depth completion via density-aware consistency
  22. Collaborative group: Composed image retrieval via consensus learning from noisy annotations
  23. Multiple-environment Self-adaptive Network for aerial-view geo-localization
  24. MORE'24 Multimedia Object Re-ID: Advancements, Challenges, and Opportunities
  25. High Fidelity Makeup via 2D and 3D Identity Preservation Net
  26. StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition
  27. Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation
  28. Active Discovering New Slots for Task-Oriented Conversation
  29. Learning Cross-View Geo-Localization Embeddings via Dynamic Weighted Decorrelation Regularization
  30. UAVM '23: 2023 Workshop on UAVs in Multimedia: Capturing the World from a New Perspective
  31. Deep Multimodal Learning for Information Retrieval
  32. PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation
  33. Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark
  34. Learnable Pillar-based Re-ranking for Image-Text Retrieval
  35. Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning
  36. Context-Aware Pretraining for Efficient Blind Image Decomposition
  37. Multi-view Consistent Generative Adversarial Networks for Compositional 3D-Aware Image Synthesis
  38. Align and Tell: Boosting Text-Video Retrieval With Local Alignment and Fine-Grained Supervision
  39. Progressive Local Filter Pruning for Image Retrieval Acceleration
  40. U-Turn: Crafting Adversarial Queries with Opposite-Direction Features
  41. Soft Person Reidentification Network Pruning via Blockwise Adjacent Filter Decaying
  42. Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis
  43. Each Part Matters: Local Patterns Facilitate Cross-View Geo-Localization
  44. Adaptive Boosting for Domain Adaptation: Toward Robust Predictions in Scene Segmentation
  45. DMRNet++: Learning Discriminative Features with Decoupled Networks and Enriched Pairs for One-Step Person Search
  46. Joint Representation Learning and Keypoint Detection for Cross-View Geo-Localization
  47. Parameter-Efficient Person Re-Identification in the 3D Space
  48. SPG-VTON: Semantic Prediction Guidance for Multi-Pose Virtual Try-on
  49. Self-supervised Point Cloud Representation Learning via Separating Mixed Shapes
  50. Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search
  51. Rectifying Pseudo Label Learning via Uncertainty Estimation for Domain Adaptive Semantic Segmentation
  52. VehicleNet: Learning Robust Visual Representation for Vehicle Re-Identification
  53. University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization
  54. Real-World Automatic Makeup via Identity Preservation Makeup Net
  55. Unsupervised Scene Adaptation with Memory Regularization in vivo
  56. Dual-path Convolutional Image-Text Embeddings with Instance Loss
  57. Going Beyond Real Data: A Robust Visual Representation for Vehicle Re-identification
  58. Thorax disease classification with attention guided convolutional neural network
  59. Bayesian query expansion for multi-camera person re-identification
  60. Unsupervised Eyeglasses Removal in the Wild
  61. Improving person re-identification by attribute and identity learning
  62. Pedestrian Alignment Network for Large-scale Person Re-Identification
  63. Joint Discriminative and Generative Learning for Person Re-Identification
  64. CamStyle: A Novel Data Augmentation Method for Person Re-Identification
  65. Multi-Pseudo Regularized Label for Generated Data in Person Re-Identification
  66. Camera Style Adaptation for Person Re-identification
  67. An improved artificial intelligence scheme for person reidentification
  68. Macro-Micro Adversarial Network for Human Parsing
  69. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro