Accepted to ICML 2026

Learning to Label: A Reinforced Self-Evolving Framework for Semi-supervised Referring Expression Segmentation

A reinforced self-evolving framework for reliable pseudo-label construction in semi-supervised referring expression segmentation.

Runlong Cao¹ Ying Zang²* Chuanwei Zhou³,⁴ Tianrun Chen⁵ Tong Zhang¹ Zhen Cui⁶ Chunyan Xu¹*

¹ School of Computer Science and Engineering, Nanjing University of Science and Technology
² School of Information Engineering, Huzhou Normal University
³ School of Artificial Intelligence, Nanjing University of Posts and Telecommunications
⁴ National Key Laboratory of Tibetan Language Intelligence
⁵ Zhejiang University
⁶ Beijing Normal University
* Corresponding authors


Learning Reliable Labels from Sparse Supervision

L2L treats pseudo-label construction as a learnable decision-making process. It uses multimodal semantic-spatial priors, learnable guidance signals, and reinforced pseudo-label selection to progressively improve pixel-level supervision for referring expression segmentation.

Abstract

Semi-supervised referring expression segmentation (SS-RES) aims to achieve precise pixel-level language grounding under limited annotation, yet it suffers from scarce supervision and unreliable pseudo-labels when exploiting unlabeled image-text pairs. In this work, we propose Learning to Label (L2L), a reinforced self-evolving framework that casts pseudo-label construction as a learnable decision-making process. To build foundational understanding, we leverage a multimodal large language model (MLLM) to extract semantic-spatial priors, which are instantiated as initial soft segmentation proposals and elevated, together with textual cues, into learnable guidance signals that condition a hierarchical segmentation network. To ensure stable learning, we formulate reinforced pseudo-label selection as an exploratory decision process that adaptively rewards high-utility pixel-level supervision based on multimodal priors and model predictions. This reinforced self-evolving loop enables joint optimization of the segmentation model and pseudo-labels, progressively enhancing label reliability under sparse supervision. Extensive experiments on RefCOCO, RefCOCO+, and RefCOCOg demonstrate improvements over existing methods, validating the effectiveness and generalization of L2L.

Learnable Labeling

Casts pseudo-label construction as a learnable decision-making process rather than a fixed heuristic.
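
To make the idea of a learnable selection decision concrete, here is a minimal sketch, not the paper's implementation. It assumes a Bernoulli keep/drop policy per pixel, trained with a plain REINFORCE update. The per-pixel features, the scalar reward, and the function names `select_pixels` and `reinforce_step` are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def select_pixels(features, weights, rng):
    """Sample a keep (1) or drop (0) action per pixel from a Bernoulli policy.

    features: one feature list per pixel (hypothetical, e.g. prior
    confidence and model confidence); weights: linear policy parameters.
    """
    probs = [sigmoid(sum(w * f for w, f in zip(weights, feats)))
             for feats in features]
    actions = [1 if rng.random() < p else 0 for p in probs]
    return probs, actions


def reinforce_step(features, weights, probs, actions, reward, lr=0.1):
    """One REINFORCE update of the policy weights.

    For a sigmoid-Bernoulli policy, the gradient of log pi(a | s) with
    respect to the weights is (a - p) * features. The scalar reward is an
    assumption here; it stands in for any measure of pseudo-label utility.
    """
    new_weights = list(weights)
    for feats, p, a in zip(features, probs, actions):
        for k, f in enumerate(feats):
            new_weights[k] += lr * reward * (a - p) * f
    return new_weights


# Usage: two pixels, two hypothetical features each.
rng = random.Random(0)
feats = [[0.9, 0.8], [0.2, 0.1]]
weights = [0.0, 0.0]
probs, actions = select_pixels(feats, weights, rng)
weights = reinforce_step(feats, weights, probs, actions, reward=1.0)
```

A fixed heuristic would instead threshold a confidence score once; the policy above can shift its keep probabilities as rewards arrive.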

Multimodal Priors

Uses a multimodal large language model (MLLM) to extract semantic-spatial priors and convert them into guidance for hierarchical segmentation.
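
As a rough illustration of turning a spatial prior into a soft segmentation proposal, the sketch below assumes the prior arrives as a bounding box and renders it as a Gaussian-shaped soft mask. This page does not specify the form of the prior, so the box format, the Gaussian shape, and the name `soft_proposal_from_box` are hypothetical.

```python
import math


def soft_proposal_from_box(height, width, box):
    """Return a height x width grid of soft mask values in (0, 1].

    box is (x0, y0, x1, y1) in pixel coordinates (an assumed format).
    Values peak at 1.0 at the box centre and decay with distance,
    scaled by the box half-extents.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Half-extents act as the Gaussian scale; guard against empty boxes.
    sx = max((x1 - x0) / 2.0, 1e-6)
    sy = max((y1 - y0) / 2.0, 1e-6)
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            d2 = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
            row.append(math.exp(-d2 / 2.0))
        grid.append(row)
    return grid


# Usage: an 8 x 8 image with a box covering the central region.
proposal = soft_proposal_from_box(8, 8, (2, 2, 6, 6))
```

A soft mask of this kind keeps uncertainty near the box boundary instead of committing to hard labels there.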

Self-Evolving Loop

Jointly optimizes the segmentation model and pseudo-labels to improve label reliability under sparse supervision.

3 Benchmarks

Methodology

Overview

Results

Main Results

Table 1: Main results.

Analysis

RefCOCO

RefCOCO+

RefCOCOg

Resources

Paper

PDF of Learning to Label.

Code

The GitHub repository link will be added when the code is released.

Poster / Slides

Conference poster and slides, if available, will be posted here.

Citation

If you find this work useful, please consider citing it.

BibTeX