this post was submitted on 02 Jul 2025

This paper introduces DiffuCoder, a 7B-scale open-source masked diffusion large language model (dLLM) specifically designed for code generation.

The research provides insights into how dLLMs generate content, distinguishing their decoding behavior from that of autoregressive (AR) models. Unlike AR models, dLLMs can adjust how causal their generation is, and raising the sampling temperature diversifies not only token choices but also the order in which tokens are generated, creating a rich search space for reinforcement learning (RL).

This flexibility lets dLLMs decode less autoregressively, generating tokens in a less sequential order that is closer to a "human-like" way of writing code.
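To make "less sequential" concrete, here is a minimal sketch of one way to score a decoding order. The function name `sequential_fraction` and the metric itself are illustrative assumptions, not the paper's definition: it simply counts how often a step unmasks the position immediately after the one unmasked in the previous step.

```python
def sequential_fraction(unmask_order):
    """Fraction of decoding steps that unmask the position right after the
    previously unmasked one. Hypothetical metric, for illustration only."""
    if len(unmask_order) < 2:
        return 1.0
    hits = sum(
        1
        for prev, cur in zip(unmask_order, unmask_order[1:])
        if cur == prev + 1
    )
    return hits / (len(unmask_order) - 1)


# A strictly left-to-right order, as an AR model would produce.
left_to_right = [0, 1, 2, 3, 4, 5]
# A made-up order in which the model jumps around the sequence.
out_of_order = [0, 5, 1, 2, 4, 3]

print(sequential_fraction(left_to_right))  # 1.0
print(sequential_fraction(out_of_order))   # 0.2
```

A score of 1.0 corresponds to purely left-to-right decoding; lower scores mean the order departs from it more often.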

To leverage this diversity and improve performance, the paper proposes coupled-GRPO, an RL algorithm. It uses a coupled-sampling scheme that constructs complementary mask noise during training, which reduces the variance of token log-likelihood estimates while keeping training efficient.
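The complementary-mask idea can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the names `coupled_masks` and `coupled_log_likelihood` are hypothetical, the per-token log-probabilities are made-up numbers standing in for two model forward passes, and any timestep weighting the real estimator may use is omitted.

```python
import random


def coupled_masks(seq_len, mask_ratio, rng):
    """Sample one random mask and its complement, so every position
    is masked in exactly one of the two."""
    mask_a = [rng.random() < mask_ratio for _ in range(seq_len)]
    mask_b = [not m for m in mask_a]
    return mask_a, mask_b


def coupled_log_likelihood(logps_a, logps_b, mask_a):
    """Sum per-token log-probs, taking each token from the pass in which
    it was masked (pass A if mask_a is True, otherwise pass B)."""
    return sum(lp_a if m else lp_b
               for lp_a, lp_b, m in zip(logps_a, logps_b, mask_a))


rng = random.Random(0)
mask_a, mask_b = coupled_masks(seq_len=8, mask_ratio=0.5, rng=rng)

# Each position is covered exactly once across the pair.
assert all(a != b for a, b in zip(mask_a, mask_b))

# Made-up log-probs standing in for two masked forward passes.
logps_a = [-1.0, -2.0, -3.0, -4.0]
logps_b = [-10.0, -20.0, -30.0, -40.0]
print(coupled_log_likelihood(logps_a, logps_b, [True, False, True, False]))
```

Because the two masks are complementary, every token contributes to the estimate exactly once, rather than only the random subset a single mask would cover.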

Experimentally, coupled-GRPO significantly boosts DiffuCoder's performance on code generation benchmarks, notably improving EvalPlus scores by 4.4% with training on only 21K samples. The research also shows that coupled-GRPO-trained models suffer a smaller performance drop when the number of decoding steps is halved (a 2x speedup), indicating increased parallelism and reduced reliance on AR bias during decoding.
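The arithmetic behind the 2x figure can be sketched as below. It assumes, for illustration only, that decoding cost is dominated by one forward pass per step and that tokens are unmasked on a uniform schedule; `unmask_schedule` and the 256-token length are hypothetical.

```python
def unmask_schedule(num_tokens, num_steps):
    """Spread num_tokens as evenly as possible over num_steps decoding steps.
    Returns how many tokens are unmasked at each step."""
    base, extra = divmod(num_tokens, num_steps)
    return [base + (1 if i < extra else 0) for i in range(num_steps)]


full = unmask_schedule(num_tokens=256, num_steps=256)    # one token per step
halved = unmask_schedule(num_tokens=256, num_steps=128)  # two tokens per step

print(len(full), max(full))      # 256 forward passes, 1 token each
print(len(halved), max(halved))  # 128 forward passes, 2 tokens each
```

Halving the steps halves the number of forward passes, but each step must then commit twice as many tokens in parallel, which is why robustness to fewer steps is read as a sign of increased parallelism.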

The model is available at https://huggingface.co/apple/DiffuCoder-7B-cpGRPO
