
On Monday, a research team at the Ulsan National Institute of Science & Technology (UNIST), led by Professor Yoon Sung Whan of the Artificial Intelligence (AI) Graduate School, announced that it has developed a groundbreaking reinforcement learning technique that maintains consistent performance even in changing environments.
The team has made significant strides in addressing a key challenge in AI development. Reinforcement learning, unlike supervised learning, mimics human learning processes more closely. It allows AI to develop problem-solving strategies, or policies, by maximizing rewards through trial and error.
However, traditional reinforcement learning methods often struggle when faced with unfamiliar environments, leading to sharp declines in performance.
To tackle this issue, the UNIST team proposed an innovative approach that reduces how sensitive cumulative rewards are to small changes in the policy. Their strategy involves flattening the cumulative reward surface in the policy parameter space, preventing drastic swings in reward caused by minor behavioral adjustments.
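In rough terms, this kind of flatness objective can be written as a max-min problem over the policy parameters. The formulation below is an illustrative sketch rather than the paper's exact objective: J(θ) denotes the expected cumulative reward of the policy with parameters θ, and ρ bounds the size of an adversarial perturbation ε.

\max_{\theta} \; \min_{\lVert \epsilon \rVert \le \rho} \; J(\theta + \epsilon)

Optimizing such an objective favors parameters whose reward stays high even at the worst nearby point, which is one standard way to encourage a flat reward surface.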
This new method offers a stark contrast to existing approaches. For instance, in self-driving car scenarios, a slight misjudgment in braking timing on snowy roads could previously lead to significant performance drops. The new technique, however, maintains stable performance even with small policy changes.
In tests simulating real-world conditions with varying friction and weight, the novel learning technique demonstrated impressive resilience. It maintained an average reward retention rate of 80-90%, showcasing high levels of stability and robustness.
In comparison, traditional methods saw their average rewards plummet to less than half under identical conditions, highlighting their limitations.
Lead researcher Lee Hyun-kyu explained that they adapted the Sharpness-Aware Minimization (SAM) technique from supervised learning to achieve this breakthrough.
In supervised learning, a loss function measures how far a model's predictions deviate from the correct answers; SAM steers training toward flat minima of that loss, so that small parameter changes do not cause sudden spikes in error.
The team innovatively applied this concept to reinforcement learning, focusing on stabilizing cumulative rewards.
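As a rough illustration of how a SAM-style two-step update might be attached to a policy-gradient loss, the Python sketch below perturbs the policy parameters toward their worst-case nearby direction before taking the optimization step. The toy policy, the surrogate loss, and the perturbation radius rho are assumptions made for demonstration; this is not the team's actual implementation.

# Illustrative sketch only: a SAM-style two-step update applied to a
# REINFORCE-style surrogate loss. Network sizes, batch data, and rho
# are placeholder assumptions, not values from the paper.
import torch
import torch.nn as nn

rho = 0.05  # assumed radius of the worst-case parameter perturbation

# Toy Gaussian policy over a 1-D action, purely for illustration.
policy = nn.Linear(4, 1)
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)

# Dummy rollout batch: states, the actions taken, and the returns observed.
states = torch.randn(32, 4)
actions = torch.randn(32, 1)
returns = torch.randn(32, 1)

def surrogate_loss():
    # REINFORCE-style surrogate: minimizing it maximizes expected return.
    log_prob = -0.5 * (actions - policy(states)) ** 2
    return -(log_prob * returns).mean()

# --- one SAM-style update step ---
# 1) Gradient at the current parameters.
loss = surrogate_loss()
optimizer.zero_grad()
loss.backward()

# 2) Move to the worst nearby point theta + eps, with eps = rho * g / ||g||.
grads = [p.grad.detach().clone() for p in policy.parameters()]
grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
eps = [rho * g / grad_norm for g in grads]
with torch.no_grad():
    for p, e in zip(policy.parameters(), eps):
        p.add_(e)

# 3) Gradient of the surrogate loss at the perturbed parameters.
perturbed_loss = surrogate_loss()
optimizer.zero_grad()
perturbed_loss.backward()

# 4) Undo the perturbation, then update with the sharpness-aware gradient.
with torch.no_grad():
    for p, e in zip(policy.parameters(), eps):
        p.sub_(e)
optimizer.step()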
Professor Yoon envisions wide-ranging applications for this highly generalizable reinforcement learning model, particularly in fields like robotics and autonomous driving.
The research was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP), the National Research Foundation of Korea, and UNIST.
The technique has been selected for an oral presentation at the International Conference on Learning Representations (ICLR), one of the world's top three AI conferences, placing it in the top 2% (207 papers) of the 11,672 submitted papers. The ICLR 2025 conference is scheduled for April 24-28 in Singapore.