SACFormer

Enhancing Soft Actor-Critic with Sequence Modeling.

In reinforcement learning (RL), Decision Transformer (DT) approaches have shown promise across a variety of control tasks. However, they rely predominantly on offline training, which makes applying DT methods directly in an online setting difficult, while online methods generally employ an actor-critic framework. To address the difficulties of training DTs online, we introduce SACFormer, an online RL framework that integrates sequence modeling by incorporating a DT as the actor network within an actor-critic architecture. SACFormer achieves competitive results in various OpenAI Gym environments, outperforming several baseline methods. An ablation study further elucidates the benefits of sequence modeling and investigates the impact of sequence length.
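The sketch below illustrates the core idea of a transformer-based actor inside a soft actor-critic loop: the policy conditions on a short window of recent states instead of a single state, while the SAC-style sampling and tanh squashing stay standard. It is a minimal illustration only; the class and parameter names (`SequenceActor`, `seq_len`, etc.) are assumptions for this example and are not taken from the SACFormer paper or repository.

```python
# Illustrative sketch of a sequence-conditioned Gaussian actor for SAC.
# Names and hyperparameters here are hypothetical, not SACFormer's actual code.
import torch
import torch.nn as nn


class SequenceActor(nn.Module):
    """Gaussian policy conditioned on a short history of states."""

    def __init__(self, state_dim, action_dim, embed_dim=128,
                 n_layers=2, n_heads=4, seq_len=8):
        super().__init__()
        self.seq_len = seq_len
        self.state_embed = nn.Linear(state_dim, embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(seq_len, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.mu_head = nn.Linear(embed_dim, action_dim)
        self.log_std_head = nn.Linear(embed_dim, action_dim)

    def forward(self, state_seq):
        # state_seq: (batch, seq_len, state_dim), most recent states last.
        x = self.state_embed(state_seq) + self.pos_embed[: state_seq.size(1)]
        # Causal mask so each timestep attends only to its past.
        mask = nn.Transformer.generate_square_subsequent_mask(
            state_seq.size(1)
        ).to(state_seq.device)
        h = self.encoder(x, mask=mask)
        h_last = h[:, -1]  # act from the most recent timestep
        mu = self.mu_head(h_last)
        log_std = self.log_std_head(h_last).clamp(-5, 2)
        return mu, log_std

    def sample(self, state_seq):
        mu, log_std = self(state_seq)
        dist = torch.distributions.Normal(mu, log_std.exp())
        pre_tanh = dist.rsample()        # reparameterized sample
        action = torch.tanh(pre_tanh)    # squash to [-1, 1]
        # Tanh log-prob correction, as in standard SAC.
        log_prob = dist.log_prob(pre_tanh) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(-1, keepdim=True)
```

Under this framing, the critic and the SAC objectives are unchanged; only the actor's input shifts from a single state to a short state window, which is where the sequence-modeling benefit enters.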

For a detailed description, see the paper and GitHub repository below.

PAPER GITHUB

Teammate: Yuan Qing