I made a TouHou-style bullet-hell simulation game, and recently I added a few algorithms plus a simple Q-learning AI so it can dodge bullets and play the game automatically.
Click here to see DEMO
WHY NOT PURE REINFORCEMENT LEARNING:
Plain Q-learning obviously won't work here: the raw state space (every bullet and enemy position on screen) is far too large for a table.
DUELING DQN ATTEMPT:
So I first tried a Dueling DQN (it separates the state-value and advantage streams, enabling more precise action evaluation, which seemed ideal for bullet-hell games like Touhou with dense decision points). The inputs were the surrounding bullet positions, the enemy positions, and the agent's position… however, the model was just too big and would have taken far too long to train…
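For reference, here is a minimal sketch of the kind of dueling head I mean, assuming PyTorch; the layer sizes and the flattened input vector are illustrative, not my actual configuration:

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Dueling architecture: a shared trunk, then separate value and advantage heads."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v = self.value(h)
        a = self.advantage(h)
        # Subtract the mean advantage so V and A stay identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

# Example: 9 movement actions (8 directions + stay) and a large flattened state vector
# holding bullet, enemy and agent coordinates -- that input size is what made training so slow.
net = DuelingDQN(state_dim=512, n_actions=9)
q_values = net(torch.zeros(1, 512))
```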
ALGORITHM DESIGN:
In order to minimize the state space, the model only considers a few key pieces of information (a small sketch of this reduced state follows the list):
- the threat from surrounding bullets
- enemy location & health
- agent location (we don't use Touhou's one-hit-death rule; that would be too difficult)
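As a rough illustration, the reduced state amounts to something like this (the field names are mine, not taken from the actual game code):

```python
from dataclasses import dataclass

@dataclass
class ReducedState:
    """Everything the dodging logic actually looks at."""
    area_threats: tuple           # threat score for each of the 9 surrounding areas
    enemy_pos: tuple              # (x, y) of the currently targeted enemy
    enemy_health: float
    agent_pos: tuple              # (x, y) of the player character
```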
Originally, I designed a few different bullet types: some fly straight ahead, while others follow more complex trajectories. But we cannot directly tell the AI how each bullet will move, since that would be unfair, so for threat estimation all bullets are treated as moving in straight lines (in actual gameplay they still follow their own trajectories).
I split the space around the agent into 9 areas (up left, up, up right, right… and its current position). The model then calculates the threat to each area: within a certain range (e.g. 500 px), it considers every bullet that might hit one of those 9 areas, computes its speed and distance, and uses the estimated time-to-hit as the weight. The model then chooses the area with the lowest total threat.
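Here is a rough sketch of that threat scoring, assuming bullets are simple (x, y, vx, vy) tuples; the constants, the heading filter, and the exact weighting are illustrative rather than the game's real code:

```python
import math

DIRECTIONS = [(-1, -1), (0, -1), (1, -1),
              (-1, 0),  (0, 0),  (1, 0),
              (-1, 1),  (0, 1),  (1, 1)]   # 8 neighbours + staying put
STEP = 32          # how far (px) one decision moves the agent -- illustrative
CARE_RADIUS = 500  # bullets beyond this range are ignored

def area_threat(area_center, bullets):
    """Total threat for one candidate area. Bullets are treated as flying in a
    straight line at their current velocity, whatever their real pattern is."""
    threat = 0.0
    for bx, by, vx, vy in bullets:
        dx, dy = area_center[0] - bx, area_center[1] - by
        dist = math.hypot(dx, dy)
        speed = math.hypot(vx, vy)
        if dist > CARE_RADIUS or speed == 0.0:
            continue
        if dist < 1e-6:                 # bullet is already on top of the area
            threat += 1e6
            continue
        # Cosine between the bullet velocity and the direction to the area:
        # only bullets roughly heading toward the area count as a threat.
        heading = (dx * vx + dy * vy) / (dist * speed)
        if heading <= 0.0:
            continue
        time_to_hit = dist / speed
        threat += heading / (time_to_hit + 1e-6)   # sooner hit => heavier weight
    return threat

def best_area(agent_pos, bullets):
    """Score the 9 surrounding areas and return the index of the safest one."""
    scores = [area_threat((agent_pos[0] + dx * STEP, agent_pos[1] + dy * STEP), bullets)
              for dx, dy in DIRECTIONS]
    return min(range(len(scores)), key=scores.__getitem__), scores
```

The key point is that bullets about to hit dominate the score, so the agent reacts to imminent danger first.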
For enemy locations, the agent should focus on the enemy posing the biggest threat and stay on it. So when it is safe, or when several directions look equally good, it moves toward the enemy with the most health and keeps attacking until that enemy dies.
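That target-lock rule is simple enough to sketch directly (assuming enemy objects expose a health attribute):

```python
def pick_target(enemies):
    """Lock onto the enemy with the most health left and keep it until it dies.
    Returns None once everything is dead."""
    alive = [e for e in enemies if e.health > 0]
    return max(alive, key=lambda e: e.health, default=None)
```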
Getting stuck at the edge is dangerous, so the agent needs to drift back toward the middle of the screen, about 70% of the way down toward the bottom.
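One way to express that pull toward the middle is a small positional cost added to each area's score; the 0.7 ratio comes from the text above, while the weight is just a guess:

```python
import math

def center_bias(area_center, screen_w, screen_h, weight=0.01):
    """Small extra cost for areas far from the 'home' point: horizontal centre,
    about 70% of the way down the screen."""
    home = (screen_w / 2, screen_h * 0.7)
    return weight * math.hypot(area_center[0] - home[0], area_center[1] - home[1])
```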
A Q-learning module makes the call when there are multiple viable choices. Each consideration contributes a weighted score, and when the scores of different directions end up close, Q-learning makes the final decision…
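A minimal sketch of that tie-breaking Q-learner, tabular and epsilon-greedy; the state encoding, the tolerance, and the hyperparameters are illustrative:

```python
import random
from collections import defaultdict

class TieBreaker:
    """Tiny tabular Q-learner that only fires when several directions score about the same."""
    def __init__(self, n_actions=9, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state_key, candidate_actions):
        if random.random() < self.epsilon:
            return random.choice(candidate_actions)
        qs = self.q[state_key]
        return max(candidate_actions, key=lambda a: qs[a])

    def update(self, state_key, action, reward, next_state_key):
        best_next = max(self.q[next_state_key])
        td_target = reward + self.gamma * best_next
        self.q[state_key][action] += self.alpha * (td_target - self.q[state_key][action])

# When the lowest threat scores differ by less than some tolerance, hand the
# near-ties to the learner instead of picking arbitrarily.
def decide(scores, tie_breaker, state_key, tol=0.05):
    best = min(scores)
    ties = [i for i, s in enumerate(scores) if s - best < tol]
    return ties[0] if len(ties) == 1 else tie_breaker.choose(state_key, ties)
```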
But that still wasn't enough!
PROBLEMS & SOLUTIONS:
During testing, I found that when there are too many bullets the agent gets stuck in a corner. Since it always chooses the direction with the fewest bullets, and the area beyond the corner has no bullets at all, it just stays pressed against the wall. To deal with this, the corners and edges are given an extra threat term that rises like a power of how close the agent is to them (a distance^x style penalty): the closer, the more dangerous.
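A sketch of that wall/corner penalty; the margin, exponent, and scale are tuning values invented for illustration:

```python
def wall_penalty(pos, screen_w, screen_h, margin=120, exponent=2.0, scale=50.0):
    """Extra threat for standing near a wall. Each nearby wall adds a term that
    grows as a power of how deep the agent is inside the margin, so corners
    (two close walls) are penalised hardest."""
    x, y = pos
    total = 0.0
    for d in (x, screen_w - x, y, screen_h - y):   # distance to each of the four walls
        if d < margin:
            total += scale * ((margin - d) / margin) ** exponent
    return total
```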
Another problem is that the agent sometimes fails to choose the best direction even when a bullet-free spot is only a few pixels away, so it needs to look further around itself. The algorithm therefore adds a wider "care area" that simply counts bullets: the agent first avoids the bullets that are about to hit it, and only then picks the direction with the fewest bullets overall.
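Building on the earlier threat sketch (this reuses DIRECTIONS, STEP and area_threat from above), the wider care area can be expressed as a second pass that only counts bullets; the radius and the safety threshold are illustrative:

```python
import math

def bullet_count(area_center, bullets, radius=150):
    """Secondary criterion: how crowded an area is, ignoring where bullets are heading."""
    return sum(1 for bx, by, *_ in bullets
               if math.hypot(area_center[0] - bx, area_center[1] - by) <= radius)

def choose_direction(agent_pos, bullets, safe_threshold=1.0):
    """First rule out areas with imminent hits; among what is left, prefer the
    least crowded area (fall back to all areas if nothing is safe)."""
    candidates = []
    for dx, dy in DIRECTIONS:
        center = (agent_pos[0] + dx * STEP, agent_pos[1] + dy * STEP)
        candidates.append((area_threat(center, bullets), bullet_count(center, bullets), (dx, dy)))
    safe = [c for c in candidates if c[0] < safe_threshold] or candidates
    return min(safe, key=lambda c: (c[1], c[0]))[2]
```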