Tjong: A transformer-based Mahjong AI via hierarchical decision-making and fan backward

Abstract: Mahjong, a complex game with hidden information and sparse rewards, poses significant challenges. Existing Mahjong AIs require substantial hardware resources and extensive datasets to enhance AI capabilities. The authors propose a transformer-based Mahjong AI (Tjong) via hierarchical decision-making. By utilising self-attention mechanisms, Tjong effectively captures tile patterns and game dynamics, and it decouples the decision process into two distinct stages: action decision and tile decision. This design reduces decision complexity considerably. Additionally, a fan backward technique is proposed to address the sparse rewards by allocating reversed rewards for actions based on winning hands. Tjong consists of 15M parameters and is trained using approximately 0.5M data over 7 days of supervised learning on a single server with 2 GPUs. The action decision achieved an accuracy of 94.63%, while the claim decision attained 98.55% and the discard decision reached 81.51%. In a tournament format, Tjong outperformed AIs (CNN, MLP, RNN, ResNet, ViT), achieving scores up to 230% higher than its opponents. Furthermore, after 3 days of reinforcement learning training, it ranked within the top 1% on the leaderboard on the Botzone platform.
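The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All names here (`hierarchical_decide`, `fan_backward`, the action labels, the tile codes) are hypothetical, the scores stand in for model outputs, and the geometric decay used to spread the reward backward is an assumption, since the abstract does not specify the allocation rule.

```python
from typing import Dict, List, Optional, Tuple


def argmax_legal(scores: Dict[str, float], legal: List[str]) -> str:
    """Return the highest-scoring option among the legal ones."""
    if not legal:
        raise ValueError("no legal options")
    # Options missing from `scores` are treated as the worst possible choice.
    return max(legal, key=lambda k: scores.get(k, float("-inf")))


def hierarchical_decide(
    action_scores: Dict[str, float],
    tile_scores: Dict[str, float],
    legal_actions: List[str],
    legal_tiles: Dict[str, List[str]],
) -> Tuple[str, Optional[str]]:
    """Stage 1 picks an action type; stage 2 picks a tile only if needed.

    `legal_tiles` maps an action (e.g. "discard") to its candidate tiles.
    Actions without an entry (e.g. "pass") need no tile decision.
    """
    action = argmax_legal(action_scores, legal_actions)
    candidates = legal_tiles.get(action, [])
    if candidates:
        return action, argmax_legal(tile_scores, candidates)
    return action, None


def fan_backward(num_actions: int, fan: float, decay: float = 0.9) -> List[float]:
    """Spread the fan score of a winning hand backward over its actions.

    Assumption: the last action receives the full fan score and each
    earlier action receives a geometrically decayed share.
    """
    return [fan * decay ** (num_actions - 1 - t) for t in range(num_actions)]


if __name__ == "__main__":
    choice = hierarchical_decide(
        action_scores={"pass": 0.1, "discard": 0.7, "pung": 0.2},
        tile_scores={"1m": 0.2, "9p": 0.8},
        legal_actions=["pass", "discard"],
        legal_tiles={"discard": ["1m", "9p"]},
    )
    print(choice)
    print(fan_backward(num_actions=3, fan=8.0, decay=0.5))
```

Splitting the choice into an action stage and a tile stage means each stage scores a small option set instead of every action-tile combination at once, which is one way to read the abstract's claim of reduced decision complexity.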

Authors: Xiali Li, Bo Liu, Zhi Wei, Zhaoqi Wang, Licheng Wu

Affiliations: School of Information and Engineering; Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE; Department of Computer Science

Source: CAAI Transactions on Intelligence Technology (智能技术学报（英文）), indexed in SCIE and EI, 2024, Issue 4, pp. 982-995 (14 pages)

Funding: National Natural Science Foundation of China, Grant/Award Numbers: 62276285, 62236011; Major Project of National Social Sciences Foundation of China, Grant/Award Number: 20&ZD279.

Keywords: decision making; deep learning; deep neural networks

Classification code: TM41 [Electrical Engineering: Electrical Apparatus]

