schola.scripts.ray.settings.APPOSettings

Class definition

class schola.scripts.ray.settings.APPOSettings(gae_lambda=0.95, clip_param=0.2, use_gae=True, vtrace=True, vtrace_clip_rho_threshold=1.0, vtrace_clip_pg_rho_threshold=1.0)

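Bases: IMPALASettings, PPOSettings

Dataclass for settings specific to the APPO (Asynchronous Proximal Policy Optimization) algorithm. This class inherits from IMPALASettings and PPOSettings to combine the settings of both algorithms. This allows a single algorithm to use V-trace for off-policy correction and PPO for policy optimization.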
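The way two settings dataclasses can be merged through multiple inheritance can be sketched as follows. The classes `PPOLikeSettings`, `IMPALALikeSettings`, and `APPOLikeSettings` are hypothetical stand-ins, and the split of fields between the two parents is an assumption inferred from the parameter names, not Schola's actual source.

```python
from dataclasses import dataclass, fields


@dataclass
class PPOLikeSettings:
    """Hypothetical stand-in for the PPO-side settings."""
    gae_lambda: float = 0.95
    clip_param: float = 0.2
    use_gae: bool = True


@dataclass
class IMPALALikeSettings:
    """Hypothetical stand-in for the IMPALA-side settings."""
    vtrace: bool = True
    vtrace_clip_rho_threshold: float = 1.0
    vtrace_clip_pg_rho_threshold: float = 1.0


@dataclass
class APPOLikeSettings(IMPALALikeSettings, PPOLikeSettings):
    """Adds no fields of its own; it only merges the two parents."""


# dataclasses collects fields in reverse MRO order, so the fields of the
# last-listed base (PPOLikeSettings) come first in the generated __init__.
field_names = [f.name for f in fields(APPOLikeSettings)]
settings = APPOLikeSettings(clip_param=0.3)
```

With the bases listed in this order, the generated field order matches the order of the parameters in the class signature above.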

Parameters

gae_lambda

Type: float

clip_param

Type: float

use_gae

Type: bool

vtrace

Type: bool

vtrace_clip_rho_threshold

Type: float

vtrace_clip_pg_rho_threshold

Type: float

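Attributes

clip_param

The clip parameter for the PPO algorithm.

gae_lambda

The lambda parameter for Generalized Advantage Estimation (GAE).

name

Type: str

rllib_config

Type: Type[APPOConfig]

use_gae

Whether to use Generalized Advantage Estimation (GAE) to compute advantages.

vtrace

Whether to use the V-trace algorithm for off-policy correction in the IMPALA algorithm.

vtrace_clip_pg_rho_threshold

The clipping threshold for V-trace rho values in the policy gradient.

vtrace_clip_rho_threshold

The clipping threshold for V-trace rho values.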
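To make the clipping attributes more concrete, here is a minimal sketch of the textbook forms of the two operations they control: truncating an importance ratio at a threshold (as in V-trace) and the PPO clipped surrogate term. The function names are hypothetical, and this is not RLlib's implementation; it only illustrates the arithmetic on scalar values.

```python
def truncated_importance_weight(target_prob, behaviour_prob, threshold=1.0):
    """Importance ratio pi/mu, truncated from above at `threshold`.

    `threshold` plays the role of vtrace_clip_rho_threshold or
    vtrace_clip_pg_rho_threshold in this sketch.
    """
    return min(threshold, target_prob / behaviour_prob)


def ppo_clipped_term(ratio, advantage, clip_param=0.2):
    """PPO surrogate term: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    return min(ratio * advantage, clipped_ratio * advantage)


# A ratio of 2.0 is cut down to the threshold; a ratio of 0.5 is left alone.
high = truncated_importance_weight(0.6, 0.3, threshold=1.0)
low = truncated_importance_weight(0.2, 0.4, threshold=1.0)

# With a positive advantage, a ratio above 1 + clip_param earns no extra credit.
term = ppo_clipped_term(ratio=1.5, advantage=2.0, clip_param=0.2)
```

Raising a threshold above 1.0 lets larger importance ratios through unchanged, while a larger `clip_param` widens the interval within which the policy ratio is not clipped.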

Methods

__init__

__init__(gae_lambda=0.95, clip_param=0.2, use_gae=True, vtrace=True, vtrace_clip_rho_threshold=1.0, vtrace_clip_pg_rho_threshold=1.0)

Return type: None

get_parser

classmethod get_parser()

Add the settings to the parser or subparser.

get_settings_dict

get_settings_dict()

Get the settings as a dictionary keyed by the correct parameter names in Ray.
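As a rough illustration of what these two methods are for, the sketch below uses a hypothetical stand-in dataclass rather than Schola's actual code. The method names `add_to_parser` and `to_ray_kwargs`, the command-line flag names, and the renaming of `gae_lambda` to `lambda_` are all assumptions made for the example; consult the Schola source and the RLlib `APPOConfig` documentation for the real names.

```python
import argparse
from dataclasses import dataclass


@dataclass
class APPOLikeSettings:
    """Hypothetical stand-in mirroring the documented defaults."""
    gae_lambda: float = 0.95
    clip_param: float = 0.2
    use_gae: bool = True
    vtrace: bool = True
    vtrace_clip_rho_threshold: float = 1.0
    vtrace_clip_pg_rho_threshold: float = 1.0

    @classmethod
    def add_to_parser(cls, parser):
        """Register two of the settings as options (flag names are assumed)."""
        group = parser.add_argument_group("appo")
        group.add_argument("--gae-lambda", type=float, default=cls.gae_lambda)
        group.add_argument("--clip-param", type=float, default=cls.clip_param)
        return parser

    def to_ray_kwargs(self):
        """Return the settings keyed by the names assumed for the Ray config."""
        return {
            "lambda_": self.gae_lambda,  # assumed rename for illustration
            "clip_param": self.clip_param,
            "use_gae": self.use_gae,
            "vtrace": self.vtrace,
            "vtrace_clip_rho_threshold": self.vtrace_clip_rho_threshold,
            "vtrace_clip_pg_rho_threshold": self.vtrace_clip_pg_rho_threshold,
        }


parser = APPOLikeSettings.add_to_parser(argparse.ArgumentParser())
args = parser.parse_args(["--clip-param", "0.3"])
settings = APPOLikeSettings(gae_lambda=args.gae_lambda, clip_param=args.clip_param)
ray_kwargs = settings.to_ray_kwargs()
```

Keeping the translation to Ray's names in one method means the rest of the script can use the settings object's own attribute names and only convert at the point where the algorithm config is built.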