schola.scripts.ray.settings.TrainingSettings
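Class definition
class schola.scripts.ray.settings.TrainingSettings(timesteps=3000, learning_rate=0.0003, minibatch_size=128, train_batch_size_per_learner=256, num_sgd_iter=5, gamma=0.99)
Bases: object
Dataclass for the generic training settings used in the RLlib training process. This class defines the parameters for training, including the number of timesteps, the learning rate, the minibatch size, and other hyperparameters that control the training process. These settings apply to any RLlib algorithm and can be customized to the specific requirements of the training job.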
Parameters
timesteps
Type: int
learning_rate
Type: float
minibatch_size
Type: int
train_batch_size_per_learner
Type: int
num_sgd_iter
Type: int
gamma
Type: float
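Attributes
gamma
Type: float
Default: 0.99
The discount factor for the reinforcement learning algorithm. It is used to calculate the present value of future rewards. A value of 0.99 means that a reward is discounted by 1% for each timestep it lies in the future. This balances the importance of immediate and future rewards during training: a value closer to 1.0 places more weight on future rewards, while a value closer to 0 places more weight on immediate rewards.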
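To make the effect of gamma concrete, the following minimal sketch computes the weight a reward receives when it arrives a given number of timesteps in the future, and the discounted sum of a short reward sequence. The helper names `discount_weight` and `discounted_return` are illustrative only and are not part of Schola or RLlib.

```python
from typing import Sequence


def discount_weight(gamma: float, steps_ahead: int) -> float:
    """Weight applied to a reward received `steps_ahead` timesteps in the future."""
    return gamma ** steps_ahead


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of rewards, each scaled by gamma raised to its distance from now."""
    return sum(discount_weight(gamma, k) * r for k, r in enumerate(rewards))


if __name__ == "__main__":
    # How much a unit reward is worth 1, 10 and 100 timesteps ahead.
    for k in (1, 10, 100):
        print(k, round(discount_weight(0.99, k), 4))
    # Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

With the default gamma of 0.99, a reward 100 timesteps away still carries roughly 37% of its face value, which is why values near 1.0 suit tasks with delayed rewards.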
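learning_rate
Type: float
Default: 0.0003
The learning rate for whichever algorithm is selected. It controls how much the model weights are adjusted in response to the estimated error each time they are updated. A smaller value means slower learning, while a larger value means faster learning.
minibatch_size
Type: int
Default: 128
The size of the minibatch for training. This is the number of samples used to update the model weights in each training iteration. A larger batch size can give more stable gradient estimates, but it requires more memory and may slow down training if it is too large.
name
Type: str
num_sgd_iter
Type: int
Default: 5
The number of stochastic gradient descent (SGD) iterations for each batch. This is the number of times the model weights are updated using the samples in the minibatch. More iterations can lead to better convergence but also increase training time.
timesteps
Type: int
Default: 3000
The number of timesteps to train for. This is the total number of timesteps run during training.
train_batch_size_per_learner
Type: int
Default: 256
The number of samples given to each learner during training. Must be divisible by minibatch_size.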
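The divisibility requirement can be checked before a training run is launched. The sketch below is a hypothetical helper (`minibatches_per_pass` is not a Schola or RLlib function) that validates the two batch sizes and reports how many minibatches one pass over a learner's train batch yields. The total update count it prints assumes that each SGD iteration is one full pass over the train batch; that reading is an assumption and is not stated on this page.

```python
def minibatches_per_pass(train_batch_size_per_learner: int, minibatch_size: int) -> int:
    """Return the number of minibatches in one pass over a learner's train batch.

    Raises ValueError if the train batch cannot be split evenly.
    """
    if minibatch_size <= 0:
        raise ValueError("minibatch_size must be positive")
    if train_batch_size_per_learner % minibatch_size != 0:
        raise ValueError(
            f"train_batch_size_per_learner ({train_batch_size_per_learner}) "
            f"must be divisible by minibatch_size ({minibatch_size})"
        )
    return train_batch_size_per_learner // minibatch_size


if __name__ == "__main__":
    per_pass = minibatches_per_pass(256, 128)  # documented defaults
    num_sgd_iter = 5                           # documented default
    # Assumption: one SGD iteration is one full pass over the train batch.
    print(per_pass, per_pass * num_sgd_iter)
```

With the documented defaults, the train batch of 256 splits into two minibatches of 128. A combination such as 256 and 100 would be rejected by the check.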
Methods
__init__
__init__(timesteps=3000, learning_rate=0.0003, minibatch_size=128, train_batch_size_per_learner=256, num_sgd_iter=5, gamma=0.99)
Return type: None
populate_arg_group
classmethod populate_arg_group(args_group)
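This page documents neither the body of populate_arg_group nor the type of args_group. One plausible reading is that args_group is an argparse argument group and that the method registers one command-line option per field. The sketch below illustrates that pattern on a stand-in dataclass. `TrainingSettingsSketch`, `parse_settings`, and the option names are all hypothetical and do not reproduce Schola's implementation.

```python
import argparse
from dataclasses import dataclass, fields
from typing import List


@dataclass
class TrainingSettingsSketch:
    """Stand-in mirroring the documented fields and defaults (not Schola's class)."""
    timesteps: int = 3000
    learning_rate: float = 0.0003
    minibatch_size: int = 128
    train_batch_size_per_learner: int = 256
    num_sgd_iter: int = 5
    gamma: float = 0.99

    @classmethod
    def populate_arg_group(cls, args_group) -> None:
        # Assumption: args_group behaves like an argparse argument group.
        # Register one option per dataclass field, e.g. --learning-rate.
        for f in fields(cls):
            args_group.add_argument(
                "--" + f.name.replace("_", "-"),
                dest=f.name,
                type=f.type,
                default=f.default,
            )


def parse_settings(argv: List[str]) -> TrainingSettingsSketch:
    """Build a parser, let the class register its options, and parse argv."""
    parser = argparse.ArgumentParser()
    group = parser.add_argument_group("Training Arguments")
    TrainingSettingsSketch.populate_arg_group(group)
    namespace = parser.parse_args(argv)
    return TrainingSettingsSketch(**vars(namespace))


if __name__ == "__main__":
    print(parse_settings(["--timesteps", "10000", "--gamma", "0.95"]))
```

Keeping the option registration on the settings class means the defaults are written in one place and are shared by the constructor and the command line.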