Abstract: Safe exploration can be regarded as a constrained Markov decision problem (CMDP) where the expected long-term cost is constrained. Previous off-policy algorithms convert the constrained ...