【Reinforcement Learning Paper Collection】NeurIPS 2021 Reinforcement Learning Papers
创始人
2024-02-21 06:30:11

Reinforcement Learning (RL), also referred to as evaluative or reward-based learning, is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent learning a policy through interaction with its environment in order to maximize cumulative return or achieve a specific goal.
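To make this definition concrete, here is a minimal, self-contained sketch of the agent-environment loop it describes: the agent repeatedly observes a state, takes an action, receives a reward, and updates its behavior so as to maximize cumulative return. The sketch uses tabular Q-learning on a hypothetical 5-state chain environment; the environment, reward structure, and hyperparameters are illustrative assumptions and are not drawn from any of the papers listed below.

```python
# A minimal sketch of the agent-environment interaction loop:
# tabular Q-learning on a hypothetical 5-state chain MDP.
# Environment, reward, and hyperparameters are illustrative assumptions only.
import random

N_STATES = 5      # states 0..4; state 4 is the rewarded terminal state
ACTIONS = [0, 1]  # 0 = move left, 1 = move right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

def step(state, action):
    """Toy deterministic dynamics: only entering the terminal state yields reward."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(state):
    """Pick an action with maximal Q-value, breaking ties at random."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy: explore with probability EPS, otherwise exploit current Q
        a = random.choice(ACTIONS) if random.random() < EPS else greedy(s)
        s_next, r, done = step(s, a)
        # Q-learning update toward the bootstrapped target r + gamma * max_a' Q(s', a')
        target = r + (0.0 if done else GAMMA * max(Q[(s_next, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s_next

# After training, the greedy policy should move right (action 1) in every non-terminal state.
print([greedy(s) for s in range(N_STATES - 1)])
```

Running this script prints the learned greedy action for each non-terminal state, which after training is "move right" toward the rewarded goal state.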
This column collects recent papers in the field of reinforcement learning (RL) from top international conferences, including but not limited to ICML, AAAI, IJCAI, NeurIPS, ICLR, AAMAS, CVPR, and ICRA.


Today we are sharing the papers on the topic of reinforcement learning from the 2021 Conference on Neural Information Processing Systems (NeurIPS 2021).

NeurIPS (formerly abbreviated NIPS), the Conference and Workshop on Neural Information Processing Systems, is an international conference on machine learning and computational neuroscience. It is held every December and is organized by the NeurIPS Foundation. NeurIPS is a top-tier conference in machine learning; in the China Computer Federation (CCF) ranking of international academic conferences, it is rated as a Class A conference in artificial intelligence.

  • [1]. Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning.
  • [2]. Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization.
  • [3]. Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee.
  • [4]. Risk-Averse Bayes-Adaptive Reinforcement Learning.
  • [5]. Offline Reinforcement Learning as One Big Sequence Modeling Problem.
  • [6]. Distributional Reinforcement Learning for Multi-Dimensional Reward Functions.
  • [7]. A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning.
  • [8]. Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation.
  • [9]. There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning.
  • [10]. Reinforcement Learning in Reward-Mixing MDPs.
  • [11]. Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning.
  • [12]. On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning.
  • [13]. Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks.
  • [14]. On the Theory of Reinforcement Learning with Once-per-Episode Feedback.
  • [15]. On Effective Scheduling of Model-based Reinforcement Learning.
  • [16]. Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization.
  • [17]. Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration.
  • [18]. Information Directed Reward Learning for Reinforcement Learning.
  • [19]. Celebrating Diversity in Shared Multi-Agent Reinforcement Learning.
  • [20]. Towards Instance-Optimal Offline Reinforcement Learning with Pessimism.
  • [21]. Environment Generation for Zero-Shot Compositional Reinforcement Learning.
  • [22]. Offline Meta Reinforcement Learning - Identifiability Challenges and Effective Data Collection Strategies.
  • [23]. PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning.
  • [24]. Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation.
  • [25]. Automatic Data Augmentation for Generalization in Reinforcement Learning.
  • [26]. RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem.
  • [27]. Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning.
  • [28]. Bellman-consistent Pessimism for Offline Reinforcement Learning.
  • [29]. Teachable Reinforcement Learning via Advice Distillation.
  • [30]. Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees.
  • [31]. Online Robust Reinforcement Learning with Model Uncertainty.
  • [32]. Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble.
  • [33]. A Provably Efficient Sample Collection Strategy for Reinforcement Learning.
  • [34]. Near-Optimal Offline Reinforcement Learning via Double Variance Reduction.
  • [35]. Multi-Agent Reinforcement Learning in Stochastic Networked Systems.
  • [36]. When Is Generalizable Reinforcement Learning Tractable?
  • [37]. Learning Markov State Abstractions for Deep Reinforcement Learning.
  • [38]. Towards Deeper Deep Reinforcement Learning with Spectral Normalization.
  • [39]. Adversarial Intrinsic Motivation for Reinforcement Learning.
  • [40]. Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning.
  • [41]. TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning.
  • [42]. Model-Based Reinforcement Learning via Imagination with Derived Memory.
  • [43]. Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning.
  • [44]. Compositional Reinforcement Learning from Logical Specifications.
  • [45]. Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning.
  • [46]. Local Differential Privacy for Regret Minimization in Reinforcement Learning.
  • [47]. Continuous Doubly Constrained Batch Reinforcement Learning.
  • [48]. Conservative Data Sharing for Multi-Task Offline Reinforcement Learning.
  • [49]. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism.
  • [50]. A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning.
  • [51]. Optimization-Based Algebraic Multigrid Coarsening Using Reinforcement Learning.
  • [52]. EDGE: Explaining Deep Reinforcement Learning Policies.
  • [53]. Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning.
  • [54]. Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning.
  • [55]. Pretraining Representations for Data-Efficient Reinforcement Learning.
  • [56]. Tactical Optimism and Pessimism for Deep Reinforcement Learning.
  • [57]. Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning.
  • [58]. Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings.
  • [59]. Outcome-Driven Reinforcement Learning via Variational Inference.
  • [60]. Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning.
  • [61]. Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints.
  • [62]. Heuristic-Guided Reinforcement Learning.
  • [63]. Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning.
  • [64]. Safe Reinforcement Learning with Natural Language Constraints.
  • [65]. Safe Reinforcement Learning by Imagining the Near Future.
  • [66]. Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation.
  • [67]. MAP Propagation Algorithm: Faster Learning with a Team of Reinforcement Learning Agents.
  • [68]. PettingZoo: Gym for Multi-Agent Reinforcement Learning.
  • [69]. Decision Transformer: Reinforcement Learning via Sequence Modeling.
  • [70]. Nearly Horizon-Free Offline Reinforcement Learning.
  • [71]. Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes.
  • [72]. Contrastive Reinforcement Learning of Symbolic Reasoning Domains.
  • [73]. Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection.
  • [74]. Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting.
  • [75]. Scalable Online Planning via Reinforcement Learning Fine-Tuning.
  • [76]. An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning.
  • [77]. Risk-Aware Transfer in Reinforcement Learning using Successor Features.
  • [78]. Regret Minimization Experience Replay in Off-Policy Reinforcement Learning.
  • [79]. Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning.
  • [80]. A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning.
  • [81]. Autonomous Reinforcement Learning via Subgoal Curricula.
  • [82]. PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators.
  • [83]. Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning.
  • [84]. Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations.
  • [85]. Functional Regularization for Reinforcement Learning via Learned Fourier Features.
  • [86]. Agent Modelling under Partial Observability for Deep Reinforcement Learning.
  • [87]. Conservative Offline Distributional Reinforcement Learning.
  • [88]. Learning Tree Interpretation from Object Representation for Deep Reinforcement Learning.
  • [89]. Explicable Reward Design for Reinforcement Learning Agents.
  • [90]. A Minimalist Approach to Offline Reinforcement Learning.
  • [91]. BCORLE(λ): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market.
  • [92]. Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning.
  • [93]. Reinforcement Learning based Disease Progression Model for Alzheimer’s Disease.
  • [94]. Accelerating Quadratic Optimization with Reinforcement Learning.
  • [95]. Provably Efficient Causal Reinforcement Learning with Confounded Observational Data.
  • [96]. Hierarchical Reinforcement Learning with Timed Subgoals.
  • [97]. Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives.
  • [98]. Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation.
  • [99]. Reinforcement Learning in Newcomblike Environments.
  • [100]. Reinforcement Learning with Latent Flow.
  • [101]. Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs.
  • [102]. Reinforcement Learning Enhanced Explainer for Graph Neural Networks.
  • [103]. The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning.
  • [104]. Causal Influence Detection for Improving Efficiency in Reinforcement Learning.
  • [105]. Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model.
  • [106]. RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents.
  • [107]. The Difficulty of Passive Learning in Deep Reinforcement Learning.
  • [108]. A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems.
  • [109]. Symbolic Regression via Deep Reinforcement Learning Enhanced Genetic Programming Seeding.
  • [110]. Machine versus Human Attention in Deep Reinforcement Learning Tasks.
  • [111]. Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration.
  • [112]. Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations.
  • [113]. A Max-Min Entropy Framework for Reinforcement Learning.
  • [114]. Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch.
  • [115]. Robust Deep Reinforcement Learning through Adversarial Loss.
  • [116]. Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature.
  • [117]. Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings.
  • [118]. Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning.
  • [119]. Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning.
  • [120]. Online and Offline Reinforcement Learning by Planning with a Learned Model.
  • [121]. Variational Bayesian Reinforcement Learning with Regret Bounds.
  • [122]. Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning.
  • [123]. Parametrized Quantum Policies for Reinforcement Learning.
  • [124]. On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations.
  • [125]. Continual World: A Robotic Benchmark For Continual Reinforcement Learning.
  • [126]. Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning.
  • [127]. Deep Reinforcement Learning at the Edge of the Statistical Precipice.
  • [128]. Offline Reinforcement Learning with Reverse Model-based Imagination.
  • [129]. Program Synthesis Guided Reinforcement Learning for Partially Observed Environments.
  • [130]. Structural Credit Assignment in Neural Networks using Reinforcement Learning.
