The information on teaching, research and institutional assignments on this page is certified by the University; further information, prepared by the professor, is available on the personal web page and in the curriculum vitae indicated on this page.
Information on professor
Professor: Restelli Marcello
Qualification: Associate professor, full time
Department: Dipartimento di Elettronica, Informazione e Bioingegneria
Scientific-Disciplinary Sector: ING-INF/05 - Information Processing Systems
Curriculum Vitae: Download CV (518.02Kb - 01/09/2022)

Professor's office hours
DEI - Wednesday, from 11:00 to 13:00 - 4015 - It is advisable to make an appointment with the professor by email
Professor's personal website: http://home.deib.polimi.it/restelli/

Data source: RE.PUBLIC@POLIMI - Research Publications at Politecnico di Milano

List of publications and research products for the year 2024
Type Title of the Publication/Product
Journal Articles
A Reinforcement Learning controller optimizing costs and battery State of Health in smart grids

List of publications and research products for the year 2023
Type Title of the Publication/Product
Abstract in conference proceedings
A Brief Guide to Multi-Objective Reinforcement Learning and Planning JAAMAS track
Conference proceedings
A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning
A Tale of Sampling and Estimation in Discounted Reinforcement Learning
Dynamic Pricing with Volume Discounts in Online Settings
Dynamical Linear Bandits
Information-Theoretic Regret Bounds for Bandits with Fixed Expert Advice
On the Relation between Policy Improvement and Off-Policy Minimum-Variance Policy Evaluation
Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization
Simultaneously Updating All Persistence Values in Reinforcement Learning
Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes
Switching Latent Bandits
Tight Performance Guarantees of Imitator Policies with Continuous Actions
Towards Theoretical Understanding of Inverse Reinforcement Learning
Towards an AI-Based Framework for Autonomous Design and Construction: Learning from Reinforcement Learning Success in RTS Games
Truncating Trajectories in Monte Carlo Reinforcement Learning
Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control
Journal Articles
ARLO: A framework for Automated Reinforcement Learning
An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-MDP
Convex Reinforcement Learning in Finite Trials
Dealer markets: A reinforcement learning mean field game approach
IWDA: Importance Weighting for Drift Adaptation in Streaming Supervised Learning Problems
Optimizing Empty Container Repositioning and Fleet Deployment via Configurable Semi-POMDPs
Risk-averse optimization of reward-based coherent risk measures
The EU-funded I3LUNG Project: Integrative Science, Intelligent Data Platform for Individualized LUNG Cancer Care With Immunotherapy

List of publications and research products for the year 2022
Type Title of the Publication/Product
Abstract in conference proceedings
Advancing drought monitoring via feature extraction and multi-task learning algorithms
Contributions on scientific books
AI, Machine Learning e Data Mining
Conference proceedings
Addressing Non-Stationarity in FX Trading with Online Model Selection of Offline RL Experts
Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning
Challenging Common Assumptions in Convex Reinforcement Learning
Dark-Pool Smart Order Routing: a Combinatorial Multi-armed Bandit Approach
Delayed Reinforcement Learning by Imitation
Finite Sample Analysis of Mean-Volatility Actor-Critic for Risk-Averse Reinforcement Learning
Goal-Directed Planning via Hindsight Experience Replay
Learning in Markov Games: can we exploit a general-sum opponent?
Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization
Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts
Multi-Fidelity Best-Arm Identification
Off-Policy Evaluation with Deficient Support Using Side Information
Pricing the Long Tail by Explainable Product Aggregation and Monotonic Bandits
Reward-Free Policy Space Compression for Reinforcement Learning
Stochastic Rising Bandits
Storehouse: a Reinforcement Learning Environment for Optimizing Warehouse Management
The Importance of Non-Markovianity in Maximum State Entropy Exploration
Trust Region Meta Learning for Policy Optimization
Unsupervised Reinforcement Learning in Multiple Environments
Journal Articles
A practical guide to multi-objective reinforcement learning and planning
An online state of health estimation method for lithium-ion batteries based on time partitioning and data-driven model identification
Machine Learning Using Real-World and Translational Data to Improve Treatment Selection for NSCLC Patients Treated with Immunotherapy
Online joint bid/daily budget optimization of Internet advertising campaigns
Real-world data to build explainable trustworthy artificial intelligence models for prediction of immunotherapy efficacy in NSCLC patients
Risk-averse policy optimization via risk-neutral policy optimization
Smoothing policies and safe policy gradients

List of publications and research products for the year 2021
Type Title of the Publication/Product
Abstract in journal
Abstract PO-065: Artificial intelligence to improve selection for NSCLC patients treated with immunotherapy
Abstract in conference proceedings
Advancing drought monitoring via feature extraction
The Human Nasal Cavity: Towards the Optimal Surgery with CFD and Machine Learning
Metodo implementato mediante computer per compilazione quantistica in tempo reale basato su intelligenza artificiale
Conference proceedings
Conservative Online Convex Optimization
Exploiting History Data for Nonstationary Multi-armed Bandit
Inferring Functional Properties from Fluid Dynamics Features
Learning FX trading strategies with FQI and persistent actions
Learning a Belief Representation for Delayed Reinforcement Learning
Learning in Non-Cooperative Configurable Markov Decision Processes
Leveraging Good Representations in Linear Contextual Bandits
Meta-Reinforcement Learning by Tracking Task Non-stationarity
Monte carlo tree search for trading and hedging
Newton Optimization on Helmholtz Decomposition for Continuous Games
Policy Optimization as Online Learning with Mediator Feedback
Provably Efficient Learning of Transferable Rewards
Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning
Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate
Time-Variant Variational Transfer for Value Functions
Journal Articles
A voltage dynamic-based state of charge estimation method for batteries storage systems
Data-driven indicators for the detection and prediction of stuck-pipe events in oil&gas drilling operations
Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
Gaussian approximation for bias reduction in Q-learning
MushroomRL: Simplifying Reinforcement Learning Research
Policy space identification in configurable environments
Quantum compiling by deep reinforcement learning
Safe policy iteration: A monotonically improving approximate policy iteration approach

List of publications and research products for the year 2020
Type Title of the Publication/Product
Conference proceedings
A Data-Based Approach for the Prediction of Stuck-Pipe Events in Oil Drilling Operations
A Novel Confidence-Based Algorithm for Structured Bandits
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits
An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies
Balancing Learning Speed and Stability in Policy Gradient via Adaptive Exploration
Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning
Dealing with Transaction Costs in Portfolio Optimization: Online Gradient Descent with Momentum
Driving exploration by maximum distribution in gaussian process bandits
Fast direct calibration of interest rate derivatives pricing models
Foreign exchange trading: A risk-averse batch reinforcement learning approach
Gradient-Aware Model-Based Policy Search
Inverse Reinforcement Learning from a Gradient-based Learner
Model-Free Non-Stationarity Detection and Adaptation in Reinforcement Learning
Option Hedging with Risk Averse Reinforcement Learning
Risk-Averse Trust Region Optimization for Reward-Volatility Reduction
Sequential Transfer in Reinforcement Learning with a Generative Model
Sharing Knowledge in Multi-Task Deep Reinforcement Learning
Truly Batch Model-Free Inverse Reinforcement Learning about Multiple Intentions
Journal Articles
Combining reinforcement learning with rule-based controllers for transparent and general decision-making in autonomous driving
Importance Sampling Techniques for Policy Optimization
On the use of the policy gradient and Hessian in inverse reinforcement learning
Sliding-Window Thompson Sampling for Non-Stationary Settings