About Me

I am Director of Engineering at Uber, where I lead the 50+ person Marketplace Simulation & Planning (MSP) organization, which is responsible for planning pricing and incentives for riders and drivers in Uber's Mobility business. My organization interfaces directly with Uber's Operations and Finance business functions and innovates across a range of engineering and science disciplines to determine prices and incentives globally across Uber's Mobility Marketplace: Deep Learning for forecasting, Causal Inference & Causal Machine Learning for estimating the treatment effects of pricing and incentives, Operations Research for large-scale optimization across the cities Uber operates in, Reinforcement Learning for long-term sequential decision making, as well as Backend, ML Infrastructure & Data Engineering at scale.

Previously, I was Head of the Home Personalization Engineering organization at Spotify, which is responsible for generating, ranking and distributing personalized content recommendations across music, podcasts and audiobooks on Spotify's homepage for 650+ million listeners. I also served as the Strategy Lead of Spotify's company bet on content discovery. Prior to Spotify, I was at Netflix, first driving causal ML initiatives and subsequently building and leading the Adaptive Experimentation working group, a cross-functional team of researchers, engineers and data scientists focused on delivering new experimentation capabilities at Netflix. Before Netflix, I completed a PhD on reinforcement learning and causal inference at Stanford, where I was advised by Benjamin Van Roy and Susan Athey, and I worked at Google Research on the design and deployment of large-scale optimization algorithms for Google Technical Infrastructure and Google Ad Exchange.

Select honors include Forbes 30 under 30 Greece for industry and academic impact, Stanford's Outstanding Academic Achievement Award, Intel's Innovation Award and Google's Anita Borg Memorial Award.

I enjoy tennis, swimming, traveling around the world and exploring impressionist & surrealist art.

Research

Concurrent Learning with Aggregated States via Randomized Least Squares Value Iteration

Chen, Bai, Zhang, Dong, Dimakopoulou, Sun, Zhou (ICML 2025)

Designing learning agents that explore efficiently in a complex environment has been widely recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions for a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents concurrently explore an environment. The theoretical results established in this work tender an affirmative answer to this question. We adapt the concurrent learning framework to randomized least-squares value iteration (RLSVI) with aggregated state representation. We demonstrate polynomial worst-case regret bounds in both finite- and infinite-horizon environments. Our algorithm exhibits significantly lower space complexity compared to prior contributions in this domain, both in theory and in numerical experiments.
[Paper]

Calibrated Recommendations with Contextual Bandits

Feijer, Abdollahpouri, Gupta, Clare, Wen, Wasson, Dimakopoulou, Nazari, Kretschman, Lalmas (RecSys CONSEQUENCES 2025)

Spotify’s Home page features a variety of content types, including music, podcasts, and audiobooks. However, historical data is heavily skewed toward music, making it challenging to deliver a balanced and personalized content mix. Moreover, users’ preferences toward different content types may vary depending on the time of day, the day of the week, or even the device they use. We propose a calibration method that leverages contextual bandits to dynamically learn each user’s optimal content type distribution based on their context and preferences. Unlike traditional calibration methods that rely on historical averages, our approach boosts engagement by adapting to how users’ interests in different content types vary across contexts. Both offline and online results demonstrate improved precision and user engagement with the Spotify Home page, in particular with under-represented content types such as podcasts.
[Paper]

Evaluating the Surrogate Index as a Decision-Making Tool

Zhang, Zhao, Le, Dimakopoulou, Kallus (CODE 2023)

Surrogate index approaches have recently become a popular method of estimating longer-term impact from shorter-term outcomes. In this paper, we leverage 1098 test arms from 200 A/B tests at Netflix to empirically investigate to what degree decisions made using a surrogate index built from 14 days of data would align with those made using direct measurement of day-63 treatment effects. Focusing specifically on linear “auto-surrogate” models that utilize the shorter-term observations of the long-term outcome of interest, we find that the statistical inferences that we would draw from using the surrogate index are ~95% consistent with those from directly measuring the long-term treatment effect. Moreover, when we restrict ourselves to the set of tests that would be “launched” (i.e. positive and statistically significant) based on the 63-day directly measured treatment effects, we find that relying instead on the surrogate index achieves 79% and 65% recall.
[Paper]
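The linear auto-surrogate construction is straightforward to sketch: fit a linear model of the long-term outcome on the short-term daily observations using historical experiments, then score a new test that only has 14 days of data. The NumPy sketch below uses entirely synthetic, illustrative data (not Netflix's) to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)

# Historical data: daily observations of a metric for days 1..14,
# plus the directly measured day-63 outcome (synthetic, illustrative).
n = 5000
short_term = rng.normal(size=(n, 14)).cumsum(axis=1)           # 14-day trajectories
long_term = short_term[:, -1] + rng.normal(scale=0.5, size=n)  # day-63 outcome

# Fit the linear "auto-surrogate": day-63 outcome ~ first 14 days.
beta, *_ = np.linalg.lstsq(short_term, long_term, rcond=None)

# In a new A/B test we only observe 14 days per unit.
treat = rng.normal(loc=0.1, size=(2000, 14)).cumsum(axis=1)
control = rng.normal(loc=0.0, size=(2000, 14)).cumsum(axis=1)

# Surrogate-index estimate of the long-term treatment effect:
# difference in mean predicted day-63 outcomes.
effect_hat = (treat @ beta).mean() - (control @ beta).mean()
print(f"estimated long-term effect: {effect_hat:.2f}")
```

Decision-making then proceeds as if the predicted day-63 effect were directly measured, which is exactly the alignment the paper evaluates.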

Calibrated Recommendations as a Minimum-Cost Flow Problem

Abdollahpouri, Nazari, Gain, Gibson, Dimakopoulou, Anderton, Carterette, Lalmas, Jebara (WSDM 2023)

In the recommended list of items, calibration ensures that the various (past) areas of interest of a user are reflected with their corresponding proportions. For instance, if a user has watched, say, 80 romance movies and 20 action movies, then it is reasonable to expect the recommended list of movies to comprise about 80% romance and 20% action movies as well. Calibration is particularly important given that optimizing towards accuracy often leads to the user’s minority interests being dominated by their main interests, or by a few overall popular items, in the recommendations they receive. In this paper, we propose a novel approach based on the minimum-cost flow problem for generating calibrated recommendations. In a series of experiments using two publicly available datasets, we demonstrate the superior performance of our proposed approach compared to the state-of-the-art in generating relevant and calibrated recommendation lists.
[Paper]
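One way to see the reduction is with a toy flow network: a source feeds genre nodes with capacities equal to calibrated quotas, genre nodes feed candidate items with costs equal to negated relevance, and items feed a sink. This sketch (using networkx; catalog, scores and quotas are all illustrative, not the paper's formulation in full) recovers the most relevant list that respects the quotas:

```python
import networkx as nx

# Toy catalog: (item, genre, relevance score); pick k=5 items whose genre
# mix matches a user profile of 80% romance / 20% action.
items = [("r1", "romance", 9), ("r2", "romance", 8), ("r3", "romance", 7),
         ("r4", "romance", 6), ("r5", "romance", 5),
         ("a1", "action", 9), ("a2", "action", 4)]
quota = {"romance": 4, "action": 1}  # calibrated genre quotas for k=5
k = sum(quota.values())

G = nx.DiGraph()
G.add_node("s", demand=-k)  # source must push k units of flow
G.add_node("t", demand=k)   # sink must absorb k units
for genre, q in quota.items():
    G.add_edge("s", genre, capacity=q, weight=0)
for item, genre, score in items:
    # Negate relevance: minimizing cost then maximizes total relevance.
    G.add_edge(genre, item, capacity=1, weight=-score)
    G.add_edge(item, "t", capacity=1, weight=0)

flow = nx.min_cost_flow(G)
picked = [item for item, _, _ in items if flow[item]["t"] > 0]
print(picked)  # → ['r1', 'r2', 'r3', 'r4', 'a1']
```

Because min-cost flow is solvable in polynomial time, the calibrated list can be computed exactly rather than by greedy re-ranking heuristics.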

Society of Agents: Regret Bounds of Concurrent Reinforcement Learning

Chen, Dong, Bai, Dimakopoulou, Xu, Zhou (NeurIPS 2022)

We consider the concurrent reinforcement learning problem, where multiple agents simultaneously learn to make decisions in the same environment by sharing experience with each other. Existing works in this emerging area have empirically demonstrated that Thompson sampling based algorithms provide a particularly attractive alternative for inducing cooperation, because each agent can independently sample a belief environment (and compute a corresponding optimal policy) from the joint posterior computed by aggregating all agents’ data, which induces diversity in exploration among agents while benefiting from the shared experience of all. However, theoretical guarantees in this area remain under-explored. In this paper, we fill this gap by considering two settings: the finite-horizon episodic RL setting and the infinite-horizon RL problem. For both settings, we establish a per-agent regret bound that decreases at an optimal rate of Θ(1/√N), which manifests the power of cooperation in concurrent RL.
[Paper]

Online Multi-Armed Bandits with Adaptive Inference

Dimakopoulou, Ren, Zhou (NeurIPS 2021)

During online decision making in multi-armed bandits, one needs to conduct inference at each step on the true mean reward of each arm, based on the data collected so far. Our thesis in this paper is that more sophisticated inference schemes that take into account the adaptive nature of the sequentially collected data can unlock further performance gains, even though both UCB- and TS-type algorithms are optimal in the worst case. In particular, we propose a variant of TS-style algorithms, which we call doubly adaptive TS, that leverages recent advances in causal inference and adaptively reweights the terms of a doubly robust estimator of the true mean reward of each arm. Through 20 synthetic domain experiments and a semi-synthetic experiment based on data from an A/B test of a web service, we demonstrate that using an adaptive inferential scheme (while still retaining the exploration efficacy of TS) provides clear benefits in online decision making. We also provide a finite-time regret bound for doubly adaptive TS that matches (up to log factors) those of UCB and TS.
[Paper]
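The doubly robust ingredient is easy to sketch in isolation: for each arm, combine a plug-in reward model with inverse-propensity-corrected residuals. The NumPy sketch below shows only this estimator on logged bandit data with known pull probabilities; the adaptive reweighting that defines doubly adaptive TS is omitted, and all data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Logged bandit data: at each step one of two arms was pulled
# with a known, time-varying probability.
true_means = [0.3, 0.5]
T = 10000
probs = rng.uniform(0.2, 0.8, size=T)        # P(pull arm 1) at each step
arms = (rng.random(T) < probs).astype(int)   # arm actually pulled
rewards = rng.normal(true_means[0], 1, T) * (arms == 0) + \
          rng.normal(true_means[1], 1, T) * (arms == 1)

def dr_mean(arm):
    """Doubly robust estimate of one arm's mean reward."""
    p = probs if arm == 1 else 1 - probs
    pulled = arms == arm
    model = rewards[pulled].mean()           # crude plug-in reward model
    # plug-in prediction + importance-weighted residual correction
    return np.mean(model + pulled * (rewards - model) / p)

print(dr_mean(0), dr_mean(1))  # both close to 0.3 and 0.5 respectively
```

The correction term has mean zero when the propensities are known, so the estimate stays consistent even if the plug-in model is biased; this is the property the adaptive reweighting then exploits.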

Post-Contextual-Bandit Inference

Bibaut, Dimakopoulou, Kallus, Chambaz, van der Laan (NeurIPS 2021)

The adaptive nature of the data collected by contextual bandit algorithms makes inference difficult: standard estimators are no longer asymptotically normally distributed, and classic confidence intervals fail to provide correct coverage. While this has been addressed in non-contextual settings by using stabilized estimators, the contextual setting poses unique challenges that we tackle for the first time in this paper. We propose the Contextual Adaptive Doubly Robust (CADR) estimator, the first estimator for policy value that is asymptotically normal under contextual adaptive data collection. The main technical challenge in constructing CADR is designing adaptive and consistent conditional standard deviation estimators for stabilization. Extensive numerical experiments using 57 OpenML datasets demonstrate that confidence intervals based on CADR uniquely provide correct coverage.
[Paper]

Risk Minimization from Adaptively Collected Data

Bibaut, Kallus, Dimakopoulou, Chambaz, van der Laan (NeurIPS 2021)

Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class and provide first-of-their-kind generalization guarantees and fast convergence rates. Our results are based on a new maximal inequality that leverages the importance sampling structure to obtain rates with the right dependence on the exploration rate. For regression, we provide fast rates that leverage the strong convexity of squared-error loss. For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero, as is the case for bandit-collected data. An empirical investigation validates our theory.
[Paper]

Sequential Causal Inference in a Single World of Connected Units

Bibaut, Petersen, Vlassis, Dimakopoulou, van der Laan

We consider adaptive designs for a trial involving multiple individuals, where the variables of one individual depend on its past and on the past of other individuals. Our goal is to learn a mean outcome, averaged across the individuals, that we would observe if we carried out a given sequence of counterfactual interventions for several time steps. We show how to identify a statistical parameter that equals this mean counterfactual outcome, and how to perform inference for this parameter, while adaptively learning an oracle design defined as a parameter of the true data generating distribution.
[Paper]

Doubly Robust Off-Policy Evaluation with Shrinkage

Su, Dimakopoulou, Krishnamurthy, Dudik (ICML 2020)

We design a new family of estimators for off-policy evaluation in contextual bandits. Our estimators are based on the asymptotically optimal approach of doubly robust estimation, but they shrink importance weights to obtain a better bias-variance tradeoff in finite samples. Our approach adapts importance weights to the quality of a reward predictor, interpolating between doubly robust estimation and direct modeling. When the reward predictor is poor, we recover previously studied weight clipping, but when the reward predictor is good, we obtain a new form of shrinkage. Extensive experiments on bandit benchmark problems show that our estimators are highly adaptive and typically outperform state-of-the-art methods.
[Paper]
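A minimal sketch of the estimator family: start from the standard doubly robust form and pass the importance weights through a shrinkage map before applying them to the residuals. The map used below, w → λw/(w² + λ), is one illustrative choice interpolating between the raw weights (λ → ∞) and zero (λ → 0); the policies, reward model and data are all synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Logged data from a behavior policy; evaluate a target policy offline.
T, n_arms = 5000, 3
true_means = np.array([0.2, 0.5, 0.8])
b_probs = np.array([0.7, 0.2, 0.1])   # behavior policy (logging)
t_probs = np.array([0.1, 0.2, 0.7])   # target policy (to evaluate)
arms = rng.choice(n_arms, size=T, p=b_probs)
rewards = rng.binomial(1, true_means[arms]).astype(float)

q_hat = np.array([0.25, 0.45, 0.75])  # deliberately imperfect reward model

def dr_shrunk(lam):
    """Doubly robust OPE with importance weights shrunk by w -> lam*w/(w^2+lam)."""
    w = t_probs[arms] / b_probs[arms]
    if np.isfinite(lam):
        w = lam * w / (w ** 2 + lam)
    direct = (t_probs * q_hat).sum()  # model-based (direct) term
    return direct + np.mean(w * (rewards - q_hat[arms]))

print(dr_shrunk(np.inf), dr_shrunk(10.0))  # plain DR vs shrunk DR
```

Shrinking the weights trades a little bias for less variance from large weights; choosing λ well is exactly the bias-variance tradeoff the paper optimizes.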

ADMM SLIM: Sparse Recommendations for Many Users

Steck, Dimakopoulou, Riabov, Jebara (WSDM 2020)

The Sparse Linear Method (SLIM) is a well-established approach for top-N recommendations. This article proposes several improvements to SLIM that are enabled by the Alternating Directions Method of Multipliers (ADMM). We evaluate our approach against the original SLIM and other state-of-the-art approaches on three well-known data sets. We find that our approach not only reduces training time considerably but also achieves up to a 25% improvement in recommendation accuracy due to better optimization. We also compare the approaches in experiments that simulate scenarios of cold-starting and large catalog sizes relative to a small user base.
[Paper]

Marginal Posterior Sampling for Slate Bandits

Dimakopoulou, Vlassis, Jebara (IJCAI 2019)

We introduce a new Thompson sampling-based algorithm for online slate bandits, called marginal posterior sampling, which is characterized by three key ideas. First, it postulates that the slate-level reward is a monotone function of the marginal unobserved rewards of the actions in the slate's slots, which it does not attempt to estimate. Second, it maintains posterior distributions for the marginal reward of each slot's actions rather than a slate-level reward posterior. Third, it optimizes at the slot-level rather than the slate-level, which makes it computationally efficient. Simulation results show substantial advantages of marginal posterior sampling over state-of-the-art alternatives.
[Paper]
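The three ideas fit in a short loop: keep a Beta posterior per (slot, action), sample all marginals, and take the argmax within each slot. This toy sketch (binary slate reward, synthetic probabilities; a simplification of the paper's setting) shows why the per-slot factorization is cheap:

```python
import numpy as np

rng = np.random.default_rng(3)

# Slate bandit: 3 slots, 4 candidate actions per slot, binary slate reward.
n_slots, n_actions = 3, 4
alpha = np.ones((n_slots, n_actions))  # Beta posterior: successes + 1
beta = np.ones((n_slots, n_actions))   # Beta posterior: failures + 1
true_p = rng.uniform(0.1, 0.9, size=(n_slots, n_actions))

for _ in range(2000):
    # Sample marginal posteriors and optimize slot-by-slot
    # (no slate-level reward model is ever built).
    samples = rng.beta(alpha, beta)
    slate = samples.argmax(axis=1)
    # Monotone slate reward: the slate succeeds only if every slot succeeds.
    reward = int(all(rng.random() < true_p[s, a] for s, a in enumerate(slate)))
    # Credit the slate-level reward to each slot's chosen action marginally.
    alpha[np.arange(n_slots), slate] += reward
    beta[np.arange(n_slots), slate] += 1 - reward

print(alpha.argmax(axis=1))  # most-rewarded action per slot so far
```

Each round costs O(slots × actions) rather than enumerating the exponentially many slates, which is the computational payoff of optimizing at the slot level.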

Balanced Linear Contextual Bandits

Dimakopoulou, Zhou, Athey, Imbens (AAAI 2019)

Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models. We develop algorithms for contextual bandits that integrate balancing methods from causal inference into their estimation, making them less prone to bias. We prove that our algorithms match the state-of-the-art regret bound guarantees and demonstrate their strong practical advantage on a large number of supervised learning datasets and on synthetic examples that simulate model misspecification and covariate shift.
[Paper] [Poster]

On the Design of Estimators for Bandit Off-Policy Evaluation

Vlassis, Bibaut, Dimakopoulou, Jebara (ICML 2019)

Off-policy evaluation is the problem of estimating the value of a target policy using data collected under a different policy. We describe a framework for designing estimators for bandit off-policy evaluation. Given a base estimator and a parametrized class of control variates, we seek a control variate in that class that reduces the risk of the base estimator. We derive the population risk as a function of the class parameters and we discuss approaches for optimizing this function. We present our main results in the context of multi-armed bandits, and we describe a contextual bandits estimator that is shown to perform well in multi-class cost-sensitive classification datasets.
[Paper]

Scalable Coordinated Exploration in Concurrent Reinforcement Learning

Dimakopoulou, Osband, Van Roy (NeurIPS 2018)

We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling (Dimakopoulou and Van Roy, 2018) and randomized value function learning (Osband et al., 2016). We demonstrate that, for simple tabular contexts, the approach is competitive with previously proposed tabular model learning methods (Dimakopoulou and Van Roy, 2018). With a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.
[Paper] [Poster] [Demo]

Coordinated Exploration in Concurrent Reinforcement Learning

Dimakopoulou, Van Roy (ICML 2018)

We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment, while sharing data in real-time. We identify three properties that are essential to efficient coordinated exploration: real-time adaptivity to shared observations, commitment to carry through with action sequences that reveal new information, and diversity across learning opportunities pursued by different agents. We demonstrate that optimism-based approaches fall short with respect to diversity, while naive extensions of Thompson sampling lack commitment. We propose seed sampling that offers a general approach to designing effective coordination algorithms for concurrent reinforcement learning and has substantial advantages over alternative exploration schemes.
[Paper] [Demo] [ICML 2018 Slides] [ICML 2018 Video]
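The interplay of adaptivity, commitment and diversity can be sketched for a linear-Gaussian model: each agent fixes random "seeds" once (a prior perturbation plus per-observation noise terms) and repeatedly solves a perturbed regularized least squares on the shared data pool. Fixed seeds give commitment, distinct seeds give diversity, and refitting on all shared data gives real-time adaptivity. The sketch below is a simplified illustration under these assumptions, not the paper's exact algorithm; all dimensions and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

d, n_agents, sigma = 5, 3, 0.5
theta_true = rng.normal(size=d)

# Each agent fixes its seeds once: a prior sample and a stream of noise terms.
prior_seeds = rng.normal(size=(n_agents, d))
noise_seeds = [[] for _ in range(n_agents)]

X, y = [], []  # shared data pool, visible to all agents in real time

def agent_estimate(k):
    """Agent k's seed-sampled model: perturbed regularized least squares."""
    A = np.eye(d)                 # unit-precision Gaussian prior
    b = prior_seeds[k].copy()     # fixed prior perturbation (diversity)
    for (x, r), z in zip(zip(X, y), noise_seeds[k]):
        A += np.outer(x, x) / sigma ** 2
        b += x * (r + z) / sigma ** 2  # fixed noise seed z (commitment)
    return np.linalg.solve(A, b)

for t in range(200):
    for k in range(n_agents):
        theta_k = agent_estimate(k)           # adapt: refit on all shared data
        actions = rng.normal(size=(10, d))    # candidate action features
        x = actions[(actions @ theta_k).argmax()]  # greedy under sampled model
        X.append(x)
        y.append(x @ theta_true + rng.normal(scale=sigma))
        for j in range(n_agents):  # every agent draws a seed for the new point
            noise_seeds[j].append(rng.normal(scale=sigma))

err = np.linalg.norm(agent_estimate(0) - theta_true)
print(f"agent 0 estimation error after sharing: {err:.2f}")
```

Because every agent's perturbed fit uses the whole shared pool, the agents' models all concentrate around the truth while their fixed seeds keep their exploration distinct along the way.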

Estimation Considerations in Contextual Bandits

Dimakopoulou, Zhou, Athey, Imbens

We study a new consideration in the exploration vs. exploitation framework: the way exploration is conducted in the present may affect the bias and variance of the potential outcome model estimation in subsequent stages of learning. We show that contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We propose new contextual bandit designs, combining parametric and nonparametric statistical estimation methods with causal inference methods in order to reduce the estimation bias, and we provide empirical evidence that guides the choice among the alternatives in different scenarios.
[Paper]

Market-based dynamic service mode switching in wireless networks

Dimakopoulou, Bambos, Valdez-Vivas, Apostolopoulos (PIMRC 2017)

We consider a virtualized wireless networking architecture, where infrastructure access points of different carriers form a marketplace of resources and bid service deals to a mobile device. At each point in time the mobile evaluates the available service deals and dynamically decides which one to accept and use in the next transmission interval. Its objective is to minimize the long term cumulative service cost and latency cost to transmit packets in its buffer. We develop a model of this architecture, which allows for the formulation and computation of the optimal control for the mobile to accept an offered deal amongst many and switch into the corresponding service mode. The performance of the optimal and low-complexity heuristic controls is probed via simulation.
[Paper]

Reliable and Efficient Performance Monitoring in Linux

Dimakopoulou, Eranian, Koziris, Bambos (Supercomputing 2016)

We address a published erratum in the Performance Monitoring Unit (PMU) of Intel Sandy Bridge, Ivy Bridge and Haswell processors with hyper-threading enabled, which causes cross-hyper-thread hardware counter corruption and may produce unreliable results. We propose a cache-coherence-style protocol, which we implement in the Linux kernel to address the issue by introducing cross-hyper-thread dynamic event scheduling. Additionally, we improve event scheduling efficiency by introducing a bipartite graph matching algorithm that consistently finds an optimal assignment of events to hardware counters. These improvements have been contributed to the upstream Linux kernel v4.1.
[Paper]
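The scheduling core reduces to bipartite matching: events on one side, hardware counters on the other, with an edge wherever a counter can host an event. This toy sketch (using networkx's Hopcroft–Karp-based matching; the event names and constraint sets are made up, not real PMU constraint masks) shows the assignment step:

```python
import networkx as nx
from networkx.algorithms.bipartite import maximum_matching

# Toy constraint table: each PMU event can only run on certain counters
# (illustrative; real constraint masks come from the processor manual).
constraints = {
    "cycles":     {0, 1, 2, 3},
    "llc_misses": {2, 3},
    "br_mispred": {0, 1},
    "mem_loads":  {3},
}

G = nx.Graph()
G.add_nodes_from(constraints, bipartite=0)
for event, counters in constraints.items():
    for c in counters:
        G.add_edge(event, ("ctr", c))  # tag counters to avoid name clashes

# Maximum bipartite matching assigns each event a distinct counter
# whenever such an assignment exists.
match = maximum_matching(G, top_nodes=list(constraints))
schedule = {e: match[e][1] for e in constraints if e in match}
print(schedule)
```

A greedy scheduler can paint itself into a corner (e.g. placing llc_misses on counter 3 starves mem_loads); maximum matching finds the full assignment whenever one exists.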

Talks & Workshops

(Aug 10th 2026) Invited speaker at the International Conference on Econometrics and Statistics (EcoSta) 2026.

(May 13th 2026) Invited contributor at the MIT Workshop on "Aligning Superintelligence".

(Nov 13th 2025) Invited contributor at the MIT Workshop on "Evaluation for the Long-Term: Learnings from Industry".

(Oct 25th 2024) Keynote speaker at Industrial Recommendation Systems Workshop of ACM CIKM 2024.

(Aug 14th 2024) Invited speaker at the Bernoulli IMS Seminar of the 2024 World Congress in Probability and Statistics.

(Jun 6th 2024) Invited speaker at Intel 2024 MLCon Conference.

(May 31st 2024) Invited speaker at Stanford 2024 Causal Science Center Conference on Experimentation.

(May 31st 2024) Invited speaker at Netflix 2024 Personalization, Recommendation and Search annual Summit.

(Dec 7th 2023) Invited speaker at Stanford/UCSF Bay Area Tech Economics Seminar.

(Sep 23rd 2022) Co-organizer of RecSys 2022 MORS Workshop on Multi-Objective Recommender Systems.

(Sep 22nd 2022) Co-organizer of RecSys 2022 REVEAL Workshop on Reinforcement Learning for Recommender Systems.

(Mar 17th 2022) Invited speaker at the Women in Data Science (WiDS) Conference.

(Dec 14th 2021) Co-organizer of NeurIPS 2021 Causal Sequential Decisions Workshop.

(Oct 25th 2021) Invited speaker at Hamsa Bastani's class at the Wharton School of the University of Pennsylvania.

(Jul 23rd 2021) Invited speaker and panelist at ICML 2021 RL4RealLife Workshop.

(Jun 11th 2021) Invited speaker at Netflix 2021 Personalization, Ranker and Search (PRS) annual Summit.

(Dec 6th 2020) Invited speaker at the NeurIPS 2020 Expo.

(Sep 26th 2020) Co-organizer of the RecSys 2020 Bandit and Reinforcement Learning from User Interactions (REVEAL) Workshop.

(Jul 13th 2020) Invited speaker at the ICML 2020 Women In Machine Learning workshop giving a talk on "Slate Bandit Online Learning & Off-Policy Evaluation".

(Oct 20th 2019) Invited speaker at INFORMS 2019 "Bandits and Reinforcement Learning" session.

(Sep 20th 2019) Co-organizer of the RecSys 2019 "Bandit and Reinforcement Learning from User Interactions" (REVEAL) Workshop.

(Sep 16th 2019) Invited speaker at DeepMind London.

(Aug 14th 2019) Speaker at IJCAI 2019 "Online Learning" Main Conference Track.

(Jan 29th 2019) Speaker at AAAI 2019 "Machine Learning" Main Conference Track.

(Oct 12th 2018) Invited speaker at Cornell's School of Operations Research and Information Engineering Young Researcher Workshop.

(Jul 24th 2018) Invited speaker at Google NYC Algorithms Seminar.

(Jul 11th 2018) Speaker at ICML 2018 "Reinforcement Learning" Main Conference Track.

(May 25th 2018) Invited speaker at the Netflix Research Seminar.

(Dec 8th 2017) Invited speaker at NeurIPS 2017 "From What If to What's Next: ML and Causal Inference" Workshop.

Get in touch