Publications | Manh Cuong (Marcus) Dao

2025

NeurIPS

ROOT: Rethinking Offline Optimization as Distributional Translation via Probabilistic Bridge

Manh Cuong Dao, The Hung Tran, Phi Le Nguyen, and 2 more authors

Advances in Neural Information Processing Systems, 2025

Spotlight Presentation [3.2%]

This paper studies the black-box optimization task, which aims to find the maxima of a black-box function using a static set of its observed input-output pairs. This is often achieved by learning and optimizing a surrogate function with that offline data. Alternatively, it can be framed as an inverse modeling task that maps a desired performance to potential input candidates that achieve it. Both approaches are constrained by the limited amount of offline data. To mitigate this limitation, we introduce a new perspective that casts offline optimization as a distributional translation task. This is formulated as learning a probabilistic bridge transforming an implicit distribution of low-value inputs (i.e., offline data) into another distribution of high-value inputs (i.e., solution candidates). Such a probabilistic bridge can be learned using low- and high-value inputs sampled from synthetic functions that resemble the target function. These synthetic functions are constructed as the posterior means of multiple Gaussian processes fitted with different parameterizations on the offline data, alleviating the data bottleneck. The proposed approach is evaluated on an extensive benchmark comprising the most recent methods, demonstrating significant improvement and establishing new state-of-the-art performance.

FGCS

Noisy data-based attack: A new type of untargeted attack in Federated Learning and its countermeasures

Manh Cuong Dao, Phi Le Nguyen, Huy Hieu Pham, and 4 more authors

Future Generation Computer Systems, 2025

Federated Learning (FL) is a distributed learning mechanism that enables multiple clients to collaboratively train a global model (e.g., a neural network) while maintaining the privacy of their data. However, FL is susceptible to adversarial attacks, especially those involving poisoned samples. Despite significant research efforts, adversarial attacks and defenses in FL remain an unresolved issue. In this paper, we uncover a novel type of untargeted attack called the noisy data-based attack, which can evade almost all current state-of-the-art defenses. This attack transforms training data into noisy data, reducing the accuracy of the global model. To address this issue, we propose a novel defense mechanism named NDAD, which uses a confidence score to evaluate the accuracy of a local model’s prediction on a client’s data. Our defense comprises two major components: a highly accurate algorithm for identifying malicious clients and an aggregation algorithm that optimizes beneficial knowledge from clients while avoiding the impact of poisoned data. The experimental results demonstrate that our proposed defense can detect malicious clients with a detection accuracy of over 97%, even in challenging scenarios with a significantly high ratio of malicious clients. Furthermore, our proposed aggregation scheme improves the accuracy of the global model by around 2%–3% on average and by up to 4.75% in the best case compared to existing defenses on the MNIST, CIFAR-10, and HAM10000 datasets.

2024

NeurIPS

Incorporating surrogate gradient norm to improve offline optimization techniques

Manh Cuong Dao, Phi Le Nguyen, Truong Thao Nguyen, and 1 more author

Advances in Neural Information Processing Systems, 2024

Offline optimization has recently emerged as an increasingly popular approach to mitigate the prohibitively expensive cost of online experimentation. The key idea is to learn a surrogate of the black-box function that underlies the target experiment using a static (offline) dataset of its previous input-output queries. Such an approach is, however, fraught with an out-of-distribution issue where the learned surrogate becomes inaccurate outside the offline data regimes. To mitigate this, existing offline optimizers have proposed numerous conditioning techniques to prevent the learned surrogate from being too erratic. Nonetheless, such conditioning strategies are often specific to particular surrogate or search models, which might not generalize to a different model choice. This motivates us to develop a model-agnostic approach instead, which incorporates a notion of model sharpness into the training loss of the surrogate as a regularizer. Our approach is supported by a new theoretical analysis demonstrating that reducing surrogate sharpness on the offline dataset provably reduces its generalized sharpness on unseen data. Our analysis extends existing theories from bounding generalized prediction loss (on unseen data) with loss sharpness to bounding the worst-case generalized surrogate sharpness with its empirical estimate on training data, providing a new perspective on sharpness regularization. Our extensive experimentation on a diverse range of optimization tasks also shows that reducing surrogate sharpness often leads to significant improvement, with a performance boost of up to 9.6%.
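
To make the idea of a sharpness regularizer concrete, here is a minimal sketch, not the paper's implementation. It assumes a linear surrogate, a mean-squared-error fit loss, and a penalty on the squared norm of the loss gradient with respect to the surrogate parameters; the paper's exact regularizer and surrogate architecture may differ. All names (`predict`, `mse`, `grad_norm_sq`, `regularized_loss`, `train`) are hypothetical, and finite differences stand in for the automatic differentiation a real implementation would likely use.

```python
import random


def predict(params, x):
    """Linear surrogate: params = [w_1, ..., w_d, b] (an assumed toy model)."""
    w, b = params[:-1], params[-1]
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse(params, data):
    """Mean squared error of the surrogate on the offline dataset."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def grad(fn, params, eps=1e-5):
    """Central finite-difference gradient of fn at params."""
    g = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        g.append((fn(up) - fn(down)) / (2 * eps))
    return g


def grad_norm_sq(params, data):
    """Squared norm of the fit-loss gradient w.r.t. the surrogate parameters."""
    g = grad(lambda p: mse(p, data), params)
    return sum(gi * gi for gi in g)


def regularized_loss(params, data, lam):
    """Fit loss plus lam times the (assumed) squared gradient-norm penalty."""
    return mse(params, data) + lam * grad_norm_sq(params, data)


def train(data, dim, lam=0.1, lr=0.05, steps=200, seed=0):
    """Plain gradient descent on the regularized loss."""
    rng = random.Random(seed)
    params = [rng.uniform(-0.1, 0.1) for _ in range(dim + 1)]
    for _ in range(steps):
        g = grad(lambda p: regularized_loss(p, data, lam), params, eps=1e-4)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params


if __name__ == "__main__":
    # Toy offline dataset drawn from y = 2x + 1 (illustrative only).
    offline = [([x], 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    fitted = train(offline, dim=1)
    print("fitted parameters:", fitted)
    print("squared gradient norm:", grad_norm_sq(fitted, offline))
```

Because the penalty is added to the training loss rather than built into a particular surrogate or search procedure, the same pattern can in principle wrap any differentiable surrogate, which is the sense in which such a regularizer is model-agnostic.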

ICML

Boosting offline optimizers with surrogate sensitivity

Manh Cuong Dao, Phi Le Nguyen, Thao Nguyen Truong, and 1 more author

International Conference on Machine Learning, 2024

Offline optimization is an important task in numerous material engineering domains where online experimentation to collect data is too expensive and needs to be replaced by an in silico maximization of a surrogate of the black-box function. Although such a surrogate can be learned from offline data, its prediction might not be reliable outside the offline data regime, which happens when the surrogate has a narrow prediction margin and is (therefore) sensitive to small perturbations of its parameterization. This raises the following questions: (1) how to regulate the sensitivity of a surrogate model; and (2) whether conditioning an offline optimizer with such a less sensitive surrogate will lead to better optimization performance. To address these questions, we develop an optimizable sensitivity measurement for the surrogate model, which then inspires a sensitivity-informed regularizer that is applicable to a wide range of offline optimizers. This development is both orthogonal and synergistic to prior research on offline optimization, which is demonstrated in our extensive experimental benchmark.

2023

ACIIDS

A deep reinforcement learning-based multi-objective optimization for crowdsensing-based air quality monitoring systems

Nam Duong Tran, Manh Cuong Dao, Thanh Hung Nguyen, and 3 more authors

In Asian Conference on Intelligent Information and Database Systems, 2023

Global air pollution is becoming increasingly severe. In this context, monitoring air quality at all times and locations is necessary. Traditionally, air quality is monitored using stationary monitoring stations. However, this approach has an inherent shortcoming: limited monitoring locations. Crowdsensing-based air monitoring has recently emerged as a promising alternative that expands monitoring coverage in both temporal and spatial dimensions through the collaboration of numerous participants. Typically, participants in crowdsensing systems are compensated for the data they provide. One of the critical challenges in handling a crowdsensing system is minimizing the cost while guaranteeing the quality of the data collected. For crowdsensing-based air monitoring systems, data quality refers to the temporal and spatial coverage corresponding to the locations and times at which the data were collected. In this study, we propose a solution based on deep reinforcement learning that simultaneously optimizes two goals: maximizing coverage range and minimizing costs. Our proposed solution is one of the first attempts to optimize both of these objectives for crowdsensing-based air monitoring systems. Experimental results indicate that, compared to other algorithms, the proposed solution can increase coverage by more than 30% and reduce cost by more than 70%.

2022

PIMRC

Deep reinforcement learning-based charging algorithm for target coverage and connectivity in WRSNs

Hung Cuong Nguyen, Manh Cuong Dao, Thanh Trung Nguyen, and 4 more authors

In 2022 IEEE 33rd Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2022

Target coverage and connectivity are two of the most crucial issues in handling wireless sensor networks. However, maintaining these two factors is challenging due to the energy constraint of sensors. To this end, wireless charging has emerged as a promising solution to prolong the sensors’ lifetime. In a wireless charging sensor network, a mobile charger moves around the network, stops at several charging locations, and charges the sensors via electromagnetic waves. In this study, we investigate the problem of optimizing the charging location and charging time of the mobile charger to ensure the target coverage and connectivity of the network. Our main idea is to leverage the Deep Reinforcement Learning approach. Specifically, the mobile charger acts as an agent, which receives a state including the energy information of the sensors. The mobile charger then decides the next charging location and charging time using the state information and the knowledge learned in the past. Experimental results show that our algorithm can extend the network lifetime (i.e., the time until the network coverage and connectivity are no longer guaranteed) by up to 245.9 times compared to existing algorithms.