As the emerging Internet of Things (IoT) brings a massive population of multi-modal sensors into the environment, there is a growing need to develop new Machine Learning (ML) techniques to analyze the data and unleash its power. A data-driven IoT ecosystem forms the basis of Ambient Intelligence, i.e., a smart environment that is sensitive to the presence of humans and can ultimately help automate human life. IoT data are highly heterogeneous, involving not only the traditional audio-visual modalities but also many emerging sensory dimensions that go beyond human perception. These rich IoT sensing paradigms pose vast new challenges and opportunities that call for coordinated research efforts between the ML and IoT communities. On one hand, IoT data require new ML hardware/software platforms and innovative processing/labeling methods for efficient collection, curation, and analysis. On the other hand, compared with the traditional audio/visual/textual data that have been widely studied in ML, new IoT data often exhibit unique challenges due to highly heterogeneous modalities, disparate dynamic distributions, sparsity, intensive noise, etc. Moreover, the rich environments and human interactions involved pose privacy and security challenges. These properties require new paradigms of ML-based perception and understanding. The objective of this workshop is to bring together leading researchers from the ML/IoT industry and academia to address these challenges. The workshop will also solicit benchmark IoT datasets, as a basis for ML researchers to design and benchmark new modeling and data-analytic tools.
Fri 5:30 a.m. - 5:40 a.m.
|
Opening Remarks
SlidesLive Video » TBD |
🔗 |
Fri 5:40 a.m. - 5:45 a.m.
|
Variational Component Decoder for Source Extraction from Nonlinear Mixture
(
Lightning Talk
)
link »
SlidesLive Video » In many practical scenarios of signal extraction from a nonlinear mixture, only one (signal) source is intended to be extracted. However, modern methods involving Blind Source Separation are inefficient for this task since they are designed to recover all sources in the mixture. In this paper, we propose the supervised Variational Component Decoder (sVCD) as a method dedicated to extracting a single source from a nonlinear mixture. sVCD leverages the sequence-to-sequence (Seq2Seq) translation ability of a specially designed neural network to approximate a nonlinear inverse of the mixture process, assisted by priors on the source of interest. To remain robust to real-life samples, sVCD combines Seq2Seq with variational inference to form a deep generative model, trained by optimizing a variant of the variational bound on the data likelihood concerning only the source of interest. We demonstrate that sVCD outperforms a state-of-the-art method on nonlinear source extraction across diverse datasets, including artificially generated sequences, radio frequency (RF) sensing data, and electroencephalogram (EEG) recordings. |
Shujie Zhang · Tianyue Zheng · Zhe Chen · Sinno Pan · Jun Luo 🔗 |
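Per the abstract, sVCD trains by optimizing a variational bound on the likelihood of the source of interest only. A minimal numerical sketch of such a bound for a diagonal-Gaussian posterior follows; the function name, shapes, and unit-variance likelihood are hypothetical choices, and the paper's exact bound may differ:

```python
import numpy as np

def source_elbo(target, recon, mu, logvar):
    """Evidence lower bound for the source of interest only:
    Gaussian reconstruction term (unit variance, up to a constant)
    minus the KL divergence of the diagonal-Gaussian posterior
    N(mu, exp(logvar)) from a standard-normal prior."""
    rec = -0.5 * np.sum((target - recon) ** 2)
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return rec - kl
```

Maximising this bound trains the encoder-decoder; a perfect reconstruction with a posterior equal to the prior attains the maximum of zero in this toy form.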
Fri 5:45 a.m. - 5:50 a.m.
|
Multi-Knowledge Fusion Network For Time Series Representation Learning
(
Lightning Talk
)
link »
SlidesLive Video » Forecasting complex dynamical systems, such as interconnected sensor networks characterized by high-dimensional multivariate time series (MTS), is of paramount importance for making informed decisions and planning for the future in a broad spectrum of applications. Graph forecasting networks (GFNs) are well-suited for forecasting MTS data that exhibit spatio-temporal dependencies. However, most prior GFN-based methods for MTS forecasting rely on domain expertise to model the nonlinear dynamics of the system but neglect the potential to leverage the inherent relational-structural dependencies among the time series variables underlying MTS data. Meanwhile, contemporary works attempt to infer the relational structure of the complex dependencies between the variables and simultaneously learn the nonlinear dynamics of the interconnected system, but neglect the possibility of incorporating domain-specific prior knowledge to improve forecast accuracy. To this end, we propose a novel hybrid architecture that combines explicit prior knowledge with implicit knowledge of the relational structure within the MTS data. It jointly learns intra-series temporal dependencies and inter-series spatial dependencies by encoding time-conditioned structural spatio-temporal inductive biases to provide more accurate and reliable forecasts. It also models the time-varying uncertainty of the multi-horizon forecasts to support decision-making by providing estimates of predictive uncertainty. The proposed architecture shows promising results on multiple benchmark datasets and outperforms state-of-the-art forecasting methods by a significant margin. We report and discuss ablation studies that validate our forecasting architecture. |
Sagar Srinivas Sakhinana · Shivam Gupta · Sudhir Aripirala · Rajat sarkar · Venkataramana Runkana 🔗 |
Fri 5:50 a.m. - 5:55 a.m.
|
AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation Scheme
(
Lightning Talk
)
link »
SlidesLive Video » Mechanical defects in real situations affect observation values and cause abnormalities in multivariate time series, such as sensor values or network data. To perceive abnormalities in such data, it is crucial to understand the temporal context and interrelation between variables simultaneously. The anomaly detection task for time series, especially for unlabeled data, has been a challenging problem, and we address it by applying a suitable data degradation scheme to self-supervised model training. We define four types of synthetic outliers and propose the degradation scheme in which a portion of input data is replaced with one of the synthetic outliers. Inspired by the self-attention mechanism, we design a Transformer-based architecture to recognize the temporal context and detect unnatural sequences with high efficiency. Our model converts multivariate data points into temporal representations with relative position bias and yields anomaly scores from these representations. Our method, AnomalyBERT, shows a great capability of detecting anomalies contained in complex time series and surpasses previous state-of-the-art methods on five real-world benchmarks. Our code is available at https://github.com/Jhryu30/AnomalyBERT. |
Yungi Jeong · Eunseok Yang · Jung Hyun Ryu · Imseong Park · Myungjoo Kang 🔗 |
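The data degradation scheme described above can be sketched as follows; the four outlier types here (uniform replacement, spike, noise injection, soft replacement) are illustrative stand-ins, and the paper's exact definitions may differ:

```python
import numpy as np

def degrade(window, rng):
    """Replace a random span of a (T, D) window with a synthetic
    outlier and return the degraded window plus point-wise labels."""
    T, D = window.shape
    out = window.copy()
    length = int(rng.integers(5, 10))
    start = int(rng.integers(0, T - length))
    end = start + length
    kind = int(rng.integers(0, 4))
    if kind == 0:    # uniform replacement: one constant value per dimension
        out[start:end] = rng.uniform(-1.0, 1.0, size=D)
    elif kind == 1:  # spike: a single-point peak outlier
        out[start] += rng.uniform(2.0, 4.0, size=D)
        end = start + 1
    elif kind == 2:  # noise injection over the span
        out[start:end] += rng.normal(0.0, 0.5, size=(length, D))
    else:            # soft replacement: blend in another segment
        src = int(rng.integers(0, T - length))
        out[start:end] = 0.5 * window[src:src + length] + 0.5 * out[start:end]
    labels = np.zeros(T, dtype=int)
    labels[start:end] = 1
    return out, labels
```

Training a model to score the labeled span as anomalous, as in the abstract, requires no ground-truth annotations.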
Fri 5:55 a.m. - 6:00 a.m.
|
An Efficient Semi-Automated Scheme for LiDAR Annotation and A Benchmark Infrastructure Dataset
(
Lightning Talk
)
link »
SlidesLive Video »
We present an efficient semi-automated annotation tool that automatically annotates LiDAR sequences with tracking algorithms, while offering a fully annotated infrastructure LiDAR dataset---FLORIDA (Florida LiDAR-based Object Recognition and Intelligent Data Annotation)---which will be made publicly available. Our advanced annotation tool seamlessly integrates multi-object tracking (MOT), single-object tracking (SOT), and batch-editing functionalities. Specifically, we introduce a human-in-the-loop scheme in which annotations are incrementally added to the training set of the MOT and SOT models after being fixed and improved by human annotators. By repeating this process, we increase the overall annotation speed by 3-4 times and obtain higher-quality annotations than a state-of-the-art annotation tool. Human annotation experiments verify the effectiveness of our annotation tool. In addition, we provide detailed statistics and object detection evaluation results for our benchmark dataset, collected at a busy traffic intersection.
|
Aotian Wu · Pan He · Xiao Li · Ke Chen · Sanjay Ranka · Anand Rangarajan 🔗 |
Fri 6:00 a.m. - 6:05 a.m.
|
NetFlick: Adversarial Flickering Attacks on Deep Learning Based Video Compression
(
Lightning Talk
)
link »
SlidesLive Video » Video compression plays a significant role in IoT devices for the efficient transport of visual data while satisfying all underlying bandwidth constraints. Deep learning-based video compression methods are rapidly replacing traditional algorithms and providing state-of-the-art results on edge devices. However, recently developed adversarial attacks demonstrate that digitally crafted perturbations can break the rate-distortion relationship of video compression. In this work, we present a real-world LED attack targeting video compression frameworks. Our physically realizable attack, dubbed NetFlick, can degrade the spatio-temporal correlation between successive frames by injecting flickering temporal perturbations. In addition, we propose universal perturbations that can downgrade the performance of incoming video without prior knowledge of its contents. Experimental results demonstrate that NetFlick successfully deteriorates the performance of video compression frameworks in both digital and physical settings, and can be further extended to attack downstream video classification networks. |
Jung-Woo Chang · Nojan Sheybani · Shehzeen Hussain · Mojan Javaheripi · Seira Hidano · Farinaz Koushanfar 🔗 |
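A flickering temporal perturbation of the kind described can be sketched as a per-frame brightness offset; the random flicker below is only an illustration of the mechanism, not NetFlick's optimised perturbation:

```python
import numpy as np

def flicker(frames, amplitude=0.05, rng=None):
    """Add a uniform brightness offset per frame to a video tensor of
    shape (T, H, W, C), degrading the temporal correlation between
    successive frames that compression codecs exploit."""
    rng = rng if rng is not None else np.random.default_rng()
    T = frames.shape[0]
    delta = rng.uniform(-amplitude, amplitude, size=(T, 1, 1, 1))
    return np.clip(frames + delta, 0.0, 1.0)
```

Because the offset is constant within each frame, it is realizable physically, e.g. with modulated LED lighting as in the paper.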
Fri 6:05 a.m. - 6:10 a.m.
|
A NEW FRAMEWORK FOR TRAINING IN-NETWORK LEARNING MODELS OVER DISCRETE CHANNELS
(
Lightning Talk
)
link »
SlidesLive Video » In-network learning (INL) has emerged as a new paradigm in machine learning (ML) that allows multiple nodes to train a joint ML model without sharing their raw data. In INL, the nodes jointly construct a hyper ML model formed of ML sub-models, one located at each node. These sub-models are trained jointly, without sharing raw data, using a distributed version of the classical backpropagation technique. A disadvantage of such backpropagation techniques is that when the communication between the nodes occurs over discrete channels, the parameters of the sub-models cannot be updated due to the lack of a gradient. In this paper, we present a new framework for training INL models over discrete channels. The framework builds on the straight-through gradient estimator by adapting the quantisation points, or codebooks, to the optimisation problem at hand, while also compensating for the error introduced by the gradient estimation. We perform experiments showing that our proposed framework can achieve performance similar to models trained over continuous channels, while also significantly reducing the amount of data communicated between nodes. |
Matei Moldoveanu · Abdellatif Zaidi · Abderrezak Rachedi 🔗 |
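The straight-through idea the framework builds on can be sketched as follows: the forward pass snaps an activation to the nearest codebook point before transmission, while the backward pass treats the quantiser as the identity so a gradient still reaches the sub-model. The codebook values here are arbitrary placeholders (the paper additionally adapts the codebook and compensates the estimation error):

```python
import numpy as np

CODEBOOK = np.array([-1.0, -0.25, 0.25, 1.0])  # placeholder channel symbols

def quantise_forward(x):
    """Map each activation to its nearest codebook point."""
    idx = np.argmin(np.abs(x[..., None] - CODEBOOK), axis=-1)
    return CODEBOOK[idx]

def quantise_backward(grad_out):
    """Straight-through estimator: pretend d(quantise)/dx = 1."""
    return grad_out

x = np.array([0.1, -0.8, 0.6])
symbols = quantise_forward(x)                 # sent over the discrete channel
grad_in = quantise_backward(np.ones_like(x))  # gradient flows through unchanged
```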
Fri 6:20 a.m. - 6:55 a.m.
|
Invited Talk by Kristen Grauman
(
Invited Talk
)
SlidesLive Video » TBD |
Kristen Grauman 🔗 |
Fri 6:55 a.m. - 7:20 a.m.
|
A NEW FRAMEWORK FOR TRAINING IN-NETWORK LEARNING MODELS OVER DISCRETE CHANNELS
(
Poster
)
link »
In-network learning (INL) has emerged as a new paradigm in machine learning (ML) that allows multiple nodes to train a joint ML model without sharing their raw data. In INL, the nodes jointly construct a hyper ML model formed of ML sub-models, one located at each node. These sub-models are trained jointly, without sharing raw data, using a distributed version of the classical backpropagation technique. A disadvantage of such backpropagation techniques is that when the communication between the nodes occurs over discrete channels, the parameters of the sub-models cannot be updated due to the lack of a gradient. In this paper, we present a new framework for training INL models over discrete channels. The framework builds on the straight-through gradient estimator by adapting the quantisation points, or codebooks, to the optimisation problem at hand, while also compensating for the error introduced by the gradient estimation. We perform experiments showing that our proposed framework can achieve performance similar to models trained over continuous channels, while also significantly reducing the amount of data communicated between nodes. |
Abdellatif Zaidi · Matei Moldoveanu · Abderrezak Rachedi 🔗 |
Fri 6:55 a.m. - 7:20 a.m.
|
AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation Scheme
(
Poster
)
link »
Mechanical defects in real situations affect observation values and cause abnormalities in multivariate time series, such as sensor values or network data. To perceive abnormalities in such data, it is crucial to understand the temporal context and interrelation between variables simultaneously. The anomaly detection task for time series, especially for unlabeled data, has been a challenging problem, and we address it by applying a suitable data degradation scheme to self-supervised model training. We define four types of synthetic outliers and propose the degradation scheme in which a portion of input data is replaced with one of the synthetic outliers. Inspired by the self-attention mechanism, we design a Transformer-based architecture to recognize the temporal context and detect unnatural sequences with high efficiency. Our model converts multivariate data points into temporal representations with relative position bias and yields anomaly scores from these representations. Our method, AnomalyBERT, shows a great capability of detecting anomalies contained in complex time series and surpasses previous state-of-the-art methods on five real-world benchmarks. Our code is available at https://github.com/Jhryu30/AnomalyBERT. |
Yungi Jeong · Eunseok Yang · Jung Hyun Ryu · Imseong Park · Myungjoo Kang 🔗 |
Fri 6:55 a.m. - 7:20 a.m.
|
Multi-Knowledge Fusion Network For Time Series Representation Learning
(
Poster
)
link »
Forecasting complex dynamical systems, such as interconnected sensor networks characterized by high-dimensional multivariate time series (MTS), is of paramount importance for making informed decisions and planning for the future in a broad spectrum of applications. Graph forecasting networks (GFNs) are well-suited for forecasting MTS data that exhibit spatio-temporal dependencies. However, most prior GFN-based methods for MTS forecasting rely on domain expertise to model the nonlinear dynamics of the system but neglect the potential to leverage the inherent relational-structural dependencies among the time series variables underlying MTS data. Meanwhile, contemporary works attempt to infer the relational structure of the complex dependencies between the variables and simultaneously learn the nonlinear dynamics of the interconnected system, but neglect the possibility of incorporating domain-specific prior knowledge to improve forecast accuracy. To this end, we propose a novel hybrid architecture that combines explicit prior knowledge with implicit knowledge of the relational structure within the MTS data. It jointly learns intra-series temporal dependencies and inter-series spatial dependencies by encoding time-conditioned structural spatio-temporal inductive biases to provide more accurate and reliable forecasts. It also models the time-varying uncertainty of the multi-horizon forecasts to support decision-making by providing estimates of predictive uncertainty. The proposed architecture shows promising results on multiple benchmark datasets and outperforms state-of-the-art forecasting methods by a significant margin. We report and discuss ablation studies that validate our forecasting architecture. |
Sagar Srinivas Sakhinana · Shivam Gupta · Sudhir Aripirala · Rajat sarkar · Venkataramana Runkana 🔗 |
Fri 6:55 a.m. - 7:20 a.m.
|
NetFlick: Adversarial Flickering Attacks on Deep Learning Based Video Compression
(
Poster
)
link »
Video compression plays a significant role in IoT devices for the efficient transport of visual data while satisfying all underlying bandwidth constraints. Deep learning-based video compression methods are rapidly replacing traditional algorithms and providing state-of-the-art results on edge devices. However, recently developed adversarial attacks demonstrate that digitally crafted perturbations can break the rate-distortion relationship of video compression. In this work, we present a real-world LED attack targeting video compression frameworks. Our physically realizable attack, dubbed NetFlick, can degrade the spatio-temporal correlation between successive frames by injecting flickering temporal perturbations. In addition, we propose universal perturbations that can downgrade the performance of incoming video without prior knowledge of its contents. Experimental results demonstrate that NetFlick successfully deteriorates the performance of video compression frameworks in both digital and physical settings, and can be further extended to attack downstream video classification networks. |
Jung-Woo Chang · Nojan Sheybani · Shehzeen Hussain · Mojan Javaheripi · Seira Hidano · Farinaz Koushanfar 🔗 |
Fri 6:55 a.m. - 7:20 a.m.
|
An Efficient Semi-Automated Scheme for LiDAR Annotation and A Benchmark Infrastructure Dataset
(
Poster
)
link »
We present an efficient semi-automated annotation tool that automatically annotates LiDAR sequences with tracking algorithms, while offering a fully annotated infrastructure LiDAR dataset---FLORIDA (Florida LiDAR-based Object Recognition and Intelligent Data Annotation)---which will be made publicly available. Our advanced annotation tool seamlessly integrates multi-object tracking (MOT), single-object tracking (SOT), and batch-editing functionalities. Specifically, we introduce a human-in-the-loop scheme in which annotations are incrementally added to the training set of the MOT and SOT models after being fixed and improved by human annotators. By repeating this process, we increase the overall annotation speed by 3-4 times and obtain higher-quality annotations than a state-of-the-art annotation tool. Human annotation experiments verify the effectiveness of our annotation tool. In addition, we provide detailed statistics and object detection evaluation results for our benchmark dataset, collected at a busy traffic intersection.
|
Aotian Wu · Pan He · Xiao Li · Ke Chen · Sanjay Ranka · Anand Rangarajan 🔗 |
Fri 6:55 a.m. - 7:20 a.m.
|
SpectraNet: multivariate forecasting and imputation under distribution shifts and missing data
(
Poster
)
link »
SlidesLive Video » Neural forecasting has become an active research area, with advancements in architectural design steadily improving performance and scalability. Most existing approaches produce forecasts using a fixed parametric function with historical values as inputs. We identify performance limitations of this approach in handling two recurrent challenges in IoT data: distribution shifts and missing data. We propose SpectraNet, a model based on a new paradigm for time-series forecasting. We introduce a latent factor inference method that matches the model's output on past observations. Theoretically motivated as a MAP estimation of the posterior distribution of latent factors, the inference process provides additional flexibility to adjust forecasts based on the latest information. We identify three advantages of our method: (i) SoTA performance with 92% fewer parameters and similar training times; (ii) superior robustness to missing data and distribution shifts; and (iii) capability to simultaneously produce forecasts and interpolate past missing data, unifying imputation and forecasting tasks. |
Cristian Challu · Peihong Jiang · Yingnian Wu · Laurent Callot 🔗 |
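The inference-by-optimisation idea above can be illustrated with a linear toy decoder: given a fixed basis W, the MAP latent factors under a standard-normal prior have a closed form (ridge regression) on the observed past, and the reconstruction both fits observed entries and imputes missing ones. SpectraNet itself performs this inference through a neural decoder; every name and shape below is a hypothetical toy:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, K = 24, 5, 3
W = rng.normal(size=(T, K))          # fixed decoder basis
z_true = rng.normal(size=(K, D))
y = W @ z_true                       # past window, one column per series
mask = rng.random((T, D)) > 0.3      # observed entries (roughly 70%)

lam = 0.1                            # prior precision -> ridge penalty
z_map = np.zeros((K, D))
for d in range(D):
    m = mask[:, d]
    Wm = W[m]                        # rows where series d was observed
    z_map[:, d] = np.linalg.solve(Wm.T @ Wm + lam * np.eye(K), Wm.T @ y[m, d])

recon = W @ z_map                    # fits the past and fills missing entries
```

Because the latents are re-inferred from the latest window, the same machinery adapts forecasts after a distribution shift, which is the flexibility the abstract highlights.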
Fri 6:55 a.m. - 7:20 a.m.
|
Variational Component Decoder for Source Extraction from Nonlinear Mixture
(
Poster
)
link »
In many practical scenarios of signal extraction from a nonlinear mixture, only one (signal) source is intended to be extracted. However, modern methods involving Blind Source Separation are inefficient for this task since they are designed to recover all sources in the mixture. In this paper, we propose the supervised Variational Component Decoder (sVCD) as a method dedicated to extracting a single source from a nonlinear mixture. sVCD leverages the sequence-to-sequence (Seq2Seq) translation ability of a specially designed neural network to approximate a nonlinear inverse of the mixture process, assisted by priors on the source of interest. To remain robust to real-life samples, sVCD combines Seq2Seq with variational inference to form a deep generative model, trained by optimizing a variant of the variational bound on the data likelihood concerning only the source of interest. We demonstrate that sVCD outperforms a state-of-the-art method on nonlinear source extraction across diverse datasets, including artificially generated sequences, radio frequency (RF) sensing data, and electroencephalogram (EEG) recordings. |
Shujie Zhang · Tianyue Zheng · Zhe Chen · Sinno Pan · Jun Luo 🔗 |
Fri 6:55 a.m. - 7:20 a.m.
|
Coffee Break (Poster)
(
Poster Session
)
|
🔗 |
Fri 7:20 a.m. - 7:55 a.m.
|
Invited Talk by Nicholas Lane
(
Invited Talk
)
SlidesLive Video » TBD |
🔗 |
Fri 7:55 a.m. - 8:30 a.m.
|
Invited Talk by Thomas Ploetz
(
Invited Talk
)
SlidesLive Video » TBD |
🔗 |
Fri 8:30 a.m. - 8:35 a.m.
|
FedConceptEM: Robust Federated Learning Under Diverse Distribution Shifts
(
Lightning Talk
)
link »
SlidesLive Video » Federated Learning (FL) is a machine learning paradigm that protects privacy by keeping client data on edge devices. However, optimizing FL in practice can be challenging due to the diversity and heterogeneity of the learning system. Recent research efforts have aimed to improve the optimization of FL under distribution shifts, but it remains an open problem how to train FL models when multiple types of distribution shifts, i.e., feature distribution skew, label distribution skew, and concept shift, occur simultaneously. To address this challenge, we propose a novel algorithmic framework, FedConceptEM, for handling diverse distribution shifts in FL. FedConceptEM automatically assigns clients with concept shifts to different models, avoiding the performance drop caused by these shifts. At the same time, clients without concept shifts, even with feature or label skew, are assigned to the same model, improving the robustness of the trained models. Extensive experiments demonstrate that FedConceptEM outperforms other state-of-the-art cluster-based FL methods by a significant margin. |
Yongxin Guo · Xiaoying Tang · Tao Lin 🔗 |
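The EM-style assignment of clients to models can be sketched as a soft E-step over a client-by-model loss matrix; this responsibility computation illustrates the general idea only, not FedConceptEM's exact update:

```python
import numpy as np

def e_step(loss_matrix, temperature=1.0):
    """Soft-assign each client (row) to the models (columns) under
    which its data has low loss; clients that share a concept
    concentrate their responsibility on the same model."""
    logits = -np.asarray(loss_matrix, dtype=float) / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

An M-step would then update each model on the clients weighted by these responsibilities.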
Fri 8:35 a.m. - 8:40 a.m.
|
FedEBA+: Towards Fair and Effective Federated Learning via Entropy-based Model
(
Lightning Talk
)
link »
SlidesLive Video » Ensuring fairness is a crucial aspect of Federated Learning (FL), enabling the model to perform consistently across all clients. However, designing an FL algorithm that simultaneously improves global model performance and promotes fairness remains a formidable challenge, as achieving the latter often necessitates a trade-off with the former. To address this challenge, we propose a new FL algorithm, FedEBA+, which enhances fairness while simultaneously improving global model performance. Our approach incorporates a fair aggregation scheme that assigns higher weights to underperforming clients, along with a novel model update method for FL. In addition, we provide a theoretical convergence analysis and demonstrate the fairness of our algorithm. Experimental results reveal that FedEBA+ outperforms other SOTA fairness FL methods in terms of both fairness and global model performance. |
Lin Wang · Zhichao Wang · Xiaoying Tang 🔗 |
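A fair aggregation of the kind described, up-weighting underperforming clients, can be sketched with a softmax over client losses; this is one entropy-motivated instantiation, not necessarily FedEBA+'s exact rule:

```python
import numpy as np

def fair_weights(client_losses, tau=1.0):
    """Aggregation weights increasing in client loss: the softmax is
    the maximum-entropy distribution under an expected-loss
    constraint, with temperature tau controlling how aggressively
    high-loss clients are favoured."""
    l = np.asarray(client_losses, dtype=float)
    e = np.exp((l - l.max()) / tau)   # numerically stable softmax
    return e / e.sum()

def aggregate(client_models, weights):
    """Weighted average of client parameter vectors."""
    return sum(w * m for w, m in zip(weights, client_models))
```

As tau grows, the weights approach the uniform average of standard FedAvg; a small tau concentrates on the worst-performing clients.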
Fri 8:40 a.m. - 8:45 a.m.
|
Centaur: Federated Learning for Constrained Edge Devices
(
Lightning Talk
)
link »
SlidesLive Video » Federated learning (FL) facilitates new applications at the edge, especially for wearable and Internet-of-Things devices. Such devices capture a large and diverse amount of data, but they have memory, compute, power, and connectivity constraints that hinder their participation in FL. We propose Centaur, a multitier FL framework enabling ultra-constrained devices to efficiently participate in FL on large neural nets. Centaur combines two major ideas: (i) a "data selection" scheme that chooses a portion of samples to accelerate learning, and (ii) a "partition-based training" algorithm that integrates both constrained and powerful devices owned by the same user. Evaluations on four benchmark neural nets and three datasets show that Centaur gains ~10% higher accuracy than local training on constrained devices, with ~58% energy saving on average. Our experimental results also demonstrate the superior efficiency of Centaur when dealing with imbalanced data, client participation heterogeneity, and various network connection probabilities. |
Fan Mo · Mohammad Malekzadeh · Soumyajit Chatterjee · Fahim Kawsar · Akhil Mathur 🔗 |
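The "data selection" idea can be sketched as keeping, within a device's budget, the samples that currently contribute most to learning; highest-loss selection below is one simple criterion, and Centaur's actual scheme may differ:

```python
import numpy as np

def select_samples(per_sample_losses, budget):
    """Return (sorted) indices of the `budget` highest-loss samples,
    so a constrained device spends its compute only on the most
    informative data."""
    losses = np.asarray(per_sample_losses)
    idx = np.argsort(losses)[::-1][:budget]
    return np.sort(idx)
```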
Fri 8:45 a.m. - 8:50 a.m.
|
Encoding Expert Knowledge into Federated Learning Using Weak Supervision
(
Lightning Talk
)
link »
SlidesLive Video » Learning from on-device data has enabled intelligent mobile applications ranging from smart keyboards to apps that predict abnormal heartbeats. However, due to the sensitive nature of this data, expert annotation is seldom available. Consequently, existing federated learning techniques that learn from on-device data are unable to capture expert knowledge via data annotations, mostly relying on unsupervised approaches. In this work, we explore an alternative way to codify expert knowledge: using programmatic weak supervision, a principled framework that leverages labeling functions in order to label vast quantities of data without direct access to the data itself. We introduce Weak Supervision Heuristics for Federated Learning (WSHFL), a method to interactively mine and leverage labeling functions that annotate on-device data in cross-device federated settings. Experiments on two sentiment classification tasks show that WSHFL is both efficient and effective at these tasks. |
Sebastian Caldas · Mononito Goswami · Artur Dubrawski 🔗 |
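Programmatic weak supervision works by letting experts write labeling functions that vote a label or abstain on each example, without seeing the data itself. The two toy sentiment functions and the majority-vote combiner below are hypothetical illustrations of the ingredients WSHFL mines in the federated setting:

```python
import numpy as np

ABSTAIN = -1

def lf_positive(text):
    """Vote positive (1) on obvious positive cues, else abstain."""
    return 1 if any(w in text.lower() for w in ("great", "love", "excellent")) else ABSTAIN

def lf_negative(text):
    """Vote negative (0) on obvious negative cues, else abstain."""
    return 0 if any(w in text.lower() for w in ("terrible", "awful", "hate")) else ABSTAIN

def majority_vote(text, lfs):
    """Combine labeling-function votes; abstain if no function fires."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    values, counts = np.unique(votes, return_counts=True)
    return int(values[np.argmax(counts)])
```

In practice a learned label model replaces the majority vote, weighting each function by its estimated accuracy.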
Fri 8:50 a.m. - 8:55 a.m.
|
Async-HFL: Efficient and Robust Asynchronous Federated Learning in Hierarchical IoT Networks
(
Lightning Talk
)
link »
SlidesLive Video » Federated Learning (FL) has gained increasing interest in recent years as a distributed on-device learning paradigm. However, multiple challenges remain to be addressed for deploying FL in real-world Internet-of-Things (IoT) networks with hierarchies. Although existing works have proposed various approaches to account for data heterogeneity, system heterogeneity, unexpected stragglers, and scalability, none of them provides a systematic solution that addresses all of these challenges in a hierarchical and unreliable IoT network. In this paper, we propose Async-HFL, an asynchronous and hierarchical framework for performing FL in a common three-tier IoT network architecture. In response to the largely varied delays, Async-HFL employs asynchronous aggregations at both the gateway and the cloud levels, thus avoiding long waiting times. To fully unleash the potential of Async-HFL in convergence speed under system heterogeneities and stragglers, we design device selection at the gateway level and device-gateway association at the cloud level. Device selection chooses edge devices to trigger local training in real time, while device-gateway association determines the network topology periodically after several cloud epochs, both satisfying bandwidth limitations. We evaluate Async-HFL's convergence speedup using large-scale simulations based on ns-3 and a network topology from NYCMesh. Our results show that Async-HFL converges 1.08-1.31x faster in wall-clock time and saves up to 21.6% in total communication cost compared to state-of-the-art asynchronous FL algorithms (with client selection). We further validate Async-HFL on a physical deployment and observe robust convergence under unexpected stragglers. |
Xiaofan Yu · Lucy Cherkasova · Harshvardhan Harshvardhan · Quanling Zhao · Emily Ekaireb · Xiyuan Zhang · Arya Mazumdar · Tajana Rosing 🔗 |
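Gateway-level asynchronous aggregation of the kind described can be sketched as mixing in each client update the moment it arrives, discounted by its staleness; the polynomial discount below is a common illustrative choice, not necessarily Async-HFL's exact rule:

```python
import numpy as np

def async_update(global_params, client_params, staleness, base_mix=0.5):
    """Blend an arriving client update into the gateway model without
    waiting for other clients; the staler the update, the smaller
    its mixing weight."""
    alpha = base_mix / (1.0 + staleness)
    return (1.0 - alpha) * global_params + alpha * client_params
```

The same rule can be applied again at the cloud level to gateway models, giving the two asynchronous tiers the abstract describes.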
Fri 8:55 a.m. - 9:00 a.m.
|
SHELL: Simple solution witH ELegant detaiLs to Sub-Nyquist Modulation Recognition
(
Lightning Talk
)
link »
SlidesLive Video » Automatic modulation recognition for sub-Nyquist spectrum sensing is essential to demodulate and process signals in a spectrum- and energy-efficient Internet of Things system. Motivated by recent advances in deep learning, a promising direction is to automatically predict the modulation based on data-driven representations instead of hand-crafted features. Our solution, SHELL (Simple solution witH ELegant detaiLs), provides a simple yet effective approach capable of achieving high modulation recognition accuracy without a complex network structure. |
Kebin Wu · Yu Tian · Ebtesam Almazrouei · Faouzi Bader 🔗 |
Fri 9:10 a.m. - 10:40 a.m.
|
Async-HFL: Efficient and Robust Asynchronous Federated Learning in Hierarchical IoT Networks
(
Poster
)
link »
Federated Learning (FL) has gained increasing interest in recent years as a distributed on-device learning paradigm. However, multiple challenges remain to be addressed for deploying FL in real-world Internet-of-Things (IoT) networks with hierarchies. Although existing works have proposed various approaches to account for data heterogeneity, system heterogeneity, unexpected stragglers, and scalability, none of them provides a systematic solution that addresses all of these challenges in a hierarchical and unreliable IoT network. In this paper, we propose Async-HFL, an asynchronous and hierarchical framework for performing FL in a common three-tier IoT network architecture. In response to the largely varied delays, Async-HFL employs asynchronous aggregations at both the gateway and the cloud levels, thus avoiding long waiting times. To fully unleash the potential of Async-HFL in convergence speed under system heterogeneities and stragglers, we design device selection at the gateway level and device-gateway association at the cloud level. Device selection chooses edge devices to trigger local training in real time, while device-gateway association determines the network topology periodically after several cloud epochs, both satisfying bandwidth limitations. We evaluate Async-HFL's convergence speedup using large-scale simulations based on ns-3 and a network topology from NYCMesh. Our results show that Async-HFL converges 1.08-1.31x faster in wall-clock time and saves up to 21.6% in total communication cost compared to state-of-the-art asynchronous FL algorithms (with client selection). We further validate Async-HFL on a physical deployment and observe robust convergence under unexpected stragglers. |
Xiaofan Yu · Lucy Cherkasova · Harshvardhan Harshvardhan · Quanling Zhao · Emily Ekaireb · Xiyuan Zhang · Arya Mazumdar · Tajana Rosing 🔗 |
Fri 9:10 a.m. - 10:40 a.m.
|
Encoding Expert Knowledge into Federated Learning Using Weak Supervision
(
Poster
)
link »
Learning from on-device data has enabled intelligent mobile applications ranging from smart keyboards to apps that predict abnormal heartbeats. However, due to the sensitive nature of this data, expert annotation is seldom available. Consequently, existing federated learning techniques that learn from on-device data are unable to capture expert knowledge via data annotations, mostly relying on unsupervised approaches. In this work, we explore an alternative way to codify expert knowledge: using programmatic weak supervision, a principled framework that leverages labeling functions in order to label vast quantities of data without direct access to the data itself. We introduce Weak Supervision Heuristics for Federated Learning (WSHFL), a method to interactively mine and leverage labeling functions that annotate on-device data in cross-device federated settings. Experiments on two sentiment classification tasks show that WSHFL is both efficient and effective at these tasks. |
Sebastian Caldas · Mononito Goswami · Artur Dubrawski 🔗 |
Fri 9:10 a.m. - 10:40 a.m.
|
SHELL: Simple solution witH ELegant detaiLs to Sub-Nyquist Modulation Recognition
(
Poster
)
link »
Automatic modulation recognition of sub-Nyquist spectrum sensing is essential to demodulate and process the signals in a spectrum- and energy-efficient Internet of Things system. Motivated by the recent advances of deep learning, a promising direction is to automatically predict the modulation based on data-driven representation instead of hand-crafted features. Specifically, our solution SHELL (Simple solution witH ELegant detaiLs) provides a simple yet effective approach, which is capable of achieving high modulation recognition accuracy without complex network structure. |
Kebin Wu · Yu Tian · Ebtesam Almazrouei · Faouzi Bader 🔗 |
Fri 9:10 a.m. - 10:40 a.m.
|
Centaur: Federated Learning for Constrained Edge Devices
(
Poster
)
link »
Federated learning (FL) facilitates new applications at the edge, especially for wearable and Internet-of-Things devices. Such devices capture a large and diverse amount of data, but they have memory, compute, power, and connectivity constraints which hinder their participation in FL. We propose Centaur, a multitier FL framework, enabling ultra-constrained devices to efficiently participate in FL on large neural nets. Centaur combines two major ideas: (i) a "data selection" scheme to choose a portion of samples that accelerates the learning, and (ii) a "partition-based training" algorithm that integrates both constrained and powerful devices owned by the same user. Evaluations on four benchmark neural nets and three datasets show that Centaur gains ~10% higher accuracy than local training on constrained devices with ~58% energy saving on average. Our experimental results also demonstrate the superior efficiency of Centaur when dealing with imbalanced data, client participation heterogeneity, and various network connection probabilities. |
Fan Mo · Mohammad Malekzadeh · Soumyajit Chatterjee · Fahim Kawsar · Akhil Mathur 🔗 |
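The "data selection" idea — training only on the samples most useful for learning — can be illustrated with a simple loss-based filter. This is a sketch under the assumption that high-loss samples accelerate learning; Centaur's actual selection criterion may differ.

```python
import numpy as np

def select_samples(losses, budget_frac=0.3):
    """Keep the fraction of samples with the highest current loss.

    A constrained device then trains only on this subset, cutting
    per-round compute and memory roughly in proportion to budget_frac.
    """
    k = max(1, int(len(losses) * budget_frac))
    return np.argsort(losses)[-k:]  # indices of the k highest-loss samples

losses = np.array([0.05, 2.3, 0.01, 1.7, 0.4, 0.9, 0.02, 3.1, 0.3, 0.6])
picked = select_samples(losses, budget_frac=0.3)
```

The selected indices would then feed the on-device portion of the partitioned model, while heavier layers run on the user's more powerful device.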
Fri 9:10 a.m. - 10:40 a.m.
|
FedEBA+: Towards Fair and Effective Federated Learning via Entropy-based Model
(
Poster
)
link »
Ensuring fairness is a crucial aspect of Federated Learning (FL), which enables the model to perform consistently across all clients. However, designing an FL algorithm that simultaneously improves global model performance and promotes fairness remains a formidable challenge, as achieving the latter often necessitates a trade-off with the former. To address this challenge, we propose a new FL algorithm, FedEBA+, which enhances fairness while simultaneously improving global model performance. Our approach incorporates a fair aggregation scheme that assigns higher weights to underperforming clients and a novel model update method for FL. In addition, we provide a theoretical convergence analysis and demonstrate the fairness of our algorithm. Experimental results reveal that FedEBA+ outperforms other SOTA fairness FL methods in terms of both fairness and the global model's performance. |
Lin Wang · Zhichao Wang · Xiaoying Tang 🔗 |
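A simplified version of the fair aggregation idea — weighting clients by a softmax over their losses so that underperforming clients receive larger aggregation weights — can be sketched as below. The temperature parameter and the exact weighting are assumptions; FedEBA+'s entropy-based scheme is more involved than this.

```python
import numpy as np

def fair_weights(client_losses, temperature=1.0):
    """Softmax over client losses: higher loss -> higher aggregation weight."""
    z = np.asarray(client_losses, dtype=float) / temperature
    z -= z.max()  # shift for numerical stability; softmax is shift-invariant
    w = np.exp(z)
    return w / w.sum()

losses = [0.2, 0.2, 1.5]  # the third client is underperforming
w = fair_weights(losses)
```

Lowering the temperature concentrates weight on the worst-off clients; raising it recovers near-uniform averaging, so the same mechanism interpolates between fairness and plain FedAvg-style aggregation.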
Fri 9:10 a.m. - 10:40 a.m.
|
FedConceptEM: Robust Federated Learning Under Diverse Distribution Shifts
(
Poster
)
link »
Federated Learning (FL) is a machine learning paradigm that protects privacy by keeping client data on edge devices. However, optimizing FL in practice can be challenging due to the diversity and heterogeneity of the learning system. Recent research efforts have aimed to improve the optimization of FL with distribution shifts, but it is still an open problem how to train FL models when multiple types of distribution shifts, i.e., feature distribution skew, label distribution skew, and concept shift, occur simultaneously. To address this challenge, we propose a novel algorithm framework, FedConceptEM, for handling diverse distribution shifts in FL. FedConceptEM automatically assigns clients with concept shifts to different models, avoiding the performance drop caused by these shifts. At the same time, clients without concept shifts, even with feature or label skew, are assigned to the same model, improving the robustness of the trained models. Extensive experiments demonstrate that FedConceptEM outperforms other state-of-the-art cluster-based FL methods by a significant margin. |
Yongxin Guo · Xiaoying Tang · Tao Lin 🔗 |
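The underlying mechanism — assigning each client to the model that best explains its data, then refitting the models on their assigned clients — is an EM-style loop. The toy version below clusters clients summarized by a single scalar statistic; the actual algorithm works on neural networks with soft posterior responsibilities, so this is an illustration only.

```python
import numpy as np

def em_assign(client_stats, n_models=2, iters=20, seed=0):
    """Toy hard-EM: cluster clients (each summarized by one scalar) into models."""
    rng = np.random.default_rng(seed)
    x = np.asarray(client_stats, dtype=float)
    centers = rng.choice(x, size=n_models, replace=False)
    for _ in range(iters):
        # E-step: assign each client to the nearest model.
        assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # M-step: refit each model on its assigned clients.
        for m in range(n_models):
            if np.any(assign == m):
                centers[m] = x[assign == m].mean()
    return assign, centers

# Clients 0-2 share one concept, clients 3-5 another.
assign, centers = em_assign([0.1, 0.0, 0.2, 5.0, 5.1, 4.9])
```

Clients that differ only by feature or label skew end up near the same center and thus share a model, while a genuine concept shift pushes a client to a different model, mirroring the behavior described in the abstract.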
Fri 9:10 a.m. - 10:40 a.m.
|
NELoRa-Bench: A Benchmark for Neural-enhanced LoRa Demodulation
(
Poster
)
link »
Low-Power Wide-Area Networks (LPWANs) are an emerging Internet-of-Things (IoT) paradigm marked by low-power and long-distance communication. Among them, LoRa is widely deployed for its unique characteristics and open-source technology. By adopting Chirp Spread Spectrum (CSS) modulation, LoRa enables low signal-to-noise ratio (SNR) communication. Standard LoRa demodulation uses the dechirp method to condense the power of the whole chirp into a power peak in the frequency domain, enabling decoding in low-SNR scenarios and supporting communication even when the SNR is below -15 dB. However, the standard demodulation method does not fully exploit the properties of chirp signals, leaving room for improvement across communication scenarios. Recently, neural network based methods have been applied to LoRa demodulation, achieving significant improvements in low-SNR scenarios and becoming a new research topic. However, neural network training needs large amounts of data, and collecting such a dataset requires dedicated software and tedious work. To support research on the decoding of LoRa symbols, this paper presents a comprehensive LoRa dataset gathered with real-life equipment. The dataset consists of LoRa signals with spreading factors from 7 to 10, totaling 27,329 symbols. Furthermore, we use this dataset to train a neural network and evaluate its performance in low-SNR demodulation scenarios. The results show that the neural-based method achieves a 1.84-2.35 dB SNR gain over the baseline. The dataset and code for neural network based LoRa demodulation can be found at https://github.com/daibiaoxuwu/NeLoRa_Dataset. |
Jialuo Du · Yidong Ren · Mi Zhang · Yunhao Liu · Zhichao Cao 🔗 |
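The standard dechirp baseline the benchmark compares against can be sketched as follows: multiply the received symbol by a conjugate base chirp and locate the FFT power peak, whose bin index is the transmitted symbol value. This is a minimal noiseless illustration derived from the CSS description above; sampling at exactly the bandwidth and the chosen parameter values are simplifying assumptions.

```python
import numpy as np

def _base_chirp(sf, bw):
    """Base upchirp sweeping -bw/2 .. +bw/2 over one symbol (fs = bw assumed)."""
    n = 2 ** sf  # samples (and chips) per symbol
    t = np.arange(n) / bw
    k = bw / (n / bw)  # chirp rate in Hz/s
    return np.exp(1j * np.pi * k * t**2 - 1j * np.pi * bw * t)

def make_symbol(value, sf=7, bw=125e3):
    """A LoRa symbol is the base chirp cyclically shifted by `value` samples."""
    return np.roll(_base_chirp(sf, bw), -value)

def dechirp_demod(symbol, sf=7, bw=125e3):
    """Classic dechirp: conjugate-multiply, FFT, read off the power peak."""
    spectrum = np.fft.fft(symbol * np.conj(_base_chirp(sf, bw)))
    return int(np.argmax(np.abs(spectrum)))  # FFT bin index = symbol value

decoded = dechirp_demod(make_symbol(42))
```

The dechirp collapses the symbol's energy into a single FFT bin, which is exactly why it tolerates low SNR; the neural approach benchmarked in the paper tries to exploit the chirp structure beyond this single-peak statistic.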
Fri 9:10 a.m. - 10:40 a.m.
|
Lunch Break (Poster)
(
Poster Session
)
|
🔗 |
Fri 10:40 a.m. - 10:50 a.m.
|
SpectraNet: multivariate forecasting and imputation under distribution shifts and missing data
(
Oral
)
link »
Neural forecasting has become an active research area, with advancements in architectural design steadily improving performance and scalability. Most existing approaches produce forecasts using a fixed parametric function with historical values as inputs. We identify performance limitations of this approach in handling two recurrent challenges in IoT data: distribution shifts and missing data. We propose SpectraNet, a model based on a new paradigm for time-series forecasting. We introduce a latent factor inference method that matches the model's output on past observations. Theoretically motivated as a MAP estimation of the posterior distribution of latent factors, the inference process provides additional flexibility to adjust forecasts based on the latest information. We identify three advantages of our method: (i) SoTA performance with 92% fewer parameters and similar training times; (ii) superior robustness to missing data and distribution shifts; and (iii) capability to simultaneously produce forecasts and interpolate past missing data, unifying imputation and forecasting tasks. |
Cristian Challu · Peihong Jiang · Yingnian Wu · Laurent Callot 🔗 |
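The inference-time idea — fitting latent factors so the model's output matches the observed past window, with an L2 penalty playing the role of a Gaussian prior in the MAP view — can be sketched with a linear decoder. The decoder, prior strength, and optimizer settings below are assumptions for illustration; SpectraNet uses a learned neural decoder.

```python
import numpy as np

def infer_latents(y_past, decoder, z_dim, lam=0.01, lr=0.1, steps=200):
    """MAP-style estimate of latent factors z for one series:
    minimize ||decoder @ z - y_past||^2 (over observed entries) + lam * ||z||^2.

    Missing observations (NaN) are simply excluded from the fit, which is
    what lets the same mechanism handle imputation and forecasting at once.
    """
    mask = ~np.isnan(y_past)
    y_filled = np.where(mask, y_past, 0.0)
    z = np.zeros(z_dim)
    for _ in range(steps):
        resid = (decoder @ z - y_filled) * mask  # masked reconstruction error
        grad = decoder.T @ resid + lam * z
        z -= lr * grad
    return z

rng = np.random.default_rng(0)
decoder = rng.standard_normal((20, 3)) * 0.3  # hypothetical linear decoder
z_true = np.array([1.0, -2.0, 0.5])
y = decoder @ z_true
y[5] = np.nan  # one missing past observation
z_hat = infer_latents(y, decoder, z_dim=3)
recon = decoder @ z_hat  # reconstruction; in SpectraNet the decoder also emits the forecast
```

Because the same inferred factors drive both the reconstruction of the past window and the forward projection, imputing the missing entry and producing the forecast fall out of one inference step.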
Fri 10:50 a.m. - 11:25 a.m.
|
Building Embodied Autonomous Agents by Ruslan Salakhutdinov
(
Invited Talk
)
SlidesLive Video » TBD |
Russ Salakhutdinov 🔗 |
Fri 11:25 a.m. - 12:00 p.m.
|
Invited Talk by Pradeep Natarajan
(
Invited Talk
)
SlidesLive Video » TBD |
Pradeep Natarajan 🔗 |
Fri 12:00 p.m. - 1:00 p.m.
|
Coffee Break
TBD |
🔗 |
Fri 1:00 p.m. - 1:35 p.m.
|
Invited Talk by Heather Zheng
(
Invited Talk
)
SlidesLive Video » TBD |
🔗 |
Fri 1:35 p.m. - 2:10 p.m.
|
Invited Talk by Eric P. Xing
(
Invited Talk
)
SlidesLive Video » TBD |
Eric P Xing 🔗 |
Fri 2:10 p.m. - 2:20 p.m.
|
NELoRa-Bench: A Benchmark for Neural-enhanced LoRa Demodulation
(
Oral
)
link »
SlidesLive Video » Low-Power Wide-Area Networks (LPWANs) are an emerging Internet-of-Things (IoT) paradigm marked by low-power and long-distance communication. Among them, LoRa is widely deployed for its unique characteristics and open-source technology. By adopting Chirp Spread Spectrum (CSS) modulation, LoRa enables low signal-to-noise ratio (SNR) communication. Standard LoRa demodulation uses the dechirp method to condense the power of the whole chirp into a power peak in the frequency domain, enabling decoding in low-SNR scenarios and supporting communication even when the SNR is below -15 dB. However, the standard demodulation method does not fully exploit the properties of chirp signals, leaving room for improvement across communication scenarios. Recently, neural network based methods have been applied to LoRa demodulation, achieving significant improvements in low-SNR scenarios and becoming a new research topic. However, neural network training needs large amounts of data, and collecting such a dataset requires dedicated software and tedious work. To support research on the decoding of LoRa symbols, this paper presents a comprehensive LoRa dataset gathered with real-life equipment. The dataset consists of LoRa signals with spreading factors from 7 to 10, totaling 27,329 symbols. Furthermore, we use this dataset to train a neural network and evaluate its performance in low-SNR demodulation scenarios. The results show that the neural-based method achieves a 1.84-2.35 dB SNR gain over the baseline. The dataset and code for neural network based LoRa demodulation can be found at https://github.com/daibiaoxuwu/NeLoRa_Dataset. |
Jialuo Du · Yidong Ren · Mi Zhang · Yunhao Liu · Zhichao Cao 🔗 |
Fri 2:20 p.m. - 2:30 p.m.
|
Closing Remarks
TBD |
🔗 |