Federated Learning for Privacy-Preserving Data Analytics in Distributed Environments
Abstract
The rapid proliferation of data-generating systems across healthcare networks, financial institutions, IoT infrastructures, and mobile ecosystems has made data the foundational asset for machine-learning-driven decision making. However, centralizing sensitive distributed data for model training introduces critical privacy vulnerabilities, risks of non-compliance with regulations such as GDPR, HIPAA, and LGPD, and prohibitive communication overhead, which together make conventional centralized machine learning architectures unsuitable for privacy-sensitive distributed environments. Federated learning (FL) has emerged as a distributed computing paradigm that enables collaborative model training across geographically dispersed data holders without transferring raw data, preserving data sovereignty while drawing on the collective information in decentralized datasets. This paper presents a review-based and experimental investigation of federated learning for privacy-preserving data analytics across healthcare, financial services, and edge computing domains. A systematic review of 112 peer-reviewed publications (2019–2026) was supplemented by original experimental work at the Distributed AI and Privacy Engineering Laboratory of Kristianstad University, Sweden, where a privacy-enhanced federated learning framework integrating Rényi differential privacy (RDP), secure multi-party computation (SMPC), and personalized federated averaging was developed and evaluated on a multi-institutional clinical dataset for pneumonia detection from chest X-rays. The proposed RDP-pFedAvg framework, deployed across 6 simulated hospital nodes with heterogeneous non-IID data distributions, achieved a classification AUC-ROC of 0.948 at a Rényi privacy budget of ε = 3.0. This is only 0.018 (1.8 percentage points) below the centralized baseline (AUC 0.966) and 0.124 (12.4 percentage points) above the local-only training baseline (AUC 0.824), while providing formally verifiable (α, ε)-Rényi differential privacy guarantees for all participating institutions. The findings demonstrate that federated learning with advanced privacy mechanisms and personalization strategies can achieve near-centralized analytical performance while satisfying the stringent data protection requirements of cross-border distributed environments [1], [2].
KEYWORDS: Federated Learning, Privacy-Preserving Analytics, Rényi Differential Privacy, Secure Multi-Party Computation, Distributed Machine Learning, Personalized Federated Averaging, Healthcare AI, Non-IID Data, GDPR Compliance, Cross-Border Data Governance
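To make the combination of federated averaging and Rényi differential privacy named in the abstract more concrete, the following minimal Python sketch shows one differentially private aggregation round and a simple RDP accountant. It is an illustration under stated assumptions, not the RDP-pFedAvg implementation evaluated in the paper: the function names (`clip`, `dp_fedavg_round`, `gaussian_rdp`, `rdp_to_dp`), the unweighted average, full client participation in every round (no subsampling amplification), and all parameter values are hypothetical, and the personalization and SMPC components are omitted. The accounting relies on two standard results: a Gaussian mechanism with L2 sensitivity C and noise standard deviation zC satisfies (α, α/(2z²))-RDP, and an (α, ε)-RDP mechanism satisfies (ε + ln(1/δ)/(α − 1), δ)-differential privacy.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_fedavg_round(global_weights, client_updates, max_norm, noise_multiplier, rng):
    """One aggregation round (hypothetical sketch): clip each client update,
    sum the clipped updates, add Gaussian noise, and apply the unweighted mean."""
    n = len(client_updates)
    total = [0.0] * len(global_weights)
    for update in client_updates:
        for i, x in enumerate(clip(update, max_norm)):
            total[i] += x
    # Noise is calibrated to the sensitivity of the sum, which equals max_norm.
    sigma = noise_multiplier * max_norm
    noisy_mean = [(t + rng.gauss(0.0, sigma)) / n for t in total]
    return [w + d for w, d in zip(global_weights, noisy_mean)]


def gaussian_rdp(alpha, noise_multiplier, rounds):
    """RDP epsilon at order alpha after `rounds` compositions of the Gaussian
    mechanism, assuming every client participates in every round."""
    return rounds * alpha / (2.0 * noise_multiplier ** 2)


def rdp_to_dp(alpha, rdp_eps, delta):
    """Convert an (alpha, rdp_eps)-RDP guarantee into (eps, delta)-DP."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.0, 0.0, 0.0]
    # Six simulated nodes, mirroring the node count in the abstract;
    # the update values themselves are made up for illustration.
    updates = [[rng.uniform(-1, 1) for _ in weights] for _ in range(6)]
    weights = dp_fedavg_round(weights, updates, max_norm=1.0,
                              noise_multiplier=4.0, rng=rng)
    eps_rdp = gaussian_rdp(alpha=8.0, noise_multiplier=4.0, rounds=10)
    print("updated weights:", weights)
    print("RDP epsilon at alpha=8:", eps_rdp)
    print("(eps, delta)-DP epsilon:", rdp_to_dp(8.0, eps_rdp, delta=1e-5))
```

Clipping bounds each institution's influence on the sum, which is what allows the noise scale to be tied to a fixed sensitivity. In practice, the order α is usually chosen by evaluating several candidates and keeping the one that gives the smallest converted ε.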