Bridging communities: TensorFlow Federated (TFF) and OpenMined
septembra 26, 2022

Posted by Krzys Ostrowski (Research Scientist), Alex Ingerman (Product Manager), and Hardik Vala (Software Engineer)

Since the announcement of TensorFlow Federated (TFF) on this blog 3.5 years ago, a number of organizations have developed frameworks for Federated Learning (FL). While growing attention to privacy and investments in FL are a welcome trend, one challenge that arises is fragmentation of community and industry efforts, which leads to code duplication and reinvention. One way we can address this as a community is by investing in interoperability mechanisms that could enable our platforms and developers to work together and leverage each other's strengths.

In this context, we’re excited to announce the collaboration between TFF and OpenMined - an OSS community dedicated to development of privacy-preserving technologies. OpenMined’s PySyft framework has attracted a vibrant community of hundreds of OSS contributors, and includes tools and APIs to facilitate containerized deployment and integrations with diverse data sources that complement the capabilities we offer in TFF.

OpenMined is joining Special Interest Group (SIG) Federated (see the charter, forum, meeting notes, and the Discord server) we’ve recently established to enable developers of TFF, together with a growing set of OSS and industry partners, to openly engage in conversations about how to jointly evolve the TFF ecosystem and grow the adoption of FL.

Introducing PySyTFF

To kick off the collaboration, we - the developers of TFF and OpenMined’s PySyft - decided to focus our initial efforts on building together a new platform, with an endearing name PySyTFF, that combines elements of TFF and PySyft to support what we believe will be an increasingly common scenario, illustrated below.

In this scenario, an owner of a sensitive dataset would like to invite researchers to experiment with training and evaluating ML models on their dataset to advance the current understanding of what model architectures, parameters, etc., work best, while protecting the data and adhering to policies that may govern its use. In practice, such scenarios often end up involving negotiating data usage contracts. On the one hand, these can be tedious to set up, and on the other hand, they largely rely on goodwill.

What we’d like instead is, to have a platform that can offer structural safeguards in place that limit the disclosure of sensitive information and ensure policy compliance by construction - this is our goal for PySyTFF.

As an aside, note that even though this blog post is about FL, we aren’t necessarily talking here about scenarios where data is physically siloed across physical locations - the data can also be hosted in a datacenter and logically siloed. More on this below.

Developer experience

The initial proof-of-concept implementation of PySyTFF offers an early glimpse of what the developer experience for the data scientist will look like. Note how we combine the advantages of both frameworks - e.g., TFF’s ability to define models in Keras, and PySyft’s access control mechanism and APIs for data access:

domain = sy.login(email="sam@stargate.net", password="changethis", port=8081)


model_fn = lambda: tf.keras.models.Sequential(...)


params = {

    'rounds': 10,

    'no_clients': 3,

    'noise_multiplier': 0.05,

    'clients_per_round': 2,

    'train_data_id': domain.dataset[0]['images'].id_at_location.to_string(),

    'label_data_id': domain.datasets[0]['labels'].id_at_location.to_string()

}


model, metrics = sy.tff.train_model(model_fn, params, domain, timeout=5000)


Here, the data scientist is logging into a PySyft’s domain node - an infrastructure component provisioned by or on behalf of the data provider - and gains a limited, access control-guarded ability to enumerate the available resources and perform actions on them. This includes obtaining references to datasets managed by the node and their metadata (but not content) and issuing the train_model calls, wherein the data scientist can supply a Keras model they wish to train, and the various parameters that control the training process and affect the privacy guarantees of the computed result, such as the number of rounds or the amount of noise added in order to make the results of the model training more private. In return, the researcher may get computed outputs such as a set of evaluation metrics, or the trained model parameters.

Exactly what ranges of parameters supplied by the researcher are accepted by the platform, and what results the researcher can get will, in general, depend on the policies defined by the data owner that might, e.g., mandate the use of privacy-preserving algorithms and constrain the allowed privacy budget - and these may constrain parameters such as the number of training rounds, clients per round, or the noise multiplier. Whereas at the current stage of development, PySyTFF does not yet offer policy engine integration, this is an important part of the future development plans.

Under the hood

The domain node is a docker-based environment that bundles together a web-based frontend that you can securely log into, with a mechanism for authenticating and authorizing users, and a set of internal services that includes database connectivity, as illustrated below.
The train_model call in the code snippet above, perhaps embedded in the data scientist’s Python colab notebook, is implemented as a network request, carrying a serialized representation of the TensorFlow code of the model to train, along with the training parameters, and the references to the PySyft datasets to use for training and evaluation.

Inside the domain node, the call is relayed to a PySyTFF service, a new component introduced to the PySyft ecosystem to orchestrate the training process. This involves interacting with PySyft’s data backend to obtain handles to shards of user data, calling TFF APIs to construct TFF computations to run, and passing the constructed TFF computations and data handles to an embedded instance of TFF runtime that loads the data using the supplied handles and runs the FL algorithms.

FL on logically-siloed data

At this point, some of you may be wondering how exactly FL fits into the picture. After all, FL is mostly known as a technology that supports computations on data that’s distributed across a set of devices, or (in what’s called a cross-silo flavor of FL) a set of data centers owned by a group of institutions, yet here, we’re talking about a scenario where the data is already in the customer’s PySyft database.

To explain this, let’s pop up a level and consider the high level objective - to enable researchers to perform ML computations on sensitive data with platform-level, structural and formal privacy guarantees. In order to do so, the platform should ideally uphold formal privacy principles, such as data minimization (a guarantee on how the computation is executed and how sensitive data is handled), and anonymous aggregation (a guarantee on what is being computed and released).

Federated Learning is a great fit in this context because it structurally embodies these principles, and provides a framework for implementing algorithms that provably achieve user-level Differential Privacy (DP) - the current gold standard. The FL algorithms that enable us to achieve these guarantees can be used to process data in datacenter deployments, even in scenarios where - as is the case here with the PySyft database - all of that data resides in a single administrative domain.

To see this, just imagine that for each user in the database, we draw a virtual boundary around all their data, and think of it as a kind of virtual silo. We can treat such virtual silos of user data in the same way as how we treat “client” devices in a more traditional FL setting, and orchestrate FL algorithms to run across virtual silos as clients.

Thus, for example, when training an ML model, we’d repeatedly pick sets of users from the database, locally and independently train local model updates on their data - separately for each user, add clipping to each local update and noise for privacy, aggregate these local updates across users to produce an updated global model, and repeat this process for thousands of rounds until the ML model converges, as shown below.
Whereas the data may be only logically partitioned, following this approach enables us to achieve the very same types of formal guarantees, including provable user-level differential privacy, as those cited above - and indeed, TFF enables us to leverage the same FL algorithm implementation - literally the same TFF code - as that which powers Google’s mobile/IoT production deployments.

Collaborate with us!

As noted earlier, the initial version of PySyTFF is still missing a number of components - and this, dear reader, is where you come in. If the vision laid out above excites you, we - the TFF and PySyft teams - would love to work with you to evolve this platform together. In addition to policy engine integration, we plan to augment PySyTFF with the ability to spawn distributed instances of the TFF runtime on cloud or compute clusters to power very compute-intensive workloads, a system of charging for the use of resources, and to extend the scope of PySyTFF to include classical types of cross-silo FL deployments, to name just a few.

There are a great many ways to go about this - from joining the TFF and PySyft’s collaborative efforts and directly helping us build and deploy this platform, to helping design and build generic components and APIs that can enable TFF and PySyft/PyGrid to interoperate.

Ready to get started? You can visit the SIG Federated forum and join the Discord server, or you can reach out directly - see the contact info in the SIG charter, and the engagement channels created by the OpenMined’s PySyft team. We’re looking forward to hearing from you!

Acknowledgments

On behalf of the TFF team at Google, we’d like to thank our OpenMined partners Andrew Trask, Tudor Cebere, and Teo Milea for the productive collaboration leading up to this announcement.

Next post
Bridging communities: TensorFlow Federated (TFF) and OpenMined

Posted by Krzys Ostrowski (Research Scientist), Alex Ingerman (Product Manager), and Hardik Vala (Software Engineer)Since the announcement of TensorFlow Federated (TFF) on this blog 3.5 years ago, a number of organizations have developed frameworks for Federated Learning (FL). While growing attention to privacy and investments in FL are a welcome trend, one challenge that arises is fragmentation …