Welcome

Welcome! If you've got this far, it means you're ready to start using our systems. This documentation provides both a high-level business overview and reference documentation for business users and software developers.

What we do

We offer a federated distribution platform for machine learning agents. Our core aim is to ensure that deployment and distribution are never the barriers to scaling the models you build.

Terminology

info

Trees have branches, branches have leaves.

Keep it simple. You plant a tree, form a branch, and connect your learning agents as the leaves.

Distributed Learning is a method for training models across many separate learners in a distributed environment. Importantly, Distributed Learning deals with asynchronous learners working on heterogeneous data that is not independent and identically distributed (non-IID). In some cases these datasets can be unbalanced, sensitive, or large in volume.
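To make "non-IID" and "unbalanced" concrete, the sketch below compares the label distribution held by each leaf with the distribution of the pooled data. The leaf names, labels, and counts are invented purely for illustration and do not come from any real deployment.

```python
from collections import Counter

# Hypothetical labels held by three leaves (for example, three sites).
# Sizes and class mixes differ on purpose: unbalanced and non-IID.
leaf_labels = {
    "leaf_a": ["cat"] * 90 + ["dog"] * 10,
    "leaf_b": ["dog"] * 40 + ["bird"] * 5,
    "leaf_c": ["bird"] * 500,
}


def label_distribution(labels):
    """Return the fraction of samples belonging to each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Each leaf sees a different slice of the overall distribution.
for name, labels in leaf_labels.items():
    print(name, len(labels), label_distribution(labels))

# The pooled view is shown only for comparison; in a federated setting
# this pooled dataset is never actually assembled in one place.
pooled = [label for labels in leaf_labels.values() for label in labels]
print("pooled", len(pooled), label_distribution(pooled))
```

No single leaf's distribution matches the pooled one, which is the situation distributed learning has to cope with.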

How does Federated Learning work?

step1

Each of the many leaf nodes in the image above sees only its own dataset. Using that dataset, each leaf adjusts a model.

step2

These models are sent to the BranchKey server, where various forms of aggregation are performed. The most popular and well-known of these is Federated Averaging.

step3

The result of this aggregation is a model that has been collectively trained across many distributed private datasets, without the server ever having seen that data.

step4
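The aggregation step described above can be sketched as a sample-weighted average of the parameters each leaf sends. This is a minimal illustration of the general Federated Averaging idea, not the BranchKey implementation or API: the `federated_average` function, the `Weights` type, and the example leaves are all hypothetical names chosen for this sketch.

```python
from typing import Dict, List, Tuple

# Assumed representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Sample-weighted average of model parameters (Federated Averaging).

    Each update is (weights, num_samples): the parameters a leaf produced
    after local training and the number of local samples it trained on.
    """
    if not updates:
        raise ValueError("at least one leaf update is required")
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = updates[0]
    aggregated: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            share = n / total_samples  # this leaf's share of all samples
            for i, value in enumerate(weights[name]):
                acc[i] += share * value
        aggregated[name] = acc
    return aggregated


# Two hypothetical leaves. Only parameters and sample counts are shared;
# the underlying training data never leaves the leaf.
leaf_a = ({"w": [1.0, 2.0], "b": [0.0]}, 100)
leaf_b = ({"w": [3.0, 4.0], "b": [1.0]}, 300)

global_model = federated_average([leaf_a, leaf_b])
print(global_model)  # {'w': [2.5, 3.5], 'b': [0.75]}
```

Weighting by sample count means a leaf that trained on more data has proportionally more influence on the aggregated model; other aggregation schemes may weight contributions differently.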

Where is the value in an aggregated model?

There's value in data. However, in many cases data cannot leave the location where it was generated; this is true of many medical and financial settings. If a pattern is observed at one location, the data at that location contains the information needed to model it. If the same pattern of behaviour then appears at a new location, that new location will not have the information to recognise it.

By learning across agents, we provide a model that learns out-of-domain information by collaborating with the learners in your federated pools. The models we produce have been shown to outperform an individual learner with full access.

Data access

Access to data is important for modelling multiple scenarios and is a requirement for improving your models.

The following scenarios are a good fit for distributed learning:

  • Privacy: Keeping private data secure is legally, ethically, and technically good practice.
  • Logistics: Large volumes of data do not need to be transmitted for processing. Transmitting data costs money and energy, bandwidth capacities are heterogeneous, and centralising compute is expensive.
  • Scalability: Parallelising compute allows larger datasets to be learnt from faster and computational costs to be shared.