Split Learning

In this sample, we configure 3 local nodes for collaborative training. Running everything locally makes it easier to understand how the protocol works and gives us access to every node in the network, which makes the whole process more transparent.

Node configuration

  • to have 3 nodes participating in split learning, install the Weavechain nodes if not done already; the easiest way is to start them as Docker containers using the following commands:
  curl -O https://public.weavechain.com/file/install_nodes.sh
  chmod a+x install_nodes.sh
  ./install_nodes.sh 3
  • the installer creates the directory structure for the 3 nodes; after it runs, the layout will look like this:
.
└── nodes
    ├── weave_node1
    │   ├── config
    │   └── storage
    │       └── files
    ├── weave_node2
    │   ├── config
    │   └── storage
    │       └── files
    └── weave_node3
        ├── config
        └── storage
            └── files
  • to subsequently start/stop the nodes:
  docker stop weave_node1
  docker start weave_node1
  docker stop weave_node2
  docker start weave_node2
  ...
  • install a local Jupyter server to connect to the node (if not done already)
  • run the Jupyter installer from the root directory, so that the config directory of the Jupyter notebook container sits next to the nodes:
.
├── config
└── nodes
    ├── weave_node1
    ...
  • allow running local docker images by exposing the Docker socket over TCP:
  docker run -d -v /var/run/docker.sock:/var/run/docker.sock -p 0.0.0.0:2375:2375 bobrik/socat TCP-LISTEN:2375,fork UNIX-CONNECT:/var/run/docker.sock
  • to run split learning, we will need to connect to the API of one of the nodes; we will use weave_node1, making it the initiator of the split learning process
  • to do that, we need to copy the node's config directory into the Jupyter notebook's config directory
  • run the following from the parent directory:
cp nodes/weave_node1/config/* ./config/
  • with the config and keys copied into the Jupyter container, we can use the API to run split learning

Sample of split learning

In this demo we use 3 nodes for collaborative training on a melanoma dataset:

  • we will preprocess and split the data so that each node gets its own part (a splitting sketch follows after this list)
  • since we are using 3 nodes, we will have 3 CSV files
  • we will copy the CSV files into each node's private file storage so they can access them
mkdir nodes/weave_node1/storage/files/private_csv_files
mkdir nodes/weave_node2/storage/files/private_csv_files
mkdir nodes/weave_node3/storage/files/private_csv_files

cp melanoma1.csv nodes/weave_node1/storage/files/private_csv_files/melanoma.csv
cp melanoma2.csv nodes/weave_node2/storage/files/private_csv_files/melanoma.csv
cp melanoma3.csv nodes/weave_node3/storage/files/private_csv_files/melanoma.csv
  • out of the 3 nodes, one will initiate the training and notify the other 2 about it
  • we will access the initiator node from the notebook
  • the other 2 nodes will start training their models using the data that we copied into their private file storage
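
The splitting itself can be done with a few lines of pandas. This is only a sketch, assuming a single combined melanoma.csv source file (a hypothetical name) and a simple round-robin split; the real preprocessing will depend on the dataset:

import pandas as pd

df = pd.read_csv("melanoma.csv")  # hypothetical combined source file
df = df.sample(frac=1, random_state=42).reset_index(drop=True)  # shuffle the rows

# deal the rows round-robin into 3 shards, one per node
for i in range(3):
    df.iloc[i::3].to_csv(f"melanoma{i + 1}.csv", index=False)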

Docker images structure

  • split learning executes similarly to a compute task; for details, see the Confidential Compute Sample
  • this means that we will use docker images to execute the training
  • since the network has 1 initiator and the other nodes are participants, we will have two docker images:
    • one for the participants
    • one for the initiator

The participant image

  • for this sample we can use the following image for the participant: gcr.io/weavechain/split_nn_participant
  • the participant image will be the docker image each participant uses in a compute task to trigger the training process.

Participant image structure

  • the steps for every participant within the compute task are the same:

1. Connecting to the node

import os    # used below to read the task id
import json  # used below to serialize the task output

from weaveapi.records import *
from weaveapi.options import *
from weaveapi.filter import *
from weaveapi.weaveh import *

nodeApi, session = connect_weave_api(None)

2. Fetching the training data

reply = nodeApi.read(
    session,
    ".internal_task_params",
    os.environ["WEAVE_TASKID"],
    None,
    READ_DEFAULT_NO_CHAIN,
).get()
params = reply["data"]
  • the params contain the data necessary for the training, e.g. scope, table, input/output fields, number of epochs, etc.
  • example of fetching and preparing the training data:
# unpack the training parameters received from the initiator
scope, table = params["scope"], params["table"]
output_field, epochs = params["output_field"], int(params["epochs"])
input_fields = params.get("input_fields")  # may be omitted and derived from the data
filter = None  # no row filter: read the whole table

reply = nodeApi.read(session, scope, table, filter, READ_DEFAULT_NO_CHAIN).get()
trainingData = {
    "data": reply["data"],
    "input_fields": input_fields,
    "output_field": output_field,
    "epochs": epochs,
}

3. Training the model

  • we assume that a model has already been built for the dataset
traindata = TrainingData(data, device, input_fields, output_field)
model = Model()
  • create and configure a train loader, an optimizer, etc. (sketched below, after this step's code)
splitNN = SplitNN(
    [model.client_model, model.server_model], [optimizer, server_optimizer]
)

server_model_state, server_optimizer_state = train(
    splitNN, train_loader, epochs
)
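
The train loader and the two optimizers elided above could be set up along the following lines. This is only a sketch, assuming PyTorch and hypothetical traindata.inputs / traindata.labels tensor attributes; the actual sample image may structure this differently:

import torch
from torch.utils.data import DataLoader, TensorDataset

# wrap the prepared tensors in a dataset (the attribute names are illustrative)
dataset = TensorDataset(traindata.inputs, traindata.labels)
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

# one optimizer per model half: the client side and the server side
optimizer = torch.optim.Adam(model.client_model.parameters(), lr=1e-3)
server_optimizer = torch.optim.Adam(model.server_model.parameters(), lr=1e-3)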

4. Sending the base64-encoded model to the node

  • we base64-encode the server model and optimizer state from the previous step (see the sketch after this snippet)
result = {
    "server_model": server_model_base64,
    "server_optimizer": server_optimizer_base64,
}
weave_task_output(nodeApi, session, json.dumps(result))
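
The server_model_base64 and server_optimizer_base64 values above are produced by serializing the states returned by train and base64-encoding the resulting bytes. A minimal sketch, assuming PyTorch and torch.save for the serialization:

import base64
import io

import torch

def encode_state(state):
    # serialize the state to bytes, then base64-encode it for JSON transport
    buf = io.BytesIO()
    torch.save(state, buf)
    return base64.b64encode(buf.getvalue()).decode("ascii")

server_model_base64 = encode_state(server_model_state)
server_optimizer_base64 = encode_state(server_optimizer_state)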

These are the summarized steps that a participant's compute task executes. Once they finish, the trained model is sent to the initiator.

The initiator image

  • for this sample, we can use the following image for the initiator: gcr.io/weavechain/split_nn_initiator
  • similarly to the participant image, the initiator image is the docker image the initiator uses to aggregate the models from the participants and train the main model.

Initiator image structure

  • the steps within the compute task are the following:

1. Connecting to the node

import os    # used below to read the task id
import json  # used below to serialize the task output

from weaveapi.records import *
from weaveapi.options import *
from weaveapi.filter import *
from weaveapi.weaveh import *

nodeApi, session = connect_weave_api(None)

2. Fetching the models from the participants

reply = nodeApi.read(
    session,
    ".internal_task_params",
    os.environ["WEAVE_TASKID"],
    None,
    READ_DEFAULT_NO_CHAIN,
).get()
params = reply["data"]

models = params["data"]  # the models received from the participants

3. Training the main model

  • the main model is trained using the models received from the participants; one possible approach is sketched below
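
The exact aggregation logic lives inside the initiator image; one plausible approach, shown here only as a sketch, is to decode each participant's server model state and average the weights FedAvg-style. This assumes models maps participant keys to the JSON strings produced in step 4 of the participant image:

import base64
import io
import json

import torch

def decode_state(b64):
    # reverse of the participant-side encoding: base64 -> bytes -> state dict
    return torch.load(io.BytesIO(base64.b64decode(b64)))

states = [decode_state(json.loads(m)["server_model"]) for m in models.values()]

# average the participants' server-side weights, key by key
avg_state = {
    key: torch.stack([s[key].float() for s in states]).mean(dim=0)
    for key in states[0]
}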

4. Sending the base64-encoded results to the initiator node

result = {"server_model": server_model_base64}
weave_task_output(nodeApi, session, json.dumps(result))

These are the summarized steps that the initiator's compute task executes.

Jupyter Notebook example for the API

  • we will trigger split learning via an API call from the Jupyter notebook configured above
  • more information about the Jupyter server configuration can be found here
  • after opening the notebook in a browser, create a new notebook in the main directory
  • the following code segments can be copied into the notebook, each segment in this documentation representing a separate cell in the notebook

1. Connecting to the initiator node

from weaveapi.records import *
from weaveapi.options import *
from weaveapi.filter import *
from weaveapi.weaveh import *

nodeApi, session = connect_weave_api("config/demo_client_local.config")

2. Configuring the params

scope = "private_csv_files"
table = "melanoma"
initiator_image = "gcr.io/weavechain/split_nn_initiator:latest"
participant_image = "gcr.io/weavechain/split_nn_participant:latest"
output_field = "target"
epochs = 10
sources = ["<pubkey1>", "<pubkey2>", ...]
params = { "scope": scope, "table": table, "output_field": output_field, "epochs": epochs }

Notes on parameters:

  • scope is the scope configured above for the training data
  • table is the name of the CSV file
  • initiator_image is the name of the initiator node's docker image used for the training
  • participant_image is the name of the participant node's docker image used for the training
  • output_field & epochs are training parameters
  • sources is a list of the participant nodes' public keys; this can also be set to "*" to include every peer of the initiator
  • params are the parameters that the nodes will pass to the docker images

3. Connecting to the participant nodes

  • before we can trigger the split learning process, we have to connect the participant nodes to the initiator
  • this is done through an API call to the initiator that updates the node's connection-related configuration
  • we will create a JSON object with the connection information of the 2 participant nodes and use it to update the initiator's config:
def add_connections(ip_port):
    # build the "connections" config entry from a {host: ports} mapping
    connections = {"connections": []}
    for ip in ip_port:
        for port in ip_port[ip]:
            connection = {
                "http": {
                    "host": ip,
                    "port": port,
                    "useHttps": False,
                }
            }
            connections["connections"].append(connection)
    return connections
  • using that function we can build the connections:
ip_port = {
    "host.docker.internal": [
        "18082",  # port for weave_node2
        "18083",  # port for weave_node3
    ]
}
connections = add_connections(ip_port)
  • the connections variable now holds the JSON objects representing the connection configuration for the 2 participant nodes, i.e. weave_node2 and weave_node3
  • we will use this to update the initiator node's config:
path = ["peers", "connections"]
reply = nodeApi.updateConfig(session, json.dumps(path), connections).get()

4. Triggering split learning with the initiator node

  • now that the two participants are connected to the initiator, we can use the initiator to trigger split learning with the parameters configured above, using the following API call:
reply = nodeApi.splitLearn(
    session,
    initiator_image,
    participant_image,
    SLOptions(False, 120, 2, None, sources, params),
).get()
print(reply)
{'res': 'ok', 'target': {'operationType': 'COMPUTE', 'organization': 'weavechain', 'account': 'clientAccount'}, 'data': {'<pubkey1>': {'res': 'ok', 'target': {'operationType': 'COMPUTE', 'organization': 'weavechain', 'account': 'clientAccount'}, 'data': {'output': '{"server_model": "UEsDBBQAA...}}}}}
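
The trained server model can be recovered from the reply by parsing a participant's output JSON and base64-decoding its server_model field. A sketch, assuming the state was serialized with torch.save as in the participant image sketch above:

import base64
import io
import json

import torch

output = json.loads(reply["data"]["<pubkey1>"]["data"]["output"])
server_model_state = torch.load(io.BytesIO(base64.b64decode(output["server_model"])))
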
  • we can also fetch data lineage for the participants:
reply['data']['<pubkey>']
{
    'res': 'ok',
    'target': {
        'operationType': 'COMPUTE',
        'organization': 'weavedemo',
        'account': 'weavejTGjcdbUSDtk4Rroiz8qVCSGHo3ejTSZV2wtqRobzNBD'
    },
    'data': {
        'inputHash': '62ZspnQej3ZUyVBTJXEJHyszrZxCBnAY5Si7P1LT54bM',
        'fees': '{"USDC":0}',
        'writesSignature': 'EmChpWVhPZ2ka8wh1aiwMLfXWQSJjPTxYPbo3zPtxShb8bF6JDrsXkRz5vB4yWNAcMQ37mQs5Wm1FXREh6MKzi4',
        'outputHash': 'BpKDBmiku7eSdFxmztQMSxtYRdAvSFcJw8KXDvK5agJf',
        'outputSignature': '4AoAtD6n19XuMxWnaupAeNVjesgtuVQN9QcctmVHfR724jh4TxNUHsoZ5M6je7a4CVYukDwb8YWjYQTT3KNfxBcX',
        'computeHash': '2h5BQPk2tXaDyeSifJ4aE8AoLFnFhNDK5vYsw6zZYGaf',
        'WEAVE_API_KEY': '97af328d9ae0410daebd32565ca18fdb65b4f2c8220e580d',
        'input': '[{"hash":"bryvIld5WQSYT02rOCVzEh1vmDfqxOCCH9YhkHWmVZg=","scope":"private_csv_files","table":"melanoma"}]',
        'writesHash': '6LfKoQQMqb8fYgA1PwBPKMhFaJ59Fn3DWw6qTRri4zjN',
        'paramsHash': 'HW8t63kVCzdR28fbsoHL6RGe3Tbew8icgGhRQ8WbPJg4',
        'writes': '[]',
        'outputSignatureTs': '3BJbd397vNxYDDKGjQMWH2i6gLM1DR2vA4webNrB4P2CSpNHyeYeBiuSEKgE573rCqx6LzwqKEAiXMyFWEo7w9et',
        'taskId': '8de2735c3b71455f9dbe3150d3a52540',
        'consoleSignature': '4ECkbzB3zFK6tjv5GvyHcQuiwr8JkMXkFtjdjwg3SKvYQJvuPx2zj3RB179231T41vwroDQUKTwV8NUeZsxjHpiX',
        'ts': '1692870682362'
    }
}