Split Learning

In this sample, we configure 3 local nodes for collaborative training. Running everything locally makes it easier to understand how the protocol works and gives us access to every node in the network, which makes the whole process more transparent.

Node configuration

  • to have 3 nodes participating in split learning, install the Weavechain nodes if not done already; the easiest way is to start them as Docker containers using the following commands:
  curl -O https://public.weavechain.com/file/install_nodes.sh
  chmod a+x install_nodes.sh
  ./install_nodes.sh 3
  • the installer creates the directory structure for the 3 nodes; after it runs, the layout will look like this:
.
└── nodes
    ├── weave_node1
    │   ├── config
    │   └── storage
    │       └── files
    ├── weave_node2
    │   ├── config
    │   └── storage
    │       └── files
    └── weave_node3
        ├── config
        └── storage
            └── files
  • to subsequently start/stop the nodes:
  docker stop weave_node1
  docker start weave_node1
  docker stop weave_node2
  docker start weave_node2
  ...
  • install a local Jupyter server to connect to the node (if not done already)
  • run the Jupyter installer from the root directory, so that the config directory of the Jupyter notebook container sits next to the nodes:
.
├── config
└── nodes
    ├── weave_node1
    ...
  • allow running local docker images by exposing the Docker socket over TCP:
  docker run -d -v /var/run/docker.sock:/var/run/docker.sock -p 0.0.0.0:2375:2375 bobrik/socat TCP-LISTEN:2375,fork UNIX-CONNECT:/var/run/docker.sock
  • to run split learning, we will need to connect to the API of one of the nodes; we will use weave_node1, making it the initiator of the split learning process
  • to do that, we need to copy the node's config directory into the Jupyter notebook's config directory
  • run the following from the parent directory:
cp nodes/weave_node1/config/* ./config/
  • with the config and keys copied into the Jupyter container, we can use the API to run split learning

Sample of split learning

In this demo we use 3 nodes for collaborative training on a melanoma dataset:

  • we will preprocess and split the data so that each node gets its own part (a splitting sketch follows after this list)
  • since we are using 3 nodes, we will have 3 CSV files
  • we will copy the CSV files into each node's private file storage so they can access them
mkdir nodes/weave_node1/storage/files/private_csv_files
mkdir nodes/weave_node2/storage/files/private_csv_files
mkdir nodes/weave_node3/storage/files/private_csv_files

cp melanoma1.csv nodes/weave_node1/storage/files/private_csv_files/melanoma.csv
cp melanoma2.csv nodes/weave_node2/storage/files/private_csv_files/melanoma.csv
cp melanoma3.csv nodes/weave_node3/storage/files/private_csv_files/melanoma.csv
  • out of the 3 nodes, one will initiate the training and notify the other 2 about it
  • we will access the initiator node from the notebook
  • the other 2 nodes will start training their models using the data that we copied into their private file storage
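
The splitting itself can be done with a few lines of pandas. This is only a sketch, assuming a single combined melanoma.csv source file (a hypothetical name) and a simple round-robin split; the real preprocessing will depend on the dataset:

import pandas as pd

df = pd.read_csv("melanoma.csv")  # hypothetical combined source file
df = df.sample(frac=1, random_state=42).reset_index(drop=True)  # shuffle the rows

# deal the rows round-robin into 3 shards, one per node
for i in range(3):
    df.iloc[i::3].to_csv(f"melanoma{i + 1}.csv", index=False)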

Docker images structure

  • split learning executes similarly to a compute task; for details, see the Confidential Compute Sample
  • this means that we will use docker images to execute the training
  • since the network has 1 initiator and the other nodes are participants, we will have two docker images:
    • one for the participants
    • one for the initiator

The participant image

  • for this sample we can use the following image for the participant: gcr.io/weavechain/split_nn_participant
  • the participant image will be the docker image each participant uses in a compute task to trigger the training process.

Participant image structure

  • the steps for every participant within the compute task are the same:

1. Connecting to the node

import os    # used below to read the task id
import json  # used below to serialize the task output

from weaveapi.records import *
from weaveapi.options import *
from weaveapi.filter import *
from weaveapi.weaveh import *

nodeApi, session = connect_weave_api(None)

2. Fetching the training data

reply = nodeApi.read(
    session,
    ".internal_task_params",
    os.environ["WEAVE_TASKID"],
    None,
    READ_DEFAULT_NO_CHAIN,
).get()
params = reply["data"]
  • the params contain the data necessary for the training, e.g. scope, table, input/output fields, number of epochs, etc.
  • example of fetching and preparing the training data:
# unpack the training parameters received from the initiator
scope, table = params["scope"], params["table"]
output_field, epochs = params["output_field"], int(params["epochs"])
input_fields = params.get("input_fields")  # may be omitted and derived from the data
filter = None  # no row filter: read the whole table

reply = nodeApi.read(session, scope, table, filter, READ_DEFAULT_NO_CHAIN).get()
trainingData = {
    "data": reply["data"],
    "input_fields": input_fields,
    "output_field": output_field,
    "epochs": epochs,
}

3. Training the model

  • we assume that a model has already been built for the dataset
traindata = TrainingData(data, device, input_fields, output_field)
model = Model()
  • create and configure a train loader, an optimizer, etc. (sketched below, after this step's code)
splitNN = SplitNN(
    [model.client_model, model.server_model], [optimizer, server_optimizer]
)

server_model_state, server_optimizer_state = train(
    splitNN, train_loader, epochs
)
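
The train loader and the two optimizers elided above could be set up along the following lines. This is only a sketch, assuming PyTorch and hypothetical traindata.inputs / traindata.labels tensor attributes; the actual sample image may structure this differently:

import torch
from torch.utils.data import DataLoader, TensorDataset

# wrap the prepared tensors in a dataset (the attribute names are illustrative)
dataset = TensorDataset(traindata.inputs, traindata.labels)
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

# one optimizer per model half: the client side and the server side
optimizer = torch.optim.Adam(model.client_model.parameters(), lr=1e-3)
server_optimizer = torch.optim.Adam(model.server_model.parameters(), lr=1e-3)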

4. Sending the base64-encoded model to the node

  • we base64-encode the server model and optimizer state from the previous step (see the sketch after this snippet)
result = {
    "server_model": server_model_base64,
    "server_optimizer": server_optimizer_base64,
}
weave_task_output(nodeApi, session, json.dumps(result))
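
The server_model_base64 and server_optimizer_base64 values above are produced by serializing the states returned by train and base64-encoding the resulting bytes. A minimal sketch, assuming PyTorch and torch.save for the serialization:

import base64
import io

import torch

def encode_state(state):
    # serialize the state to bytes, then base64-encode it for JSON transport
    buf = io.BytesIO()
    torch.save(state, buf)
    return base64.b64encode(buf.getvalue()).decode("ascii")

server_model_base64 = encode_state(server_model_state)
server_optimizer_base64 = encode_state(server_optimizer_state)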

These are the summarized steps that a participant's compute task executes. Once they finish, the trained model is sent to the initiator.

The initiator image

  • for this sample, we can use the following image for the initiator: gcr.io/weavechain/split_nn_initiator
  • similarly to the participant image, the initiator image is the docker image the initiator uses to aggregate the models from the participants and train the main model.

Initiator image structure

  • the steps within the compute task are the following:

1. Connecting to the node

import os    # used below to read the task id
import json  # used below to serialize the task output

from weaveapi.records import *
from weaveapi.options import *
from weaveapi.filter import *
from weaveapi.weaveh import *

nodeApi, session = connect_weave_api(None)

2. Fetching the models from the participants

reply = nodeApi.read(
    session,
    ".internal_task_params",
    os.environ["WEAVE_TASKID"],
    None,
    READ_DEFAULT_NO_CHAIN,
).get()
params = reply["data"]

models = params["data"]  # the models received from the participants

3. Training the main model

  • the main model is trained using the models received from the participants; one possible approach is sketched below
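
The exact aggregation logic lives inside the initiator image; one plausible approach, shown here only as a sketch, is to decode each participant's server model state and average the weights FedAvg-style. This assumes models maps participant keys to the JSON strings produced in step 4 of the participant image:

import base64
import io
import json

import torch

def decode_state(b64):
    # reverse of the participant-side encoding: base64 -> bytes -> state dict
    return torch.load(io.BytesIO(base64.b64decode(b64)))

states = [decode_state(json.loads(m)["server_model"]) for m in models.values()]

# average the participants' server-side weights, key by key
avg_state = {
    key: torch.stack([s[key].float() for s in states]).mean(dim=0)
    for key in states[0]
}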

4. Sending the base64-encoded results to the initiator node

result = {"server_model": server_model_base64}
weave_task_output(nodeApi, session, json.dumps(result))

These are the summarized steps that the initiator's compute task executes.

Jupyter Notebook example for the API

  • we will trigger split learning via an API call from the Jupyter notebook configured above
  • more information about the Jupyter server configuration can be found here
  • after opening the notebook in a browser, create a new notebook in the main directory
  • the following code segments can be copied into the notebook, each segment in this documentation representing a separate cell in the notebook

1. Connecting to the initiator node

from weaveapi.records import *
from weaveapi.options import *
from weaveapi.filter import *
from weaveapi.weaveh import *

nodeApi, session = connect_weave_api("config/demo_client_local.config")

2. Configuring the params

scope = "private_csv_files"
table = "melanoma"
initiator_image = "gcr.io/weavechain/split_nn_initiator:latest"
participant_image = "gcr.io/weavechain/split_nn_participant:latest"
output_field = "target"
epochs = 10
sources = ["<pubkey1>", "<pubkey2>", ...]
params = { "scope": scope, "table": table, "output_field": output_field, "epochs": epochs }

Notes on parameters:

  • scope is the scope configured above for the training data
  • table is the name of the CSV file
  • initiator_image is the name of the initiator node's docker image used for the training
  • participant_image is the name of the participant node's docker image used for the training
  • output_field & epochs are training parameters
  • sources is a list of the participant nodes' public keys; this can also be set to "*" to include every peer of the initiator
  • params are the parameters that the nodes will pass to the docker images

3. Connecting to the participant nodes

  • before we can trigger the split learning process, we have to connect the participant nodes to the initiator
  • this is done through an API call to the initiator that updates the node's connection-related configuration
  • we will create a JSON object with the connection information of the 2 participant nodes and use it to update the initiator's config:
def add_connections(ip_port):
    # build the "connections" config entry from a {host: ports} mapping
    connections = {"connections": []}
    for ip in ip_port:
        for port in ip_port[ip]:
            connection = {
                "http": {
                    "host": ip,
                    "port": port,
                    "useHttps": False,
                }
            }
            connections["connections"].append(connection)
    return connections
  • using that function we can build the connections:
ip_port = {
    "host.docker.internal": [
        "18082",  # port for weave_node2
        "18083",  # port for weave_node3
    ]
}
connections = add_connections(ip_port)
  • the connections variable now holds the JSON objects representing the connection configuration for the 2 participant nodes, i.e. weave_node2 and weave_node3
  • we will use this to update the initiator node's config:
path = ["peers", "connections"]
reply = nodeApi.updateConfig(session, json.dumps(path), connections).get()

4. Triggering split learning with the initiator node

  • now that the two participants are connected to the initiator, we can use the initiator to trigger split learning with the parameters configured above, using the following API call:
reply = nodeApi.splitLearn(
    session,
    initiator_image,
    participant_image,
    SLOptions(False, 120, 2, None, sources, params),
).get()
print(reply)
{'res': 'ok', 'target': {'operationType': 'COMPUTE', 'organization': 'weavechain', 'account': 'clientAccount'}, 'data': {'<pubkey1>': {'res': 'ok', 'target': {'operationType': 'COMPUTE', 'organization': 'weavechain', 'account': 'clientAccount'}, 'data': {'output': '{"server_model": "UEsDBBQAA...}}}}}
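
The trained server model can be recovered from the reply by parsing a participant's output JSON and base64-decoding its server_model field. A sketch, assuming the state was serialized with torch.save as in the participant image sketch above:

import base64
import io
import json

import torch

output = json.loads(reply["data"]["<pubkey1>"]["data"]["output"])
server_model_state = torch.load(io.BytesIO(base64.b64decode(output["server_model"])))
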
  • we can also fetch data lineage for the participants:
reply['data']['<pubkey>']
{
    'res': 'ok',
    'target': {
        'operationType': 'COMPUTE',
        'organization': 'weavedemo',
        'account': 'weavejTGjcdbUSDtk4Rroiz8qVCSGHo3ejTSZV2wtqRobzNBD'
    },
    'data': {
        'inputHash': '62ZspnQej3ZUyVBTJXEJHyszrZxCBnAY5Si7P1LT54bM',
        'fees': '{"USDC":0}',
        'writesSignature': 'EmChpWVhPZ2ka8wh1aiwMLfXWQSJjPTxYPbo3zPtxShb8bF6JDrsXkRz5vB4yWNAcMQ37mQs5Wm1FXREh6MKzi4',
        'outputHash': 'BpKDBmiku7eSdFxmztQMSxtYRdAvSFcJw8KXDvK5agJf',
        'outputSignature': '4AoAtD6n19XuMxWnaupAeNVjesgtuVQN9QcctmVHfR724jh4TxNUHsoZ5M6je7a4CVYukDwb8YWjYQTT3KNfxBcX',
        'computeHash': '2h5BQPk2tXaDyeSifJ4aE8AoLFnFhNDK5vYsw6zZYGaf',
        'WEAVE_API_KEY': '97af328d9ae0410daebd32565ca18fdb65b4f2c8220e580d',
        'input': '[{"hash":"bryvIld5WQSYT02rOCVzEh1vmDfqxOCCH9YhkHWmVZg=","scope":"private_csv_files","table":"melanoma"}]',
        'writesHash': '6LfKoQQMqb8fYgA1PwBPKMhFaJ59Fn3DWw6qTRri4zjN',
        'paramsHash': 'HW8t63kVCzdR28fbsoHL6RGe3Tbew8icgGhRQ8WbPJg4',
        'writes': '[]',
        'outputSignatureTs': '3BJbd397vNxYDDKGjQMWH2i6gLM1DR2vA4webNrB4P2CSpNHyeYeBiuSEKgE573rCqx6LzwqKEAiXMyFWEo7w9et',
        'taskId': '8de2735c3b71455f9dbe3150d3a52540',
        'consoleSignature': '4ECkbzB3zFK6tjv5GvyHcQuiwr8JkMXkFtjdjwg3SKvYQJvuPx2zj3RB179231T41vwroDQUKTwV8NUeZsxjHpiX',
        'ts': '1692870682362'
    }
}