Connecting a local database

Install docker images

  • Install a Weavechain node locally if not done already, the easiest way is by starting it as a docker
  • Install a local jupyter jupyter server to connect to the node (if not done already)
  • Allow running local docker images by running
  docker run -d -v /var/run/docker.sock:/var/run/docker.sock -p 0.0.0.0:2375:2375 bobrik/socat TCP-LISTEN:2375,fork UNIX-CONNECT:/var/run/docker.sock

Prepare the data

  • go to the folder where the node was installed, download the sample.csv and save it under storage/files/private_files, or do it from command line
  cd storage/files/private_files
  curl -O https://public.weavechain.com/file/sample.csv  

Run the notebook

  • make sure both the node and jupyter server dockers are running
  • connect to the local jupyter server http://127.0.0.1:18888/notebooks/sample-localdb.ipynb
  • use the token taken from the weave_jupyter_public docker logs
  • OR, if you're not using the provided docker server, download the notebook from here and run it in your locally configured jupyter server
  • run the cells one by one, in case of errors check for the docker images running properly and without errors in their logs and for the ports being open
  • contact us on Telegram or via email support@weavechain.com
  • see below a non-interactive output of how the notebook should look like

Sample of expected notebook:


In this demo notebook we will showcase connecting a local database

1. Create an API session

import pandas as pd

from weaveapi.records import *
from weaveapi.options import *
from weaveapi.filter import *
from weaveapi.weaveh import *

WEAVE_CONFIG = "config/demo_client_local.config"
nodeApi, session = connect_weave_api(WEAVE_CONFIG)

data_collection = "localdb"
table = "oncology_data"
{"res":"ok","data":"pong 1674727453166"}

2. Install a local database (if not already having one)

  • it can be any database or file storage from the ones supported
  • for the example we will assume a local postgres server is installed
  • or you can start a new postgres instance as a docker following the "How to use this image" step from here. Sample:
  docker run --name some-postgres -p 0.0.0.0:5432:5432 -e POSTGRES_PASSWORD=mysecretpassword -d postgres

Create a table and populate it with data

  • go to the folder where the node is installed
  • download a sample file
  curl -O https://public.weavechain.com/file/sample.csv
  • start the psql command prompt
  psql -U postgres -d postgres -h localhost -p 5432
  • if psql is not available, install it (see this guide from timescale) or run it from docker
  docker run -it postgres /bin/bash -c "psql -U postgres -d postgres -h host.docker.internal -p 5432"
  • ideally create a new database and user, but to simplify the steps we'll use the default postgres database and user
  • in the psql prompt, create a new table
  CREATE TABLE "oncology_data" ("id" BIGINT NOT NULL,"name" TEXT,"age" NUMERIC,"gender" NUMERIC,"air_pollution" NUMERIC,"alcohol_use" NUMERIC,"dust_allergy" NUMERIC,"occupational_hazards" NUMERIC,"genetic_risk" NUMERIC,"chronic_lung_disease" NUMERIC,"balanced_diet" NUMERIC,"obesity" NUMERIC,"smoking" NUMERIC,"passive_smoker" NUMERIC,"chest_pain" NUMERIC,"coughing_of_blood" NUMERIC,"fatigue" NUMERIC,"weight_loss" NUMERIC,"shortness_of_breath" NUMERIC,"wheezing" NUMERIC,"swallowing_difficulty" NUMERIC,"clubbing_of_fingernails" NUMERIC,"frequent_cold" NUMERIC,"dry_cough" NUMERIC,"snoring" NUMERIC,"level" NUMERIC, CONSTRAINT pk_oncology_data PRIMARY KEY ("id"));
  • and, in the same psql prompt, populate it with data (keep the \ at the beginning)
  \COPY oncology_data FROM 'sample.csv' DELIMITER ',' CSV HEADER;

3. Add a new connection in the configuration file

  • go to the node installation folder and edit config/demo.config
  • add a new item in the databases section
  'localdb': {
    'connectionAdapterType': 'pgsql',
    'replication': {
        'type': 'none',
        'allowedCachingIntervalSec': 604800
    },
    'jdbcConfig': {
      'host': 'host.docker.internal',
      'port': 5432,
      'schema': 'public',
      'database': 'postgres',
      'user': 'postgres',
      'pass': 'mysecretpassword'
    }
  },
  • If the node was started from docker, the address must point to the docker host machine, host.docker.internal will work on MacOS and windows. Use localhost if the node was started as standalone rather than a docker. Or the internal IP obtained via ipconfig/ifconfig. On linux 172.17.0.1 can also be used most of the time (usually assigned for the host machine to be visible from the docker if it's the single docker network interface)

Flag the node to reload the configuration file from disk

  • execute the following command in the notebook
reply = nodeApi.resetConfig(session).get()
print(reply)
{'res': 'ok', 'data': 'weaveconfig/demo.config'}

Restart the node

  • run from the command prompt
  docker stop weave_node
  docker start weave_node

4. Read data from the newly added table

nodeApi, session = connect_weave_api(WEAVE_CONFIG)

reply = nodeApi.read(session, data_collection, table, None, READ_DEFAULT_NO_CHAIN).get()
#print(reply)
df = pd.DataFrame(reply["data"])

df.head()
{"res":"ok","data":"pong 1674727467726"}
id name age gender air_pollution alcohol_use dust_allergy occupational_hazards genetic_risk chronic_lung_disease ... fatigue weight_loss shortness_of_breath wheezing swallowing_difficulty clubbing_of_fingernails frequent_cold dry_cough snoring level
0 1 Lorenzo Rasmussen 33 1 2 4 5 4 3 2 ... 3 4 2 2 3 1 2 3 4 1
1 2 Zechariah Gallegos 17 1 3 1 5 3 4 2 ... 1 3 7 8 6 2 1 7 2 2
2 3 Lukas Jenkins 35 1 4 5 6 5 5 4 ... 8 7 9 2 1 4 6 7 2 3
3 4 Trey Holden 37 1 7 7 7 7 6 7 ... 4 2 3 1 4 5 6 7 5 3
4 5 Branson Rivera 46 1 6 8 7 7 7 6 ... 3 2 4 1 4 2 4 2 3 3

5 rows × 26 columns

5. Make the table private

reply = nodeApi.getTableDefinition(session, data_collection, table).get()
#print(reply)
layout = json.loads(reply["data"])["layout"]
layout["isLocal"] = True
reply = nodeApi.updateLayout(session, data_collection, table, json.dumps({ "layout": layout})).get()
print(reply)
{'res': 'ok', 'target': {'operationType': 'UPDATE_LAYOUT', 'organization': 'weavedemo', 'account': 'weaveyh5R1ytoUCZnr3JjqMDfhUrXwqWC2EWnZX3q7krKLPcg', 'scope': 'localdb', 'table': 'oncology_data'}}

and fail to read the data

  • we expect a Not authorized reply here
reply = nodeApi.read(session, data_collection, table, None, READ_DEFAULT_NO_CHAIN).get()
print(reply)
{'res': 'err', 'target': {'operationType': 'READ', 'organization': 'weavedemo', 'account': 'weaveyh5R1ytoUCZnr3JjqMDfhUrXwqWC2EWnZX3q7krKLPcg', 'scope': 'localdb', 'table': 'oncology_data'}, 'message': 'Not authorized'}

6. Compute a Merkle Tree from a subset of columns in the private table

salt = "salt1234" # Same salt used for records hashes, this can be improved to have different salts for *each distinct writer*

filter = Filter(None, None, None, None, [ "name", "age" ])
reply = nodeApi.merkleTree(session, data_collection, table, filter, salt, READ_DEFAULT_NO_CHAIN).get()
tree = reply["data"]["tree"]
rootHash = reply["data"]["rootHash"]
ts = reply["data"]["timestamp"]
rootHashSignature = reply["data"]["signature"]

print("Generated at", ts)
print("Root Hash", rootHash)
print("Signature", rootHashSignature)
print("")
print(tree[:400] + "..." + tree[-400:])
Generated at 1674727577162
Root Hash iax9Vpupq3bXG3DnBTQyKdWr2s3Uh4Q6qcZAPNVJAgb
Signature 4RdBCQgKWfJqCvD7PrEkYyWSR3KtT8S8eYTdyveDyLFA6RNiwp28R1LvuEtMPAw8ceRzjNddQvQhKFQiRFy9nZoL

iax9Vpupq3bXG3DnBTQyKdWr2s3Uh4Q6qcZAPNVJAgb;SH3Ch1Jwxcs363PwaCYvREgmPVfqr8NcPovZNgXa4js,FE8mTPPJ3uDXTJNxppNa4CeHvfu2Zeo9sjeL71YS12zK;HKgnz9cLoWn4nDapvnGm6K6gH6H5Y2rrgMguaMoKgyek,7s52zKzog488LkgMdEKrkFRoF8opVB9AdrDKtSdSF92j,9tnJTNhSKpZmDViGXtPB2SpoiozEpGkFVNk7FgtLtBQv,A5tNHAHQs61pSzqLJbT9v2ca8ix53kNzzDXKSZ9b6HFy;7oyQ7nq5bLX4J9Es6GQdZcxUpc8FetPZLeyJVFpEnydd,BybqudmZBw54NJhebTWt8yJMf4ar6jnxLRhD2MFsZ7...DhvjP86uY7x4eK7QRrcocBct5XJNGsVUWWREc6zd,E9QLrZqb5M6oXoEQpQ3B5KMvmHJ13Xdb9jUxRgJ3FaJA,F7dfae6vFLs3a2a2w9uET87Euw2uxE2qssvqYcmHPUeP,5Am75Eu1NYQbLuEfTTGDUiH5sWhPcumC3deC4yVYpYkn,HrsDMUrxxgjoXrfeMfSaks8mGK6hbDnyDQjhZwuXnCT7,DL96tTVtfEyRCZCv3EeNd4cStYqX4hR8r5anUi8eAMcC,Gaocem33txtM2KMJbUaMGrEDFrYH1H1oJB25Qb565Ao2,7u4EXgwGPmoqgz6cvAFyfvjAP7cvJcBeHwrK9XpYz9RZ,FpT72W53qoU3YXAZg1DKMCUVSpyLmjA6j52nSXYR9yRn

Check root hash signature

toSign = rootHash + " " + ts
check = nodeApi.verifySignature(rootHashSignature, toSign)
print("Check signature:", check)
Check signature: True

Verify the presence of a known record in the dataset

row = [ 'Lorenzo Rasmussen', 33.0 ]
recordHash = nodeApi.hashRecord(row, salt)
print(recordHash)
ARMe28cMdZvxBCYgBSALyncPnec4ijERrn2cDJgwgEHA
reply = nodeApi.verifyMerkleHash(session, tree, recordHash).get()
print(reply["data"])
true

7. Train a ML model on the private data hosted in our local database

  • we do it similarly how it is done in the Compute Sample, where we also check the model lineage
  • run on the node machine
  docker pull gcr.io/weavechain/oncology_xgboost:latest
  • use latest-arm64 if your machine is ARM
  • the data owner needs to purposely enable running a certain image
  • the node needs to be able to connect to the local docker instance
  • in the default configuration file installed with the node, the sample script is pre-authorized with the following line
  'allowedImages': [ 'gcr.io/weavechain/oncology_xgboost' ]
  • in case of error, uncomment the #print(reply) below to see details
  • (compute to data is just one of the patterns of confidential computing supported, MPC and Homomorphic Encryption could also be used)
reply = nodeApi.compute(session, "gcr.io/weavechain/oncology_xgboost", COMPUTE_DEFAULT).get()
#print(reply)
print(reply["data"]["output"][:1200] + "...")
{"model": "YmluZgAAAD8XAAAAAwAAAAEAAAAAAAAAAQAAAAcAAAABAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOAAAAAAAAAG11bHRpOnNvZnRwcm9iBgAAAAAAAABnYnRyZWUsAQAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAsAAAAAAAAAAAAAABcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP////8BAAAAAgAAAAIAAIAAAJBAAAAAgAMAAAAEAAAACgAAgAAAYEAAAAAABQAAAAYAAAAJAACAAADwQAEAAIAHAAAACAAAABYAAIAAAKBAAQAAAAkAAAAKAAAACQAAgAAAIEACAACA//////////8AAAAA1mVlvgIAAAD//////////wAAAADmFLw+AwAAgP//////////AAAAAEGE5D4DAAAA//////////8AAAAA5RlPvgQAAID//////////wAAAADkGc8+BAAAAP//////////AAAAAG0+Y779nEdD4zjeQ1p2i70AAAAASfgoQ3EcU0P6hyI/AAAAAGRhoUFUVWlDjUE0vwAAAADk9SNC4zgCQycHqz8AAAAAz+gWQhzHoUJlQ/6+AAAAAAAAAACN42RDMio/vwAAAAAAAAAA4ziOQBW8nD8AAAAAAAAAAKqq8kI2br4/AAAAAAAAAADjOA5BlJUsvwAAAAAAAAAA4zgOQZOVrD8AAAAAAAAAAP//j0KwXj2/AAAAAAEAAAAVAAAAAAAAA...