Installation and usage of the platform

Precise platform configuration: confidential data groups configuration

If you use the privacy API methods to manage confidential data, configure access to that data in the privacy section of the node configuration file. The example below uses a PostgreSQL database:

privacy {

  replier {
    parallelism = 10
    stream-timeout = 1 minute
    stream-chunk-size = 1MiB
  }

  synchronizer {
    request-timeout = 2 minute
    init-retry-delay = 5 seconds
    inventory-stream-timeout = 15 seconds
    inventory-request-delay = 3 seconds
    inventory-timestamp-threshold = 10 minutes
    crawling-parallelism = 100
    max-attempt-count = 24
    lost-data-processing-delay = 10 minutes
    network-stream-buffer-size = 10
  }

  inventory-handler {
    max-buffer-time = 500ms
    max-buffer-size = 100
    max-cache-size = 100000
    expiration-time = 5m
    replier-parallelism = 10
  }

  cache {
    max-size = 100
    expire-after = 10m
  }

  storage {
    vendor = postgres
    schema = "public"
    migration-dir = "db/migration"
    profile = "slick.jdbc.PostgresProfile$"
    upload-chunk-size = 1MiB
    jdbc-config {
      url = "jdbc:postgresql://postgres:5432/node-1"
      driver = "org.postgresql.Driver"
      user = postgres
      password = wenterprise
      connectionPool = HikariCP
      connectionTimeout = 5000
      connectionTestQuery = "SELECT 1"
      queueSize = 10000
      numThreads = 20
    }
  }

  service {
    request-buffer-size = 10MiB
    meta-data-accumulation-timeout = 3s
  }
}

Choosing the database

Before changing the node configuration file, decide which storage you will use for confidential data. The Waves Enterprise blockchain platform supports PostgreSQL and Amazon S3.

PostgreSQL

When you install PostgreSQL, you create an account for accessing the database. Specify the username and password of this account in the node configuration file (in the user and password fields of the storage block of the privacy section; see the vendor = postgres section for details).

To use PostgreSQL DBMS, you will need to install the JDBC interface (Java DataBase Connectivity). When installing JDBC, set the profile name. This name must then be specified in the node configuration file (in the profile field of the storage block of the privacy section, see the vendor = postgres section for details).

To optimize database access, you can connect to PostgreSQL through the pgBouncer tool. In this case, pgBouncer requires special configuration, which is described below in the pgBouncer section.

Amazon S3

When using Amazon S3, the information must be stored on the Minio server. During the Minio server installation, you will be prompted for a login and password to access the data. This login and password must then be specified in the node configuration file (in the access-key-id and secret-access-key fields; see the vendor = s3 section for details).

After installing the DBMS appropriate for your project, adjust the storage block of the privacy section in the node configuration file as specified below.

storage block

Specify the DBMS you are using in the vendor parameter in the storage block of the privacy section:

  • postgres – for PostgreSQL;

  • s3 – for Amazon S3.

Important

If you do not use the privacy API methods, specify none in the vendor parameter and comment out or delete the rest of the parameters in the privacy section.

vendor = postgres

When using the PostgreSQL DBMS, the storage block of the privacy section looks like this:

storage {
  vendor = postgres
  schema = "public"
  migration-dir = "db/migration"
  profile = "slick.jdbc.PostgresProfile$"
  upload-chunk-size = 1MiB
  jdbc-config {
    url = "jdbc:postgresql://postgres:5432/node-1"
    driver = "org.postgresql.Driver"
    user = postgres
    password = wenterprise
    connectionPool = HikariCP
    connectionTimeout = 5000
    connectionTestQuery = "SELECT 1"
    queueSize = 10000
    numThreads = 20
  }
}

The block must contain the following parameters:

  • schema – the name of the database schema that holds the privacy tables. By default, the public schema is used; if your database uses another schema, specify its name;

  • migration-dir – directory for data migration;

  • profile – profile name for JDBC access, set during JDBC installation (see the PostgreSQL section);

  • upload-chunk-size – the size of the data fragment uploaded using POST /privacy/sendLargeData REST API method or SendLargeData gRPC API method;

  • url – the PostgreSQL database address (see the url field section for details);

  • driver – the name of the JDBC driver that allows Java applications to communicate with the database;

  • user – user name to access the database; specify the login of the account you created to access the database under PostgreSQL;

  • password – the password to access the database; specify the password of the account you created to access the database under PostgreSQL;

  • connectionPool – the connection pool name, HikariCP by default;

  • connectionTimeout – the maximum time to wait for a connection from the pool before the request fails (in milliseconds);

  • connectionTestQuery – a test query to test the connection to the database; for PostgreSQL, it is recommended to send SELECT 1;

  • queueSize – the size of the query queue;

  • numThreads – the number of simultaneous connections to the database.

url field

In the url field, specify the address of the database you are using in the following format:

jdbc:postgresql://<POSTGRES_ADDRESS>:<POSTGRES_PORT>/<POSTGRES_DB>

where:

  • POSTGRES_ADDRESS – PostgreSQL host address;

  • POSTGRES_PORT – PostgreSQL host port number;

  • POSTGRES_DB – the PostgreSQL database name.

You can specify the database address along with the account data using the user and password parameters:

privacy {
  storage {
    ...
    url = "jdbc:postgresql://yourpostgres.com:5432/privacy_node_0?user=user_privacy_node_0@company&password=7nZL7Jr41qOWUHz5qKdypA&sslmode=require"
    ...
  }
}

In this example, user_privacy_node_0@company is the username and 7nZL7Jr41qOWUHz5qKdypA is its password. You can also add the sslmode=require parameter to require SSL when connecting.
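The URL format above can be assembled programmatically, which helps avoid typos when the account data is included in the query string. The following is a minimal sketch, not part of the platform: the helper name build_jdbc_url is hypothetical, and keeping "@" unescaped simply mirrors the example above. How the JDBC driver decodes escaped characters is an assumption you should verify for your driver version.

```python
from urllib.parse import urlencode


def build_jdbc_url(host: str, port: int, database: str, **params: str) -> str:
    """Assemble a PostgreSQL JDBC URL in the format described above."""
    base = f"jdbc:postgresql://{host}:{port}/{database}"
    if not params:
        return base
    # safe="@" keeps the "@" in the user name literal, as in the example above;
    # other reserved characters in values are escaped by urlencode.
    return f"{base}?{urlencode(params, safe='@')}"


url = build_jdbc_url(
    "yourpostgres.com", 5432, "privacy_node_0",
    user="user_privacy_node_0@company",
    password="7nZL7Jr41qOWUHz5qKdypA",
    sslmode="require",
)
print(url)
```

With the values from the example, the function reproduces the url string shown in the configuration snippet.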

pgBouncer

To optimize work with the PostgreSQL database, you can connect to it through pgBouncer, a connection pooler for PostgreSQL. pgBouncer is configured in a separate file, pgbouncer.ini. Because pool_mode = transaction does not support server-side prepared statements, we recommend setting pool_mode = session in pgbouncer.ini to prevent data loss. When using session mode, set the server_reset_query parameter to DISCARD ALL.

[pgbouncer]
pool_mode = session
server_reset_query = DISCARD ALL

More information about how session mode works with prepared statements can be found in the official pgBouncer documentation.
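Before starting the node, it may be useful to check that pgbouncer.ini actually contains the two recommended settings. The sketch below uses Python's standard configparser; the function name check_pgbouncer and the sample text are illustrative only. Real pgbouncer.ini files may contain directives that configparser does not understand, so treat this as a starting point.

```python
import configparser
from typing import List

SAMPLE_INI = """\
[pgbouncer]
pool_mode = session
server_reset_query = DISCARD ALL
"""


def check_pgbouncer(ini_text: str) -> List[str]:
    """Return a list of problems found in the [pgbouncer] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("pgbouncer"):
        return ["missing [pgbouncer] section"]
    section = parser["pgbouncer"]
    problems = []
    if section.get("pool_mode", "").strip().lower() != "session":
        problems.append("pool_mode should be 'session'")
    if section.get("server_reset_query", "").strip().upper() != "DISCARD ALL":
        problems.append("server_reset_query should be 'DISCARD ALL'")
    return problems


print(check_pgbouncer(SAMPLE_INI))  # prints []
```

An empty list means both recommended settings are present.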

vendor = s3

When using Amazon S3 storage, the storage block of the privacy section looks like this:

storage {
  vendor = s3
  url = "http://localhost:9000/"
  bucket = "privacy"
  region = "aws-global"
  access-key-id = "minio"
  secret-access-key = "minio123"
  path-style-access-enabled = true
  connection-timeout = 30s
  connection-acquisition-timeout = 10s
  max-concurrency = 200
  read-timeout = 0s
  upload-chunk-size = 5MiB
}

The block must contain the following parameters:

  • url – address of the Minio server to store data; by default, Minio uses the 9000 port;

  • bucket – name of the S3 bucket to store data;

  • region – name of the S3 region, the parameter value is aws-global;

  • access-key-id – identifier of the data access key; specify the data access login that you set during the Minio server installation (see Amazon S3);

  • secret-access-key – data access key in the S3 repository; specify the data access password that you set during the Minio server installation (see Amazon S3);

  • path-style-access-enabled = true – enables path-style addressing of the S3 bucket (the bucket name is part of the URL path); do not change this value;

  • connection-timeout – period of inactivity before the connection is broken (in seconds);

  • connection-acquisition-timeout – period of inactivity when establishing a connection (in seconds);

  • max-concurrency – the maximum number of concurrent accesses to the storage;

  • read-timeout – period of inactivity when reading data (in seconds);

  • upload-chunk-size – the size of the data fragment uploaded using POST /privacy/sendLargeData REST API method or SendLargeData gRPC API method.
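To get a feel for the upload-chunk-size parameter, the sketch below estimates how many fragments a payload is split into for a given chunk size. It assumes the data is cut into fixed-size fragments with a smaller final fragment, which is an assumption about the node's behavior rather than something stated here. The helper names parse_size and chunk_count are hypothetical, and only binary units are handled.

```python
import math
import re

_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size string such as '5MiB' to bytes (binary units only)."""
    match = re.fullmatch(r"\s*(\d+)\s*(B|KiB|MiB|GiB)\s*", text)
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def chunk_count(payload_bytes: int, chunk_size: str) -> int:
    """Number of fixed-size chunks needed to carry the payload."""
    return math.ceil(payload_bytes / parse_size(chunk_size))


payload = 12 * 1024 ** 2              # a 12 MiB payload
print(chunk_count(payload, "5MiB"))   # 3
print(chunk_count(payload, "1MiB"))   # 12
```

A larger chunk size means fewer requests to the storage per upload, at the cost of more memory held per fragment.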

replier block

Use the replier block in the privacy section to specify confidential data streaming parameters:

replier {
  parallelism = 10
  stream-timeout = 1 minute
  stream-chunk-size = 1MiB
}

The block must contain the following parameters:

  • parallelism – the maximum number of parallel tasks for processing privacy data requests;

  • stream-timeout – the maximum time allowed for a read operation on the stream;

  • stream-chunk-size – the size of one chunk when transferring data as a stream.

inventory-handler block

Use the inventory-handler block in the privacy section to specify parameters for aggregating policy inventory data:

inventory-handler {
  max-buffer-time = 500ms
  max-buffer-size = 100
  max-cache-size = 100000
  expiration-time = 5m
  replier-parallelism = 10
}

The block must contain the following parameters:

  • max-buffer-time – the maximum time inventories are held in the buffer; when the specified time elapses, the node processes all buffered inventories in a batch;

  • max-buffer-size – the maximum number of inventories in buffer; when the limit is reached, the node processes all inventories in batch;

  • max-cache-size – the maximum size of inventories cache; using this cache the node selects only new inventories;

  • expiration-time – expiration time for cache items (inventories);

  • replier-parallelism – the maximum number of parallel tasks for processing inventory requests.
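The interaction between max-buffer-time and max-buffer-size can be illustrated with a small batching buffer that flushes when either limit is hit. This is a simplified sketch and not the node's implementation: the class InventoryBuffer, its methods, and the injected clock are all hypothetical and exist only to make the two triggers visible.

```python
from typing import Callable, List


class InventoryBuffer:
    """Collects items and hands them over as one batch on size or age."""

    def __init__(self, max_size: int, max_time: float,
                 on_batch: Callable[[List[str]], None],
                 clock: Callable[[], float]):
        self.max_size = max_size      # analogous to max-buffer-size
        self.max_time = max_time      # analogous to max-buffer-time, in seconds
        self.on_batch = on_batch
        self.clock = clock
        self.items: List[str] = []
        self.first_at = 0.0

    def add(self, item: str) -> None:
        if not self.items:
            self.first_at = self.clock()  # remember when the batch started
        self.items.append(item)
        if len(self.items) >= self.max_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes when the oldest item is too old."""
        if self.items and self.clock() - self.first_at >= self.max_time:
            self.flush()

    def flush(self) -> None:
        batch, self.items = self.items, []
        self.on_batch(batch)


now = [0.0]                                   # fake clock for the demo
batches: List[List[str]] = []
buf = InventoryBuffer(3, 0.5, batches.append, lambda: now[0])
for name in ("a", "b", "c"):
    buf.add(name)                             # third item hits the size limit
buf.add("d")
now[0] = 0.6
buf.tick()                                    # 0.6 s elapsed hits the time limit
print(batches)  # [['a', 'b', 'c'], ['d']]
```

The first batch is produced by the size limit and the second by the time limit, which mirrors the two conditions described for the buffer.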

cache block

Use the cache block in the privacy section to specify parameters of the cache for policy data responses:

cache {
  max-size = 100
  expire-after = 10m
}

Note

Large files (files uploaded using POST /privacy/sendLargeData REST API method or SendLargeData gRPC API method) are not cached.

The block must contain the following cache parameters:

  • max-size – the maximum number of elements in the cache;

  • expire-after – the time after which an element expires if it has not been accessed.
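The two cache parameters can be illustrated with a small cache that expires entries after a period without access and caps the number of entries. This is a sketch under assumptions: the class AccessExpiringCache is hypothetical, and evicting the least recently used entry when max-size is exceeded is an assumed policy, since the eviction order is not specified here.

```python
from collections import OrderedDict
from typing import Any, Callable, Optional


class AccessExpiringCache:
    def __init__(self, max_size: int, expire_after: float,
                 clock: Callable[[], float]):
        self.max_size = max_size          # analogous to max-size
        self.expire_after = expire_after  # analogous to expire-after, in seconds
        self.clock = clock
        self._data: OrderedDict = OrderedDict()  # key -> (value, last_access)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = (value, self.clock())
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # assumed policy: drop least recently used

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, last_access = entry
        now = self.clock()
        if now - last_access > self.expire_after:
            del self._data[key]             # idle for too long
            return None
        self._data[key] = (value, now)      # a successful read resets the timer
        self._data.move_to_end(key)
        return value


now = [0.0]                                 # fake clock for the demo
cache = AccessExpiringCache(max_size=2, expire_after=600, clock=lambda: now[0])
cache.put("a", 1)
now[0] = 500
print(cache.get("a"))  # 1, and the idle timer restarts
now[0] = 1000
print(cache.get("a"))  # 1, only 500 s since the last access
now[0] = 1700
print(cache.get("a"))  # None, 700 s without access
```

Note that an entry that is read regularly never expires under this model; only idle entries are dropped.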

synchronizer block

Use the synchronizer block in the privacy section to specify private data synchronization parameters:

synchronizer {
  request-timeout = 2 minute
  init-retry-delay = 5 seconds
  inventory-stream-timeout = 15 seconds
  inventory-request-delay = 3 seconds
  inventory-timestamp-threshold = 10 minutes
  crawling-parallelism = 100
  max-attempt-count = 24
  lost-data-processing-delay = 10 minutes
  network-stream-buffer-size = 10
}

The block must contain the following parameters:

  • request-timeout – maximum response waiting time after a data request; the default value is 2 minutes;

  • init-retry-delay – first delay after an unsuccessful attempt; with each attempt, the delay increases by 4/3; the default value is 5 seconds;

  • inventory-stream-timeout – the maximum time the node waits for a network message with the inventory information, i.e. confirmation from a particular node that it has certain data and can provide it for downloading. When this timeout expires, the node sends inventory-request to all the peers to see if they have the necessary data for downloading; the default value is 15 seconds;

  • inventory-request-delay – delay after requesting peers data inventory (inventory-request); the default value is 3 seconds;

  • inventory-timestamp-threshold – time threshold for inventory broadcast; inventory broadcast is used for new transactions to speed up the privacy subsystem; the parameter is used to decide whether to send PrivacyInventory message when the data is synchronized (downloaded) successfully; the default value is 10 minutes;

  • crawling-parallelism – the maximum number of parallel crawling tasks; the default value is 100;

  • max-attempt-count – the number of attempts that the crawler will take before the data is marked as lost; the default value is 24;

  • lost-data-processing-delay – the delay between the attempts to process the lost items queue; the default value is 10 minutes;

  • network-stream-buffer-size – the maximum count of the data chunks in the buffer; when the limit is reached, back pressure is activated; the default value is 10.
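To see how init-retry-delay and max-attempt-count combine, the sketch below prints a possible retry schedule. It assumes that "increases by 4/3" means each delay is the previous one multiplied by 4/3, and that a delay follows every one of the attempts. Both are assumptions, and the function name retry_delays is hypothetical.

```python
from typing import List


def retry_delays(init_delay: float, attempts: int,
                 factor: float = 4 / 3) -> List[float]:
    """Delays in seconds, each one `factor` times the previous (assumed)."""
    delays = []
    delay = init_delay
    for _ in range(attempts):
        delays.append(delay)
        delay *= factor
    return delays


delays = retry_delays(5.0, 24)                 # defaults from the block above
print([round(d, 2) for d in delays[:4]])       # [5.0, 6.67, 8.89, 11.85]
print(f"total wait: {sum(delays) / 3600:.1f} h")
```

Under these assumptions the default settings spread the attempts over several hours before the data is marked as lost, so lowering max-attempt-count shortens that window considerably.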

inventory-timestamp-threshold field

A node sends a PrivacyInventory message to peers after it has inserted the data for a given data hash into its private storage. PrivacyInventory messages are stored in a cache that is limited by the number of objects and by the time they spend in the cache. Depending on the value of the inventory-timestamp-threshold parameter, the data insertion event handler decides whether the PrivacyInventory message should be sent when the data is inserted. The handler compares the timestamp of the transaction that corresponds to the given data hash with the current time on the node. If the difference exceeds the value of the inventory-timestamp-threshold parameter, the PrivacyInventory message is not sent. By adjusting the value of the inventory-timestamp-threshold parameter, you can avoid the situation where a node that is synchronizing its state with the network clogs the network with unnecessary PrivacyInventory messages.
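The decision rule described above comes down to a single comparison. The following sketch shows it in isolation; the function name should_send_inventory is hypothetical and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def should_send_inventory(tx_timestamp: datetime, now: datetime,
                          threshold: timedelta) -> bool:
    """Send only if the transaction is not older than the threshold."""
    return now - tx_timestamp <= threshold


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
threshold = timedelta(minutes=10)              # the default value

fresh_tx = now - timedelta(minutes=5)
old_tx = now - timedelta(hours=3)              # e.g. data fetched while catching up
print(should_send_inventory(fresh_tx, now, threshold))  # True
print(should_send_inventory(old_tx, now, threshold))    # False
```

A node that is catching up on old transactions therefore stays quiet, while data for recent transactions is still announced to peers.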

service block

In the service block of the privacy section, specify the SendLargeData gRPC method and POST /privacy/sendLargeData REST method parameters to send a stream of confidential data.

service {
  request-buffer-size = 10MiB
  meta-data-accumulation-timeout = 3s
}

The block must contain the following parameters:

  • request-buffer-size – the maximum request buffer size; when the specified size is reached, the back pressure is activated;

  • meta-data-accumulation-timeout – the maximum time of metadata entity accumulation when sending data via POST /privacy/sendLargeData REST API method.
