Data Storage System

Objectives

The common data shared between participating data marketplace instances may include identity information, shared semantic models, meta-information about data sets and offerings, semantic queries, sample data, smart contract templates and instances, crypto tokens and payments. No single party should fully control the data storage system and there shall be no single point of failure.

The high-level capabilities that the data storage aims to provide are:

  1. Decentralised Storage
  2. Distributed Storage

The Decentralised storage shall provide highest available security guarantees in a federated network. The Decentralised storage subsystem will be built on a secure Byzantine fault tolerant consensus based distributed ledger. Due to high security requirements, the performance and storage space of such a system may be relatively limited compared to conventional databases.

The Distributed storage shall provide a database-like subsystem that is scalable, runs on a set of distributed nodes, has a rich query interface (SQL) and can handle large amounts of data.

Additionally, the data storage system provides an API to interact with the sub-components. In case of distributed storage, a custom API is implemented in order to provide ease of access to the features supported by the storage, however, the i3-MARKET shall rely on the API of the decentralised storage provided out-of-box.

Context

Figure 1. Data Storage System in i3-MARKET context shows the data storage system in the i3-MARKET context. All components that need to persist some global state or use global data for operation will interact with the data storage system. The data storage system needs do interface with the Semantic engine system and the Trust, security and privacy system for access management. However, the interaction between the Data storage system and the Semantic engine system is not yet finalized and is subject to change during the course of implementation.

Figure 1 – Data Storage System in i3-MARKET context

Building blocks

The Storage system consists of two main subsystems for implementing the decentralised storage and distributed storage features, respectively. The subsystems are, at least in the initial architecture, relatively independent of other systems and also with each other.

The diagram of Decentralised storage subsystem is shown in Figure 2. Decentralised Storage Subsystem. The Decentralised storage subsystem is implemented as a blockchain-based distributed ledger network. The software implementation is Hyperledger Besu in a permissioned setup using IBFT 2.0 consensus. Hyperledger Besu uses internally an embedded RocksDB instance for storing linked blocks (the journal of transaction) and world state (the ledger). Hyperledger Besu can instantiate and execute smart contracts for supporting the use cases of i3-MARKET framework.

Figure 2 – Decentralised Storage Subsystem

The components depending on the decentralised storage subsystem will use Hyperledger Besu’s native JSON-RPC-based interface. A separate interface layer for accessing (or limiting access to) decentralized storage is not planned, as the nodes of the decentralised storage will already validate all transactions submitted to the ledger.

The diagram of Distributed storage subsystem is shown in Figure 3. Distributed Storage Subsystem. The subsystem consists of a distributed cluster of database nodes and an optional interface layer (not implemented for R1). The database provides an SQL interface to other i3-MARKET framework components. The software implementation database is CockroachDB that can be accessed via PostgreSQL-compatible wire protocol for which a large number of client libraries exist in different languages and platforms. Only secure access to the database will be enabled, hence all clients need to use private keys and valid certificates to access the database.

Figure 3 – Distributed Storage Subsystem

Security

Authentication and Authorization

The distributed storage component is an internal component with no external access. That is to say that it will have connections only with other trusted services within the i3-MARKET backplane. Even though this simplifies the necessary measures in terms of authentication and authorization, it is still needed to secure machine-to-machine connections between the i3-MARKET services, since they can be deployed on shared infrastructure.

Although the approach may be reconsidered in the near future, the current solution relies on providing the distributed storage behind a TLS server endpoint and requiring TLS client certificates for the different connecting services. The setup will guarantee end-to-end security between the distributed storage service and any of its client services.

The governance of the certificates has followed up to now the keep it simple approach. The Distributed Storage system will be in charge of issuing the servers’ and clients’ certificates. For release 2, the definition of the governance rules for issuing certificates within the i3-MARKET federation, will be considered.

Service Availability

The storage subsystem is a critical component of the i3-MARKET network contributing to the proper functioning of the platform. Hence, appropriate measures in the form of design, choice of technologies and deployment, have to be applied. Fortunately, the two main subsystems used in the storage solution already have strong built-in availability features that will be summarised below.

Distributed Storage

The distributed storage solution is based on a CockroachDB server cluster consisting of four nodes. All data is replicated to at least three nodes before a transaction is considered committed. Therefore, data will be available even in a catastrophic event when half of the cluster is destroyed.

In the current setup, the database cluster can continue with normal transaction processing when three nodes out of the four are available. This feature guarantees the availability of the cluster in case of, for example, regular maintenance and upgrades of server software. CockroachDB supports the addition of new nodes as needed to support the load the component is required to process.

The federated search engine index service uses the CockroachDB server cluster as its storage backend. For availability, multiple independent instances of the index can be deployed. The system is designed to have horizontal scalability with no shared state between the instances.

Decentralised Storage

The decentralised storage used in the platform is a Hyperledger BESU network which uses the IBFT 2.0 (Proof of Authority) consensus protocol. In this network, there are 4 validator nodes based on the genesis configuration stored in the corporative Nexus. In this configuration, there are 3 accounts to be used by the i3-MARKET federation.

In this scenario, different components like the auditable accounting, are capable to deploy and manage smart contracts and transactions over those accounts.

Technical Requirements

For Data Storage, the following high-level capabilities have been defined:

Decentralised Data Storage

NameDescription
Embedded Ledger DatabaseEmbedded Ledger Database is shared between operator nodes and keeps a shared state that is guaranteed to be the same at each honest node and is updated according to agreed rules.
Smart Contracts for
Permissioning
Smart contracts are programs that are instantiated from smart contract templates and stored in the distributed ledger along with their state.
Table 1 – High-level Capabilities of Decentralised Data Storage

The following table presents the user stories of the decentralised data storage capability.

NameDescription
Smart contractsAs a user I want to instantiate and invoke smart contracts to use the functionality of the system.
Smart contract template storageAs a developer I want to store smart contract templates for instantiation by users so that I can extend the functionalities provided by the platform.
DID Document StatusAs a user I want to register and update the status of my DID document so that I can manage my identity.
Consensus status managementAs a user I want to register and update my consent status so that I can control the use of my data.
BFT ConsensusAs a Data Marketplace I want the Decentralized Data Storage to use Byzantine Fault Tolerant consensus so that I can be sure a malicious party cannot compromise the data.
Data SecurityAs a stakeholder I want to have guarantees about ledger data integrity, availability, confidentiality so that I can rely on the services provided by the system.
Scalability of decentralized storageAs an operator of ledger databases, I want to be able to scale the storage to meet space and transaction rate demands in order to be able to run the ledger.
Table 2 – Decentralised Data Storage User Stories

Distributed Data Storage

NameDescription
Embedded Ledger DatabaseThe main database supporting consensus, sharding and permissioning.
SynchronizationThe distributed storage database must support data synchronization between nodes.
Semantic DatabaseSemantic database is a distributed database supporting the storage of semantic data and processing semantic queries.
API for External AccessThe API for External Access provides an interface for using the distributed storage to access and store metadata, verifiable claims, semantic data, semantic queries etc.
Table 3 – High-level Capabilities of Distributed Data Storage

The following table presents the user stories of the distributed data storage capability.

NameDescription
Semantic data availabilityAs a Data Marketplace I want to access semantic data so that I can process semantic queries.
Semantic data updatesAs a Data Marketplace I want to access semantic data so that I can process semantic queries.
Data Offering registrationAs a Data Provider I want to register my data offering at a Data Marketplace so that I can sell my data discovered via the offering.
Verifiable claim storageAs a user I want to use the distributed storage to store verifiable claims (including consents).
Offer query registrationAs a data consumer I want to register offer query so that the system can find the data I need.
Metadata updateAs a data provider I want to update the metadata of my data so that the metadata is up-to-date.
SLA template managementAs a stakeholder I want to store SLA templates so that other users could fetch the templates.
Scalability of distributed data storageAs a distributed data storage node operator I want to use sharding so that I can scale the storage.
Metadata storageAs a data provider, I want to store metadata of the dataset.
Table 4 – Distributed Data Storage User Stories

Go to the beginning of the page

Other resources of interest

Home

We are a community of experienced developers who understand that it is all about changing the perception of data economy via marketplaces support.

i3-MARKET Architecture

Take a look at the main building blocks and their hierarchy.

Data Access API

The secure Data Access API enables data providers secure registration…

Identity and Access Management

The SSI & IAM subsystem is in charge of providing both “User-centric Authentication” and…

Smart Contracts, Wallets & Accounting

i3-M Wallet is a set of technologies that facilitate the management of their identity to…

Crypto Token and Monetization System

The Data Monetization subsystem is in charge of providing “Standard Payments”…

Semantic Engine

We developed and implemented dedicated software components for Semantic Engine System as…

Specification Deployment

i3-MARKET architecture specification is based on the 4+1 architectural view model approach. One of…

Developers Quickstart SDK

Once a marketplace is part of i3-MARKET, it can issue credentials to its consumers, providers, and…