Data Stores & Artifacts
Note
The goal of this page is to help you understand all the data artifacts consumed and produced by dfuse, the different databases, and the different types of storage.
Stores
There are 2 types of data stores used by the dfuse Platform:
- Object stores, for small or large files. These use the dfuse dstore abstraction library to support Azure, GCP, AWS, Minio, and local filesystems.
- Simple key/value storage databases. These use the kvdb key/value database abstraction, with support for Google Cloud Bigtable, TiKV and Badger.
Databases
Different databases are needed for different components of dfuse for EOSIO:
trxdb: a kvdb-backed key/value store that stores block and transaction information, pre-sorted and easily searchable by prefix or key ranges.
- This database can be written to in any order, segment by segment.
- For a 2-year-old, EOS Mainnet-like deployment, this database will easily reach dozens of terabytes. Therefore, choose your backend accordingly.
- The transactions stored in this database can be filtered (docs) to save on storage.
- The dfuseeos tools command has tools to verify the integrity of such a database, to ensure a contiguous block history.
- It is written to by the trxdb-loader component.
- See parallel processing docs for more info on parallel ingestion.
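To make "pre-sorted, and easily searchable by prefix or key ranges" concrete, here is a minimal sketch of a sorted key/value store in memory. It is not the kvdb API: the `SortedKV` class, the `block_key` helper and the `blk:` / `trx:` key layout are all hypothetical names chosen for illustration.

```python
import bisect


class SortedKV:
    """Toy in-memory stand-in for a sorted key/value store (hypothetical)."""

    def __init__(self):
        self._keys = []    # kept sorted at all times
        self._values = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_range(self, start: bytes, end: bytes):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        lo = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[lo:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]


def block_key(num: int) -> bytes:
    # Assumed key layout: a tag plus a zero-padded block number, so that
    # lexicographic key order matches numeric block order.
    return b"blk:%010d" % num


db = SortedKV()
# Writes can arrive in any order; the store keeps the keys sorted.
for num in (205, 3, 100, 101, 7):
    db.put(block_key(num), b"block-%d" % num)
db.put(b"trx:ab12cd", b"transaction ab12cd")
db.put(b"trx:ab99ff", b"transaction ab99ff")

blocks_100_to_199 = [k for k, _ in db.scan_range(block_key(100), block_key(200))]
trx_ab = [v for _, v in db.scan_prefix(b"trx:ab")]
```

Because keys are kept in order, both scans start with a binary search and then read neighbouring entries, which is what makes out-of-order, segment-by-segment writes compatible with fast lookups.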
statedb: a kvdb-backed key/value store that stores state changes of the blockchain state (internal, as well as contract state), like tables, rows, and snapshots of such tables. It uses purpose-built algorithms to enable snapshotting of all tables, at all block heights.
- This database needs to be written to linearly, from the perspective of a single table.
- It is written to by the statedb injector component.
- See parallel processing docs for more info on parallel ingestion.
- The search components require a small etcd cluster (typically 3 nodes) for service discovery between search components. See more details about search-etcd.
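The two statedb properties described above, linear writes per table and table snapshots at any block height, can be pictured with a deliberately naive change log. This is not dfuse's purpose-built algorithm; the `TableHistory` class and its methods are hypothetical and simply replay every change up to the requested height.

```python
from collections import defaultdict


class TableHistory:
    """Naive per-table change log; NOT dfuse's purpose-built algorithm."""

    def __init__(self):
        # table name -> list of (block_num, row_key, value or None for a delete)
        self._changes = defaultdict(list)

    def write(self, table: str, block_num: int, row_key: str, value) -> None:
        log = self._changes[table]
        # Writes must be linear from the perspective of a single table.
        if log and block_num < log[-1][0]:
            raise ValueError("out-of-order write for table %r" % table)
        log.append((block_num, row_key, value))

    def snapshot(self, table: str, at_block: int) -> dict:
        """Rebuild the rows of `table` as they stood after block `at_block`."""
        rows = {}
        for block_num, row_key, value in self._changes.get(table, []):
            if block_num > at_block:
                break
            if value is None:
                rows.pop(row_key, None)
            else:
                rows[row_key] = value
        return rows


history = TableHistory()
history.write("accounts", 10, "alice", {"balance": 5})
history.write("accounts", 12, "bob", {"balance": 7})
history.write("accounts", 15, "alice", {"balance": 9})
history.write("accounts", 20, "bob", None)  # row deleted at block 20

at_12 = history.snapshot("accounts", 12)
at_20 = history.snapshot("accounts", 20)
```

Different tables have independent logs here, so they could be ingested in parallel while each single table is still written in block order.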
Caches
An optional memcached server will help a search cluster improve performance for larger deployments.
It stores small (< 1kb) roaring bitmaps as values, with hashed normalized queries as keys.
See the search-memcached component in the documentation.
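As an illustration of "hashed normalized queries as keys", the sketch below derives a cache key from a query string. The normalization rules (lowercasing, collapsing whitespace, sorting terms) and the choice of SHA-256 are assumptions made for the example, not the rules dfuse Search applies; a plain dict stands in for the memcached server and a byte string stands in for the roaring bitmap.

```python
import hashlib


def normalize_query(query: str) -> str:
    """Hypothetical normalization: lowercase, collapse whitespace, sort terms."""
    terms = query.lower().split()
    return " ".join(sorted(terms))


def cache_key(query: str) -> str:
    """Hash the normalized query so equivalent queries share one cache entry."""
    return hashlib.sha256(normalize_query(query).encode("utf-8")).hexdigest()


cache = {}  # stands in for the memcached server

key = cache_key("receiver:eosio.token  action:transfer")
cache[key] = b"<serialized bitmap bytes>"  # placeholder for a small roaring bitmap

# A differently written but equivalent query resolves to the same entry.
hit = cache.get(cache_key("ACTION:transfer receiver:eosio.token"))
```

Hashing keeps keys at a fixed, short length regardless of how long the query is, which suits a cache with key-size limits.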
Artifacts
nodeos
These are artifacts managed by nodeos itself. dfuse for EOSIO knows how to manipulate them through APIs on the node-manager or mindreader process.
blocks/blocks.log
This file is append-only. Blocks are written to it sequentially, and only once they become irreversible. A second, small database called reversible sits alongside the blocks folder and holds the blocks that are not yet irreversible.
On small, low-traffic networks, this will be quite tiny, a few megabytes.
A small index file called blocks/blocks.index is also append-only, and stores fixed-size pointers to offsets in blocks.log. Combined, these two files allow nodeos to quickly fetch unexecuted block data to serve it on the p2p network upon request.
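The value of fixed-size pointers is that the index entry for a given block sits at a computable position, so a lookup needs two seeks and no scan. The sketch below shows the idea with in-memory buffers. The 8-byte little-endian pointer layout and the `append_block` / `read_block` helpers are assumptions for illustration, not the actual nodeos file format.

```python
import io
import struct

# Assumed layout: one 8-byte little-endian offset per block.
POINTER = struct.Struct("<Q")


def append_block(log: io.BytesIO, index: io.BytesIO, payload: bytes) -> None:
    """Append a payload to the log and record where it starts in the index."""
    log.seek(0, io.SEEK_END)
    index.seek(0, io.SEEK_END)
    index.write(POINTER.pack(log.tell()))
    log.write(payload)


def read_block(log: io.BytesIO, index: io.BytesIO, position: int) -> bytes:
    """Fetch the payload at a 0-based position using two seeks and no scan."""
    index.seek(position * POINTER.size)
    (start,) = POINTER.unpack(index.read(POINTER.size))
    next_entry = index.read(POINTER.size)
    if next_entry:
        (end,) = POINTER.unpack(next_entry)
    else:
        end = log.seek(0, io.SEEK_END)  # the last block runs to the end of the log
    log.seek(start)
    return log.read(end - start)


log, index = io.BytesIO(), io.BytesIO()
for payload in (b"first block", b"second", b"third block data"):
    append_block(log, index, payload)

second = read_block(log, index, 1)
last = read_block(log, index, 2)
```

Both buffers only ever grow at the end, which mirrors the append-only behaviour described above.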
state/shared_memory.bin
This is a mmap’d file that stores the current live state of the blockchain, covering both block headers and all contracts’ data. Its size is managed by the chain-state-db-size-mb (and chain-state-db-guard-size-mb) settings, and must be larger than what the underlying chain configuration will allow to be allocated. If it is smaller, and transactions on the chain ask to write to contract storage, the node will shut down cleanly (that’s where the guard comes in).
This file is also a sparse file, meaning it will be allocated the full amount of what is set in chain-state-db-size-mb, but might actually occupy far less space on the disk.
For example, after 2 years, EOS Mainnet offers more than 72GB of RAM on the RAM market, but only ~8GB are actually used by participating nodeos nodes. The rest of the file is filled with sparse zeroes.
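The difference between a sparse file's apparent size and the space it occupies can be observed with a short experiment. This sketch creates a 64 MiB file with only a few bytes written, then reads both figures from `os.stat`. The file name and size are arbitrary, and the allocated figure depends on the platform and filesystem (`st_blocks` is not available everywhere, and some filesystems do not create sparse files).

```python
import os
import tempfile


def make_sparse_file(path: str, size_bytes: int, payload: bytes) -> None:
    """Create a file of size_bytes, writing only a small payload at the start."""
    with open(path, "wb") as f:
        f.write(payload)
        f.truncate(size_bytes)  # extends the file without writing the zeroes


def sizes(path: str):
    """Return (apparent size, bytes allocated on disk or None if unknown)."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # not available on every platform
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shared_memory.bin")
    make_sparse_file(path, 64 * 1024 * 1024, b"live state")
    apparent, allocated = sizes(path)

print("apparent size:", apparent, "allocated on disk:", allocated)
```

On filesystems that support sparse files, the allocated figure is typically a small fraction of the apparent size, which is why tools that copy the apparent size can make backups of such files far larger than necessary.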
See the backup and recovery section for efficient ways to back up and recover those files.
portable state snapshots
These files are generated by nodeos upon request, either through the command-line or through the node-manager APIs.
They contain all the data in the state/shared_memory.bin for nodeos, in a binary format that is portable, versioned and stable. They can then be used to boot a new nodeos instance, and fill a state/shared_memory.bin.
dfuse artifacts
These are dfuse-specific artifacts.
In general, the dfuse Platform uses Protocol Buffers version 3 for serialization.
executed merged blocks files
Also called 100-blocks files, or merged blocks files, or merged bundles. These are all used interchangeably here.
These files are binary files that use the dbin packing format to store a series of bstream.Block objects (defined here), serialized as Protocol Buffers version 3.
They are produced by mindreader, in catch-up mode (set as such with certain flags), or by the merger in an HA setup. In the latter case, the mindreader contributes one-block files to the merger instead, and the merger collates all of those in a single bundle.
These 100-blocks files can contain more than 100 blocks (because they can include multiple versions of a given block number), but not fewer (to ensure continuity).
They are consumed by the bstream library, used by almost all components.
The EOSIO-specific decoded Block objects are what circulate amongst all processes that work with executed block data.
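The "more than 100 blocks, but not fewer" rule for merged bundles can be expressed as a small check. In this sketch, the 10-digit zero-padded bundle name and the helper functions are assumptions for illustration; blocks are represented as `(block_num, block_id)` pairs rather than real bstream.Block objects.

```python
BUNDLE_SIZE = 100


def bundle_base(block_num: int) -> int:
    """Lowest block number covered by the bundle that holds block_num."""
    return block_num - (block_num % BUNDLE_SIZE)


def bundle_name(block_num: int) -> str:
    # Assumed naming scheme: the base number, zero-padded to 10 digits.
    return "%010d" % bundle_base(block_num)


def is_complete(bundle_blocks, base: int) -> bool:
    """A bundle may hold forked duplicates, but no block number may be missing."""
    seen = {num for num, _block_id in bundle_blocks}
    return all(n in seen for n in range(base, base + BUNDLE_SIZE))


# One entry per block number in 200..299, plus a second version of block 250.
bundle = [(n, "id-%d-a" % n) for n in range(200, 300)]
bundle.append((250, "id-250-b"))

with_fork_ok = is_complete(bundle, bundle_base(250))
with_gap_ok = is_complete([b for b in bundle if b[0] != 260], 200)
```

Here `bundle` holds 101 entries and is still a valid bundle, while removing every version of block 260 breaks continuity.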
one-block files
These are transient files, meant to ensure that the merger gathers all visible forks from the mindreader instances, in an HA setup.
They contain one bstream.Block, as serialized Protobuf (see links above).
The merger will consume them, bundle them in executed blocks files (100-blocks files) and store them to dstore storage, for consumption by most other processes.
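The collation step described above can be sketched as a grouping function. This is a simplified picture rather than the merger's implementation: one-block files are modelled as `(block_num, block_id, payload)` tuples, the `collate` function is hypothetical, and the same block reported by two mindreader instances is assumed to carry an identical payload.

```python
from collections import defaultdict


def collate(one_block_files):
    """Group (block_num, block_id, payload) entries into bundles of 100 numbers.

    Every visible version of a block number is kept, so forks survive merging.
    """
    bundles = defaultdict(list)
    seen = set()
    for num, block_id, payload in sorted(one_block_files):
        if (num, block_id) in seen:
            continue  # same block reported by two mindreader instances
        seen.add((num, block_id))
        bundles[num - num % 100].append((num, block_id, payload))
    return dict(bundles)


files = [
    (101, "b", b"block 101, version b"),
    (100, "a", b"block 100"),
    (101, "b", b"block 101, version b"),  # duplicate from a second instance
    (101, "c", b"block 101, version c"),  # a fork at the same height
    (200, "d", b"block 200"),
]
bundles = collate(files)
ids_in_first_bundle = [block_id for _, block_id, _ in bundles[100]]
```

Duplicates are dropped by block ID, not by block number, which is what lets both versions of block 101 end up in the same bundle.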
search indexes
These are tarred (.tar) and ZStandard-compressed (.zstd) archives of Bleve indexes.
They are produced by the search-indexer process, and consumed by the search-archive nodes.
They contain pointers to what is stored in the trxdb key/value store, and looked up by a transaction ID prefix.
They do not contain the actual transaction data, only the indexes to allow for fast search. They also only contain search terms specified to the indexer, live and forkresolver components of search.
abicodec ABI cache
The abicodec
component primes a local cache of all the ABI changes throughout the history of the chain. It feeds itself off of a dfuse Search endpoint. Once it has that local cache, it stores it to a dstore
location, to start faster next time.
At the time of writing, this file uses an opaque binary-packed format that only abicodec can read and write.
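To show why a cache of ABI changes is kept per block height and why persisting it speeds up the next start, here is a minimal sketch. The `ABICache` class is hypothetical, a local temporary file stands in for the dstore location, and JSON is used only so the example stays readable; the real file is an opaque binary format.

```python
import bisect
import json
import os
import tempfile


class ABICache:
    """Toy cache of ABI changes per account, ordered by block number (hypothetical)."""

    def __init__(self):
        self._history = {}  # account -> list of [block_num, abi], sorted by block_num

    def record(self, account: str, block_num: int, abi: dict) -> None:
        entries = self._history.setdefault(account, [])
        entries.append([block_num, abi])
        entries.sort(key=lambda entry: entry[0])

    def abi_at(self, account: str, block_num: int):
        """Return the ABI in effect at block_num, or None if none was set yet."""
        entries = self._history.get(account, [])
        nums = [entry[0] for entry in entries]
        i = bisect.bisect_right(nums, block_num)
        return entries[i - 1][1] if i else None

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self._history, f)

    @classmethod
    def load(cls, path: str) -> "ABICache":
        cache = cls()
        with open(path, encoding="utf-8") as f:
            cache._history = json.load(f)
        return cache


cache = ABICache()
cache.record("eosio.token", 100, {"version": 1})
cache.record("eosio.token", 200, {"version": 2})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "abi-cache.json")
    cache.save(path)                  # end of the first run
    restored = ABICache.load(path)    # start of the next run, no re-priming

abi_at_150 = restored.abi_at("eosio.token", 150)
abi_at_250 = restored.abi_at("eosio.token", 250)
abi_at_50 = restored.abi_at("eosio.token", 50)
```

Decoding data from block 150 needs the ABI set at block 100, not the latest one, which is why the whole history of changes is cached instead of only the current ABI.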