A filter file can be used to replicate only a specified subset of source data.
--filter <filter_file> to specify the desired filter file. If this parameter is not specified, all source data that the user can access at the source is replicated to the destination.
./bin/replicate snapshot conf/conn/oracle.yaml conf/conn/memsql.yaml --filter filter/source_filter.yaml
The file is specified in YAML format using key-value pairs. The following describes the YAML keys and values.
allow: Use this to specify the database, collections, and documents to replicate in snapshot mode.
- catalog: Specify the source catalog/database to replicate.
schema: Specify the source database schema to replicate. Each schema must have a separate entry.
<table_name>: Specify the table name. Each table within the database must be a separate entry.
allow: The list of columns in this table that will be replicated. If not specified, all columns will be replicated.
conditions: The predicate to use for filtering the data while it’s extracted from the source. You may specify the exact SQL predicate that Replicate will use while extracting data.
Typically, for databases that support both catalog and schema, both must be specified for each schema.
Each rule specifies which object types to replicate. Currently, only
TABLE objects can be replicated, while
VIEW is currently not supported.
Each rule can either allow or block certain tables. Allowed tables must be specified as a map, whereas blocked tables must be specified as a list. If no tables have been explicitly allowed or blocked as a part of a specific rule, all tables in the corresponding catalog/schema are implicitly allowed.
unlisted key to define behavior for tables that are neither explicitly allowed nor blocked. It can have two values:
ALLOW. By default, this value is set to
BLOCK, and can be changed to
ALLOW to allow blocklist filtering.
For any allowed table, it’s possible to either allow or block specific columns. However, it’s not possible to simultaneously allow some columns while blocking others as it’s unclear how to handle columns that were neither explicitly allowed nor blocked.
# Multiple rules can be defined for different catalogs/schemas allow: # Catalog or schema can be omitted if not supported by the database - catalog: <catalog_name> schema: <schema_name> # Object types that will be eligible for replication types: [TABLE, VIEW] # Tables that will be eligible for replication allow: <table_name>: # Columns that will be eligible for replication allow: [<column_name>, <column_name>, <column_name>] <table_name>: # Columns that will NOT be eligible for replication block: [<column_name>, <column_name>] # Tables that will NOT be eligible for replication block: [<table_name>, <table_name>] # Rule for all tables not explicitly allowed or blocked unlisted: BLOCK - catalog: <catalog_name> schema: <schema_name> types: [TABLE, VIEW] allow: <table_name>: allow: [<column_name>, <column_name>, <column_name>] <table_name>: block: [<column_name>, <column_name>] block: [<table_name>, <table_name>] unlisted: BLOCK
allow: - catalog: "io" schema: "test-org" types: [TABLE] allow: "users": allow: ["first_name", "last_name"] "roles": block: ["admin"] block: ["offices"] unlisted: ALLOW
In the above example:
- Only the
io.test-orgschema is considered and only
TABLEobjects are replicated.
- From the
userstable, only the
last_namecolumns are replicated. Concurrently, all columns from the
rolestable are replicated, with the exception of the
officestable is explicitly blocked and is excluded by the filter. Since
unlistedis set to
ALLOW, all tables that are neither explicitly allowed nor blocked are replicated.
allow: - schema : "tpch" types: [TABLE] # allow: # nation : # region : # part : # upplier : # partsupp : # orders : # allow: ["ORDERKEY", "COMMENTS", "LAST_UPDATE_TIME"] # customer: # lineitem: # lineitem1: # conditions: "orderkey > 1000"
Note: For Oracle, the
schema key is used, while the
catalog key is omitted.
allow: - catalog : "tpch" types: [TABLE] # allow: # nation : # region : # MemSQL_part : # supplier : # partsupp : # MemSQL_orders : # allow: ["orderkey", "comments", "last_update_time"] # parts_view: # partsupp_macro: # orders: # conditions: "o_orderkey < 5000"
Note: For MemSQL, the
catalog key is used, while the
schema is omitted.