You are viewing an older version of this section. View current production version.
Join the MemSQL Community Today
Get expert advice, develop skills, and connect with others.

Filters

A filter file can be used to replicate only a specified subset of source data.

Use --filter <filter_file> to specify the desired filter file. If this parameter is not specified, all source data that the user can access at the source is replicated to the destination.

Example:

./bin/replicate snapshot conf/conn/oracle.yaml conf/conn/memsql.yaml --filter filter/source_filter.yaml

The file is specified in YAML format using key-value pairs. The following describes the YAML keys and values.

  • allow: Use this to specify the database, collections, and documents to replicate in snapshot mode.
    • - catalog: Specify the source catalog/database to replicate.
    • schema: Specify the source database schema to replicate. Each schema must have a separate entry.
    • types: [TABLE]
    • allow:
      • <table_name>: Specify the table name. Each table within the database must be a separate entry.
        • allow: The list of columns in this table that will be replicated. If not specified, all columns will be replicated.
        • conditions: The predicate to use for filtering the data while it’s extracted from the source. You may specify the exact SQL predicate that Replicate will use while extracting data.

Typically, for databases that support both catalog and schema, both must be specified for each schema.

Each rule specifies which object types to replicate. Currently, only TABLE objects can be replicated, while VIEW is currently not supported.

Each rule can either allow or block certain tables. Allowed tables must be specified as a map, whereas blocked tables must be specified as a list. If no tables have been explicitly allowed or blocked as a part of a specific rule, all tables in the corresponding catalog/schema are implicitly allowed.

Use the unlisted key to define behavior for tables that are neither explicitly allowed nor blocked. It can have two values: BLOCK or ALLOW. By default, this value is set to BLOCK, and can be changed to ALLOW to allow blocklist filtering.

For any allowed table, it’s possible to either allow or block specific columns. However, it’s not possible to simultaneously allow some columns while blocking others as it’s unclear how to handle columns that were neither explicitly allowed nor blocked.

File Format

# Multiple rules can be defined for different catalogs/schemas
allow:
# Catalog or schema can be omitted if not supported by the database
- catalog: <catalog_name>
  schema: <schema_name>
  # Object types that will be eligible for replication
  types: [TABLE, VIEW]
  # Tables that will be eligible for replication
  allow:
	<table_name>:
  	# Columns that will be eligible for replication
  	allow: [<column_name>, <column_name>, <column_name>]
	<table_name>:
  	# Columns that will NOT be eligible for replication
  	block: [<column_name>, <column_name>]
  # Tables that will NOT be eligible for replication
  block: [<table_name>, <table_name>]
  # Rule for all tables not explicitly allowed or blocked
  unlisted: BLOCK
- catalog: <catalog_name>
  schema: <schema_name>
  types: [TABLE, VIEW]
  allow:
	<table_name>:
  	allow: [<column_name>, <column_name>, <column_name>]
	<table_name>:
  	block: [<column_name>, <column_name>]
  block: [<table_name>, <table_name>]
  unlisted: BLOCK

Examples

General

allow:
- catalog: "io"
  schema: "test-org"
  types: [TABLE]
  allow:
    "users":
      allow: ["first_name", "last_name"]
    "roles":
      block: ["admin"]
  block: ["offices"]
  unlisted: ALLOW

In the above example:

  • Only the io.test-org schema is considered and only TABLE objects are replicated.
  • From the users table, only the first_name and last_name columns are replicated. Concurrently, all columns from the roles table are replicated, with the exception of the admin column.
  • The offices table is explicitly blocked and is excluded by the filter. Since unlisted is set to ALLOW, all tables that are neither explicitly allowed nor blocked are replicated.

Oracle

allow:
- schema : "tpch"
  types: [TABLE]
#  allow:
#	nation :
#	region :
#	part :
#	upplier :
#	partsupp :
#	orders :
#  	  allow: ["ORDERKEY", "COMMENTS", "LAST_UPDATE_TIME"]
#	customer:   	 
#	lineitem:
# 	lineitem1:
#  	  conditions: "orderkey > 1000"

Note: For Oracle, the schema key is used, while the catalog key is omitted.

MemSQL

allow:
- catalog : "tpch"
  types: [TABLE]
#  allow:
#	nation :
#	region :
#	MemSQL_part :
#	supplier :
#	partsupp :
#	MemSQL_orders :
#  	allow: ["orderkey", "comments", "last_update_time"]
#	parts_view:
#	partsupp_macro:
#	orders:
#  	conditions: "o_orderkey < 5000"

Note: For MemSQL, the catalog key is used, while the schema is omitted.