Overview of Data Migration features
High-level summary of the Data Migration features
Setting up and managing projects
- Admins and Editors can create new projects.
- Project names must be unique.
- Users are added to a project as a 'member'; what they can do in a project depends on their role and any specific project restrictions applied.
- Only Admins can delete a project.
- Projects may be duplicated.
- Projects can be given a status of Active, Complete, or Archived.
Adding files and assets
- Files can be added to a project by:
  - Using the in-project Upload Files feature (for small amounts, under 1 GB).
  - Adding a connected SFTP service to the platform (speak to customer success) and then bulk importing the files using the SFTP utilities.
- Files are used in the creation of project assets and as data sources for conversions.
- Files can be deleted by users with appropriate permissions.
- Files can be organized into folders and renamed.
- Files must be categorized (Schema, Data, Config, or Unknown); the category helps restrict how they are used within Zengines.
- CSV is the preferred file type and format for Schema and Data files.
- Files that are categorized as Data can be previewed in the platform depending on the user role and project restrictions.
Working with Sources and Targets
Defining Sources and Targets
- Sources and Targets contain the schema, related metadata, and associated data. They are defined in one of two ways:
  - Using a schema definition file, which can contain multiple tables and includes data type definitions as well as other constraints and metadata.
  - Using a data file that represents a single table; metadata is inferred and should be reviewed before use.
- Sources and Targets must have unique, SQL-compliant names.
- Source and Target schemas can be edited:
  - Tables and fields can be modified.
  - Tables and fields can be added.
  - Tables can be duplicated.
  - Tables and fields can be deleted, depending on the user's role and specific project restrictions.
  - Batch updates can be made (excluding delete actions).
- Sources and Targets have two statuses, Draft and Ready:
  - In Draft status, they are not used in Zengines matching processes (although they can be manually set in Mapping).
  - In Ready status, they are included in matching processes and conversion jobs.
Associating Data with a Source or Target
- Data files can be associated with a table, as long as the Data file headers (column names) match the table field names.
- Files paired with a table can optionally be set as example data to display in the schema/table definition. Example data is a random sample of 10 values extracted from the Data file.
- Associated data files can have their use restricted, so that no part of their content is sent to LLMs.
- Associated data files can be profiled and a report generated (statistics for the values in each field).
- Associated data files can be validated for conformity against the schema (data types, constraints).
- Associated data files appear as the default selectable items for data sources in Conversion jobs.
- A table can have many associated data files.
- A data file can only be associated with 1 table.
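The header-matching rule and the 10-value example sample described above can be sketched in Python. This is a minimal illustration, not the Zengines implementation: the function names `headers_match` and `sample_example_values` are hypothetical, and the exact, order-insensitive header comparison is an assumption about how matching works.

```python
import csv
import io
import random


def headers_match(csv_text, field_names):
    """Return True when the CSV header row holds exactly the table's field names.

    Assumption: the comparison is exact (case-sensitive) and ignores column
    order; the platform's real matching rules may differ.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return len(header) == len(field_names) and set(header) == set(field_names)


def sample_example_values(csv_text, field, k=10, seed=None):
    """Pick up to k random values from one column to use as example data."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [row[field] for row in rows]
    rng = random.Random(seed)  # seed only makes the sketch repeatable
    return rng.sample(values, min(k, len(values)))
```

A Data file whose headers do not match would be rejected before any sampling takes place, which is why the check comes first in this sketch.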
Managing Sources and Targets
- Sources and Targets can be duplicated.
- Sources and Targets can be deleted, depending on the user's role and specific project restrictions.
Mapping
- All mappings (how the source data is used for the target) are directly linked to a Target field.
- A Target field has a 'mapping status' to help manage workflow and testing. The status can be:
  - Not started (no assigned Sources, no action taken on it)
  - In progress (set automatically when a Source is assigned)
  - Testing (user can set)
  - Complete (user can set; this triggers a validation check)
  - Blocked (user can set)
- A mapping can be Locked/unlocked to prevent any editing, depending on the user's role and specific project restrictions.
- A mapping can have a simple text-based explanation added to it so that complex transformation rules can be easily understood by non-technical observers.
- Mappings can be reset (assigned Sources are removed and custom transformations cleared).
- When matches are generated for a Target, Zengines analyzes all Sources set to 'Ready', scores the individual fields, and surfaces star-rated recommendations for the best Source field matches.
- Assigning a Source field to a Target field is done in the 'Select Sources' view of a selected field on the Mapping grid; any Source field can be used, whether recommended or not.
- A single assigned Source field can be 'auto-transformed' (this will simply try to ensure any Target field data type constraints are met). A 'custom transformation' can be added to override this.
- Multiple Source fields from different Sources and their tables can be assigned to a Target field; this requires a 'custom transformation' for use in conversions.
- It is not necessary to use all assigned Source fields, although assigned Source fields are used to inform JOIN statement requirements when defining a Conversion Job.
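The mapping status workflow above can be modelled as a small state holder. The sketch below is illustrative only: the class and method names are hypothetical, and several behaviours are assumptions rather than documented facts, namely that a lock also blocks status changes, that a reset returns the status to Not started, and that the validation check on Complete consists of the two rules shown.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class MappingStatus(Enum):
    NOT_STARTED = "Not started"
    IN_PROGRESS = "In progress"
    TESTING = "Testing"
    COMPLETE = "Complete"
    BLOCKED = "Blocked"


@dataclass
class TargetFieldMapping:
    target_field: str
    sources: List[str] = field(default_factory=list)
    custom_transformation: Optional[str] = None
    transformation_valid: bool = False
    locked: bool = False
    status: MappingStatus = MappingStatus.NOT_STARTED

    def _ensure_editable(self):
        if self.locked:
            raise PermissionError("mapping is locked")

    def assign_source(self, source_field):
        self._ensure_editable()
        self.sources.append(source_field)
        # Assigning the first Source moves the field to 'In progress'.
        if self.status is MappingStatus.NOT_STARTED:
            self.status = MappingStatus.IN_PROGRESS

    def set_status(self, status):
        self._ensure_editable()  # assumption: a lock also blocks status edits
        if status is MappingStatus.COMPLETE:
            self._validate()
        self.status = status

    def _validate(self):
        # Assumed checks: several Sources need a custom transformation,
        # and any custom transformation must be valid.
        if len(self.sources) > 1 and self.custom_transformation is None:
            raise ValueError("multiple Sources require a custom transformation")
        if self.custom_transformation is not None and not self.transformation_valid:
            raise ValueError("custom transformation is not valid")

    def reset(self):
        self._ensure_editable()
        self.sources.clear()
        self.custom_transformation = None
        self.transformation_valid = False
        self.status = MappingStatus.NOT_STARTED  # assumption
```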
Custom transformations
- Using the Zengines Transformation Language, custom SQL-based rules can be written to manipulate the assigned Source fields so that the output value satisfies the Target field requirements.
- All rules can be tested using the example data set on the Source fields, or values entered directly into the example data area.
- Any custom transformation must be 'valid' before the mapping can be set to 'Complete'.
- The Zengines AI 'Rule generator' can be used to create the code for a SQL transformation rule.
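To give a feel for testing a SQL-based rule against example data, the sketch below evaluates an expression over a few example rows using Python's built-in `sqlite3` module. SQLite is only a stand-in here: the syntax of the Zengines Transformation Language is not described in this overview and may differ. The helper name `try_rule` and the field names `first_name` and `last_name` are hypothetical.

```python
import sqlite3


def try_rule(expression, example_rows):
    """Evaluate a SQL expression against example rows and return the outputs.

    example_rows is a non-empty list of dicts that share the same keys
    (the assigned Source field names).
    """
    columns = list(example_rows[0].keys())
    conn = sqlite3.connect(":memory:")
    try:
        col_defs = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f"CREATE TABLE example ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO example VALUES ({placeholders})",
            [tuple(row[c] for c in columns) for row in example_rows],
        )
        # The expression is trusted input in this sketch; it is inlined as-is.
        cursor = conn.execute(f"SELECT {expression} FROM example ORDER BY rowid")
        return [r[0] for r in cursor]
    finally:
        conn.close()


rows = [
    {"first_name": " Ada ", "last_name": "lovelace"},
    {"first_name": "Alan", "last_name": "turing"},
]
outputs = try_rule("TRIM(first_name) || ' ' || UPPER(last_name)", rows)
```

An expression that fails to parse raises `sqlite3.OperationalError`, which roughly corresponds to a rule that is not yet 'valid'.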
Conversion
- Conversions are managed through the creation of Conversion Jobs. A job consists of the following:
  - A Target, and a selected set of its tables to generate the data for in this job.
  - A set of JOINs and optional 'data filters' defining the row-level data relationships across the Source tables used to populate the Target (the Sources assigned in the Mappings).
  - A set of data files to use for each of the Source tables used in the Mappings.
- Conversion jobs can be modified and deleted, depending on the user's role and specific project restrictions.
- Conversion jobs can be run multiple times. Each run contains the following for each Target table it includes:
  - A data file named the same as the Target table.
  - A process log txt file explaining what was done.
  - If an error occurs, an error log txt file.
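The role of JOINs and data filters in a Conversion Job can be illustrated with a small in-memory example. This is a sketch under stated assumptions, not how Zengines executes jobs: SQLite stands in for the conversion engine, and the Source tables `customers` and `orders`, their fields, and the function name `run_conversion` are all hypothetical.

```python
import sqlite3


def run_conversion():
    """Join two Source tables, apply a data filter, and return Target rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # Stand-ins for the data files selected for each Source table.
        conn.executescript(
            """
            CREATE TABLE customers (customer_id INTEGER, name TEXT, country TEXT);
            CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL);
            INSERT INTO customers VALUES
                (1, 'Ada', 'UK'), (2, 'Alan', 'UK'), (3, 'Grace', 'US');
            INSERT INTO orders VALUES
                (10, 1, 25.0), (11, 3, 40.0), (12, 1, 5.5);
            """
        )
        query = """
            SELECT o.order_id, c.name AS customer_name, o.total
            FROM orders AS o
            JOIN customers AS c ON c.customer_id = o.customer_id  -- JOIN
            WHERE c.country = 'UK'                                -- data filter
            ORDER BY o.order_id
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


target_rows = run_conversion()
```

The JOIN decides which Source rows belong together, and the filter narrows which of those rows reach the Target table. Here only the orders placed by UK customers are kept.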