We at trueNorth recently helped our client with their data migration from a legacy system to a Dynamics CRM solution.
The main requirement from the business was that the data load had to complete within a single weekend, which included:
- Preparation
- The migration
- Validation and Testing
We also had to leave enough time to revert to the existing database if needed, so when we say a weekend we really mean 30 hours or so.
Many off-the-shelf products were tried, but none met the customer's throughput needs. These products had powerful data transformation features, but we found that using those features reduced the throughput of the migration process.
After many discussions around optimisation, and around possibly changing the roll-out plan to remove the weekend restriction, it was agreed that we needed to build our own migration product.
For this, we had the following design goals:
- Any field type must be supported, e.g. lookups, optionsets, relationships (see the sketch after this list)
- Highly optimised performance
- Must log all migrated data
- Must be able to write back to the source (allowing us to stamp the source row with the CRM record's GUID)
- Must support updates
- Must be simple to use
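To give a flavour of what supporting "any field type" means in SDK terms, here is a minimal illustrative sketch (not code from our product) of how simple values, option sets and lookups are each set with a different SDK type; the attributes used are standard Dynamics CRM contact fields:

```csharp
// Illustrative only: how different CRM field types are set via the SDK.
using System;
using Microsoft.Xrm.Sdk;

public static class FieldTypeExamples
{
    public static Entity BuildContact(Guid parentAccountId)
    {
        var contact = new Entity("contact");
        contact["firstname"] = "Jane";                                  // simple string field
        contact["birthdate"] = new DateTime(1980, 1, 1);                // date field
        contact["preferredcontactmethodcode"] = new OptionSetValue(2);  // option set
        contact["parentcustomerid"] =
            new EntityReference("account", parentAccountId);            // lookup
        return contact;
    }
}
```

A migration tool therefore has to map each source column to the correct SDK type before the record can be created.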
Also, being competent developers, we followed good TDD practices and even built in the facility to control data migration runs from TFS, so the tool could be plumbed into a continuous integration cycle.
We used a pipeline pattern, with a factory loading the appropriate pipeline component for each stage of a basic data load.
The pipeline concept gives the flexibility to insert extra components as required. For example, we implemented a pipeline component that reopens inactive records before updating them and closes them again afterwards.
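By way of illustration only, a pipeline of this sort could be modelled along the following lines; the interface and class names here (IMigrationStep, MigrationContext, Pipeline) are hypothetical and not the actual API of our tool:

```csharp
// Hypothetical sketch of a pipeline pattern for a data load.
using System.Collections.Generic;

public class MigrationContext
{
    // Carries source rows, lookup caches, logging, etc. between steps.
    public List<IDictionary<string, object>> Rows { get; } =
        new List<IDictionary<string, object>>();
}

public interface IMigrationStep
{
    void Execute(MigrationContext context);
}

public class Pipeline
{
    private readonly List<IMigrationStep> _steps = new List<IMigrationStep>();

    public Pipeline Add(IMigrationStep step) { _steps.Add(step); return this; }

    // Extra components (such as a step that reopens inactive records) can be
    // slotted in at any position without touching the surrounding steps.
    public Pipeline InsertAt(int index, IMigrationStep step) { _steps.Insert(index, step); return this; }

    public void Run(MigrationContext context)
    {
        foreach (var step in _steps)
            step.Execute(context);
    }
}
```

A basic load then becomes a chain of steps run in order, and the inactive-record handling described above is simply an extra step slotted in before the update step and another after it.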
The headline performance figure for simple entity creation, on a load-balanced environment with four front-end servers, was over 120k records per minute.
The key optimisations to reach this throughput were:
- Optimal number of threads for the environment
- Increasing the maximum number of .Net sockets available
- ExecuteMultiple calls to batch updates and reduce SQL locking (see the sketch after this list)
- General CRM database optimisations
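As a rough illustration of the second and third points, the sketch below (assuming the Dynamics CRM SDK and an already-connected IOrganizationService; the class name, batch size and connection limit are hypothetical and would need tuning per environment) raises the default .NET connection limit and batches creates with ExecuteMultiple:

```csharp
// Illustrative sketch: raise the outbound connection limit and batch creates
// with ExecuteMultiple. Not the product's actual code.
using System.Collections.Generic;
using System.Linq;
using System.Net;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Messages;

public static class BatchLoader
{
    public static void Configure()
    {
        // .NET limits concurrent connections per host by default; raising it lets
        // multiple worker threads talk to the front-end servers in parallel.
        ServicePointManager.DefaultConnectionLimit = 64; // tune for the environment
    }

    public static void CreateInBatches(IOrganizationService service,
                                       IEnumerable<Entity> records,
                                       int batchSize = 500)
    {
        foreach (var batch in records.Select((e, i) => new { e, i })
                                     .GroupBy(x => x.i / batchSize, x => x.e))
        {
            var request = new ExecuteMultipleRequest
            {
                Settings = new ExecuteMultipleSettings
                {
                    ContinueOnError = true,  // log failures but keep the run going
                    ReturnResponses = false  // skip responses to cut payload size
                },
                Requests = new OrganizationRequestCollection()
            };

            foreach (var entity in batch)
                request.Requests.Add(new CreateRequest { Target = entity });

            // One round trip per batch instead of one per record.
            service.Execute(request);
        }
    }
}
```

In practice several worker threads would each push their own batches, which is where tuning the number of threads for the environment comes in.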
To keep the approach simple, the source is taken from an intermediate SQL data source. This allows the data to be transformed before it reaches the data loader, so staff who understand the legacy schema can prepare the data. In our customer's case, SSIS was used for this initial transformation, although many other options are available.
If you can produce the data in SQL (or CSV) you can load it quickly.
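Finally, as an illustration of the write-back design goal, a loader step might stamp each staging row with the GUID returned by CRM once the record is created. The table and column names below (dbo.Contact_Staging, CrmId, LegacyId) are invented for the example and are not our customer's schema:

```csharp
// Hypothetical sketch: stamp a staging row with the GUID of the CRM record
// that was created from it. Table and column names are illustrative only.
using System;
using System.Data.SqlClient;

public static class StagingWriteBack
{
    public static void StampCrmId(string connectionString, int legacyId, Guid crmId)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "UPDATE dbo.Contact_Staging SET CrmId = @crmId WHERE LegacyId = @legacyId",
            connection))
        {
            command.Parameters.AddWithValue("@crmId", crmId);
            command.Parameters.AddWithValue("@legacyId", legacyId);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```

Having the CRM ID back in the staging table makes subsequent update runs and validation queries straightforward.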