Tips & Tricks
Improving Throughput with Multiprocessing in SSIS
By Oscar Li,
Product Channel Manager
Did you know you can multiprocess your records by initiating multiple instances of our components? This guide will help you improve your throughput with our on-premise processing mode using SSIS.
Web services may be multithreaded already; using our advanced configuration to set the number of threads. Our on-premise components (when the processing mode is set to On-Premise) will process the procedures one at a time. However, in SSIS, it’s very easy to set up parallelism in order to scale up and increase processing speed.
- First, we will need to have a way to incrementally assign a unique sequential integer for every record. You have two options, you may use a unique ID that is automatically generated by SQL (you can import that into use), or you can use the script component to generate a unique number on the fly. Script component is a native component of the SSIS toolbox.
In this example, we will be using a script component to add a column to each record containing a unique incremental ID.
Within the “Script Component,” I create a new output column called RowCountID which stores a unique ID for each record. And then I script it this way:
- Next, we will need to setup a conditional split which is also an SSIS internal component. Use the Conditional Split transform to take the RowCountID and split the records into three groups using a modulo operator:
If you need to initialize more than 3, you simply need to increase the modulo and the corresponding number of Cases.
- Now, we set up MD components with all the mappings and settings. Once we are done, make 3 copies of the component by selecting the component (Ctrl C + Ctrl V) twice more. By copying the finished component, all settings and mappings should be copied as well. Next, map the condition split outputs into every component. The number of available outputs should match the number of components we have initialized.
We should see something like this:
- For the last step, take all the outputs from the components and funnel them into the Union All component (Native). This will recombine the records into one stream so they’re ready for their destination. An additional note, a solid state hard drive is beneficial for speed and will increase performance as our objects (address/phone/email/name) refer to data files that are installed in your Program Files\Melissa Data folder. This will ensure that our components do not wait on receiving records in order to process.
I have three instances of Contact Verify set to run in parallel, thus potentially tripling the processing speed. What’s happening here is that I use a Script Component to generate an increasing, unique ID for each record and store it into a variable called RowCountID. Then, following that, I use the Conditional Split transform to take the RowCountID and split the records into three groups using a modulo operator. This will allow 3 streams for our contact verify components to process in parallel. Once processing completes, we take all 3 streams back into one with the union all component and from there we enter it into our destination.
- Is parallel processing really a linear multiplier reduction in overall time? – Well, not really a linear multiplier, it doesn’t quite scale like that. Potentially, but a better word would be roughly. As you increase the number of instances of our components in parallel, the law of diminishing returns kicks in. Eventually, hard drive and CPU speed will become bottlenecks.
- We recommend initializing the same number of MD components as you have CPU cores. Eventually, if you initialize too many instances (diminishing returns) – the operating system will begin scheduling as there won’t be enough cores to handle multiprocessing.
- Try to have adequate RAM as every instance initialized will use RAM memory.
- Use SSD or RAID hard drive arrays to improve I/O speeds to reduce I/O type of bottlenecks.
- We recommend having more “Engine Threads” on the project properties sidebar than the number of active components in your project.