Salesforce Bulk API in Talend Data Integration

Migrating or integrating your data from an in-house or cloud application to Salesforce can be difficult and time consuming, and it can use up a lot of API calls if you choose the wrong approach.
Many tools are available nowadays, and the one a developer is most likely to use is the Salesforce Apex Data Loader. Its advantage is that it supports the Bulk API, which can reduce the number of API calls that you need during the upload. However, if you want to implement additional logic to manipulate the data before uploading it to Salesforce, you’ll need to consider using an integration tool.
Here, I will go through the steps to implement a simple Talend Data Integration job which will upload data from a CSV file to the Salesforce Account object by utilizing the Salesforce Bulk API.
First, you need to create a new job in Talend. Here, I will start with a tPrejob component and connect it to a tFileInputDelimited component. This makes the job execute the tFileInputDelimited component at runtime to read the data that I want to load into Salesforce.
Figure 1 – Read data using tFileInputDelimited
Next, you have to specify the schema for the tFileInputDelimited component according to the fields that you have in the CSV file. Below is the schema that I use in this example:
Figure 2 – Schema for tFileInputDelimited
Now, you have to drag the tSalesforceOutputBulk component into the design workspace, then specify its schema and the location where the Salesforce bulk data load file will be saved. Please note that each field name in the tSalesforceOutputBulk schema must be exactly the same as the API name that you see in the Salesforce Account object (Setup -> App Setup -> Customize -> Accounts -> Fields).
Figure 3 – Salesforce bulk data load file location for tSalesforceOutputBulk component
Figure 4 – Schema for tSalesforceOutputBulk component
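Because a mismatched field name is easy to miss, it can help to check the header of the generated bulk file before it is submitted. The following is only a minimal sketch: the class and method names are hypothetical, the list of names is an illustrative subset of standard Account API names rather than the schema in Figure 4, and the header is assumed to be a simple comma-separated line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Talend or Salesforce.
final class HeaderCheck {

    // Returns the header columns that do not exactly match a known API name.
    static List<String> unknownColumns(String headerLine, Set<String> apiNames) {
        List<String> unknown = new ArrayList<>();
        for (String column : headerLine.split(",")) {
            // Drop surrounding whitespace and optional quotes around the name.
            String name = column.trim().replace("\"", "");
            if (!apiNames.contains(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Illustrative subset of Account API names (an assumption for this sketch).
        Set<String> apiNames = new LinkedHashSet<>(Arrays.asList(
                "Name", "BillingStreet", "BillingCity", "BillingCountry", "Phone"));
        // "billing_city" is reported because the comparison is exact and case-sensitive.
        System.out.println(unknownColumns("Name,BillingStreet,billing_city", apiNames));
    }
}
```

An empty list means every column in the header matched one of the names you supplied.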
Once the schemas for tFileInputDelimited and tSalesforceOutputBulk are specified, we will add a simple transformation and field mapping between them by using a tMap component. You need to drag the tMap component from the palette and:
1. connect the tFileInputDelimited to the tMap by right-clicking on the component -> Row -> Main
2. connect the tMap to tSalesforceOutputBulk by right-clicking on the component -> Row -> *New Output* (Main) -> name the output (in this example, I name it sf_data), then click Yes when it prompts you “Do you want to get the schema of the target component?”.
Figure 5 shows the flow that I have in the design workspace at this point.
Figure 5 – Flows from tFileInputDelimited to tSalesforceOutputBulk
In the tMap component, you can map the fields and apply additional logic according to your business rules. In this example, I want to join address1 and address2 to become BillingStreet in Salesforce.
Figure 6 – Field mapping in tMap component
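Expressions in tMap are ordinary Java, so the join can be as simple as row1.address1 + " " + row1.address2 (row1 is assumed here to be the name of the input flow, and the space separator is my assumption; neither is taken from Figure 6). If either column can be empty, a null-safe version is worth considering. A minimal sketch with a hypothetical helper:

```java
// Hypothetical helper showing a null-safe join of two address lines.
final class BillingStreetMapping {

    // Joins the two parts with a single space, skipping null or blank parts.
    static String joinStreet(String address1, String address2) {
        String first = address1 == null ? "" : address1.trim();
        String second = address2 == null ? "" : address2.trim();
        if (first.isEmpty()) {
            return second;
        }
        if (second.isEmpty()) {
            return first;
        }
        return first + " " + second;
    }

    public static void main(String[] args) {
        System.out.println(joinStreet("12 Main Street", "Suite 4")); // prints "12 Main Street Suite 4"
        System.out.println(joinStreet("12 Main Street", null));      // prints "12 Main Street"
    }
}
```

Without the null checks, plain string concatenation in Java would write the literal text "null" into BillingStreet whenever one of the columns is missing.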
At this point, you have finished the first part of the process. We will now move on to the second part, where the job reads the Salesforce bulk data load file using the tSalesforceBulkExec component and saves the success and failure results to CSV files (salesforce_account_bulk_success.csv and salesforce_account_bulk_fail.csv).
Figure 7 – tSalesforceBulkExec
Before you move on to the next step, you need to configure the connection settings and the number of rows to commit in the tSalesforceBulkExec component. The default value of Rows to commit is 10000. You can reduce it according to your requirements. I will stick with 10000 in this example, as my company has a lot of data to load each day and a larger value helps to reduce the number of API calls 🙂
Figure 8 – tSalesforceBulkExec connection setting
Figure 9 – Rows to commit in tSalesforceBulkExec
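To get a feel for what the Rows to commit value means for a given file, you can estimate the number of commits as the row count divided by the setting, rounded up. This is only a rough sketch: it assumes each commit corresponds to one batch sent to Salesforce, and the class and method names are hypothetical.

```java
// Hypothetical helper for estimating how many commits a load needs.
final class BatchEstimate {

    // Ceiling division: the last batch may be only partially filled.
    static long batchesNeeded(long totalRows, int rowsToCommit) {
        if (totalRows < 0 || rowsToCommit <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and rowsToCommit > 0");
        }
        return (totalRows + rowsToCommit - 1) / rowsToCommit;
    }

    public static void main(String[] args) {
        System.out.println(batchesNeeded(250_000, 10_000)); // prints 25
        System.out.println(batchesNeeded(250_000, 2_000));  // prints 125
        System.out.println(batchesNeeded(25_001, 10_000));  // prints 3
    }
}
```

Under that assumption, lowering the setting from 10000 to 2000 multiplies the number of commits for the same file by five.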
After that, you need to connect the Main row from the tSalesforceBulkExec component to a tFileOutputDelimited to record the successful records, and the Reject row to another tFileOutputDelimited to record the failed records. The reason for doing this is that you will know which records were uploaded to Salesforce successfully, along with their record ids in Salesforce. This makes your life easier if you want to use the record ids in another job. Below are the schemas that you should see in the Main row and the Reject row:
Figure 10 – Schema for Main row from tSalesforceBulkExec
Figure 11 – Schema for Reject row from tSalesforceBulkExec
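If another job needs the new record ids, one option is to load the success file into a lookup keyed by a column you already know. The sketch below is an illustration under stated assumptions: the column names Name and Id and the sample ids are placeholders (use the columns from your own Main row schema), the semicolon delimiter is passed in as a parameter and should match your tFileOutputDelimited settings, and quoted fields containing the delimiter are not handled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: builds a key -> value lookup from delimited lines with a header.
final class SuccessFileIndex {

    static Map<String, String> indexBy(List<String> lines, String delimiter,
                                       String keyColumn, String valueColumn) {
        Map<String, String> index = new LinkedHashMap<>();
        if (lines.isEmpty()) {
            return index;
        }
        String splitter = Pattern.quote(delimiter); // treat the delimiter literally
        List<String> header = Arrays.asList(lines.get(0).split(splitter, -1));
        int keyAt = header.indexOf(keyColumn);
        int valueAt = header.indexOf(valueColumn);
        if (keyAt < 0 || valueAt < 0) {
            throw new IllegalArgumentException("Column not found in header: " + header);
        }
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(splitter, -1);
            if (fields.length <= Math.max(keyAt, valueAt)) {
                continue; // skip blank or short lines
            }
            index.put(fields[keyAt], fields[valueAt]);
        }
        return index;
    }

    public static void main(String[] args) {
        // Placeholder rows; a real job would read salesforce_account_bulk_success.csv,
        // for example with java.nio.file.Files.readAllLines.
        List<String> lines = Arrays.asList(
                "Name;Id",
                "Acme;001PLACEHOLDER0001",
                "Globex;001PLACEHOLDER0002");
        System.out.println(indexBy(lines, ";", "Name", "Id").get("Acme")); // prints 001PLACEHOLDER0001
    }
}
```

Inside Talend itself, the same lookup is usually easier to build by reading the success file with a tFileInputDelimited and joining on it in a tMap.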
Now you are done, and you should see something similar to this in the design workspace. You can run the job by clicking on the Run button, and you should see the data uploaded to Salesforce.
Figure 12 – Complete flow