Swap supplements: update and insert large data sets

Overview

Updating hundreds of millions of records in a large supplement collection can take a long time. It is sometimes necessary to lock the collection in its current state until all of the new records have been inserted.

We can accomplish this goal using a set of Smarty utilities and the concept of supplement swapping. The idea behind this approach is to create a temporary supplement collection as a destination for loading data, and then swap temporary and production collections after all data are loaded. One of the primary benefits of using this method is that the production data can be queried while the temporary collection is being loaded. Additionally, the collection swap is near-instantaneous, allowing queries to reference new data immediately after the exchange.

Steps

When writing Smarty, we can think of this process as having three distinct steps:

  1. The pre-job script runs Smarty only once and is used to set variables and other data objects for use by the main transformation script. This is the default section for calling the createTmpSupplementVersion utility to create a temporary version of the production supplement collection.
  2. The transformation script is used to execute Smarty that will write data to the temporary supplement collection. Smarty in this section will run repeatedly for each row of data.
  3. The post-job script runs Smarty only once, after the transformation script has rendered every row of data. This is the default section for calling the replaceTmpSupplementVersion utility to swap the collections.
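
As a rough sketch of how the three sections fit together, the snippets below show where each utility call would sit. Each commented block belongs in its own script section of the data job, not in a single template. The `$record` variable and the `$dataRecord.id` field are hypothetical placeholders; the utility calls and `$data.supplement_name` are taken from the examples in this article.

```smarty
{* Pre-job script: runs once, before any rows are rendered *}
{$utils->createTmpSupplementVersion($data.supplement_name)}

{* Transformation script: runs once for each row of source data *}
{* $record and $dataRecord.id are hypothetical, for illustration only *}
{$record.id = $dataRecord.id}
{$utils->upsertSupplementRecord($data.supplement_name, $record)}

{* Post-job script: runs once, after the last row has been rendered *}
{$utils->replaceTmpSupplementVersion($data.supplement_name)}
```

Keeping the create and replace calls out of the transformation script matters because anything in that section runs again for every row.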

Pre-job script: set the stage and gain efficiency

We can improve performance by processing some of our Smarty once, before the main request renders millions of rows of data. The pre-job script is used to preprocess preliminary data such as variables, attribute values, parameters, and other data objects. The main transformation script can then reference this data for the duration of the job without making repeated calls to recreate the same values.

{* in preview mode createTmpSupplementVersion does not run *}
{$utils->createTmpSupplementVersion($data.supplement_name)}

The createTmpSupplementVersion utility must be called in the pre-job script section. It creates a structurally identical temporary copy of the production supplement collection, which must exist before data can be loaded.

The createTmpSupplementVersion utility does not run in preview when testing the script.

Transformation script: begin rendering

The transformation script contains Smarty that will run for every row of data from the import source.

Here, we can perform any number of transformations to augment the source data and then load it into the temporary supplement collection. This is identical to updating other supplement collections using the upsertSupplementRecord utility. The difference in this example is that we are loading data into the temporary supplement collection previously created in the pre-job script.

{$dataRecord.email = $dataRecord.email|lower}

{$contact_upsert.lookup_key = "email:{$dataRecord.email}"}

{$contact = $utils->setContact($contact_upsert.lookup_key)}

{if not $data.blockUpdate and empty($contact)}
  {$contact_upsert.data.channels.email.address = $dataRecord.email}
  {$contact_upsert.suppressTriggers = true}
  {$contact = $utils->upsertContact($contact_upsert.data, $contact_upsert.lookup_key, $contact_upsert.suppressTriggers)}
{/if}

{foreach $dataRecord.brands as $data_pos => $data}
  {$supplement.writeData = null}
  {$supplement.key = $data.supplement_name}
  {$supplement.writeData.brand = $data.brand}
  {$supplement.writeData.decile = $data.decile}
  {$supplement.writeData.cID = $contact._id}
  {$supplement.writeData.id = "{$contact._id}_{$data.brand}"}
  {if not $data.blockUpdate}
    {$supplement.response[] = $utils->upsertSupplementRecord($supplement.key, $supplement.writeData)}
  {/if}
{/foreach}

{if $data.metadata.preview}
  <h2>Debug $data</h2>
  {$utils->jsonPrettyPrint($data)}
  <h2>Debug $dataRecord</h2>
  {$utils->jsonPrettyPrint($dataRecord)}
  <h2>Debug $contact</h2>
  {$utils->jsonPrettyPrint($contact)}
  <h2>Debug $supplement</h2>
  {$utils->jsonPrettyPrint($supplement)}
{/if}

Post-job script: swap supplements

When all data is loaded and replaceTmpSupplementVersion is finally called, our temporary supplement collection becomes the production version and the primary source for data queries.

{* in preview mode, neither the post-job script nor replaceTmpSupplementVersion runs *}
{$utils->replaceTmpSupplementVersion($data.supplement_name)}

In the unlikely event that the data job is interrupted and cannot be resumed, we can call the deleteTmpSupplementVersion utility to clear the temporary supplement collection data.

The deleteTmpSupplementVersion utility should not be run without the guidance of a Solutions Engineer.
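
If a Solutions Engineer does advise clearing the temporary collection, the call would presumably look like the sketch below. It assumes that deleteTmpSupplementVersion accepts the same supplement key argument as createTmpSupplementVersion and replaceTmpSupplementVersion. That signature is not shown in this article, so confirm it before use.

```smarty
{* Assumption: same single-argument form as createTmpSupplementVersion *}
{* Clears the temporary supplement collection data after an interrupted job *}
{$utils->deleteTmpSupplementVersion($data.supplement_name)}
```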
