LinkWise: Modern Privacy-Preserving Record Linkage Software

LinkWise is a data linkage software product created by PolicyWise for Children & Families to link population-level data. It follows a probabilistic data linkage model and supports both clear-text (unencrypted) and privacy-preserving (encrypted) record linkage. The clear-text technique is a good choice when sharing identifiers raises no privacy concerns (e.g., linking data within an organization). The privacy-preserving technique suits cases where sharing identifiers does raise privacy concerns (e.g., linking data across organizations).
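To show what a probabilistic linkage model means in practice, here is a minimal sketch of the classic Fellegi-Sunter approach. Each identifier that agrees between two records adds a positive weight, and each one that disagrees adds a negative weight. The field names and the m- and u-probabilities are hypothetical illustrative values, not LinkWise's parameters or its exact algorithm.

```python
import math

# Hypothetical m- and u-probabilities (illustrative only, not LinkWise's values).
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "date_of_birth": (0.97, 0.003),
    "postal_code": (0.85, 0.001),
}


def field_weight(field: str, agrees: bool) -> float:
    """Log2 likelihood ratio contributed by comparing one field."""
    m, u = PARAMS[field]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Sum the field weights over identifiers present in both records."""
    score = 0.0
    for field in PARAMS:
        if field in rec_a and field in rec_b:  # a missing field contributes nothing
            score += field_weight(field, rec_a[field] == rec_b[field])
    return score


a = {"last_name": "SMITH", "first_name": "ANNA", "date_of_birth": "1990-04-12"}
b = {"last_name": "SMITH", "first_name": "ANA", "date_of_birth": "1990-04-12"}
print(round(match_score(a, b), 2))
```

A pair whose total score exceeds an upper threshold would be treated as a match, and one below a lower threshold as a non-match. Here the agreeing last name and date of birth outweigh the disagreeing first name.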

LinkWise is simple and user-friendly, and no data analysis skills are required to link data with it. The software pre-processes and cleans the data, separates the identifiers from the service-related fields, and calculates all required linkage parameters automatically, so it requires minimal interaction from the end-user. The main features of LinkWise include the following:
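Cleaning matters because "José", "jose " and "JOSE" should compare as the same name. The following sketch shows the kind of standardization a pre-processing step can apply. The specific rules (accent stripping, keeping only the letters A to Z, the accepted date formats, and treating "12/04/1990" as day/month/year) are assumptions for illustration, not LinkWise's documented rules.

```python
import re
import unicodedata
from datetime import datetime


def clean_name(value: str) -> str:
    """Uppercase, strip accents, and drop every character other than A-Z."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Z]", "", ascii_only.upper())


def clean_postal_code(value: str) -> str:
    """Uppercase a postal code and remove all whitespace, e.g. 't5j 0n3' -> 'T5J0N3'."""
    return re.sub(r"\s+", "", value).upper()


def clean_date(value: str, formats=("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")) -> str:
    """Parse a date in one of several assumed formats and return ISO YYYY-MM-DD."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return ""  # unparseable dates are treated as missing


print(clean_name("O'Brien-Smith"), clean_postal_code("t5j 0n3"), clean_date("12/04/1990"))
```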

  • Automation of all data linkage steps
  • A simple and user-friendly interface
  • Ability to link both unencrypted and encrypted data (privacy preserving record linkage)
  • Transparent linkage algorithm (not a black box)
  • Ability to perform incremental linkage (linking new data to previously linked data)
  • Ability to handle millions of records
  • Ability to run on multiple processors to reduce run time
  • High specificity and sensitivity
  • Affordable price

The software consists of two parts: the client side and the server side. Figure 1 below shows the workflow for linking two datasets using LinkWise:

Figure 1: LinkWise Workflow


As shown in this figure, the client side of the software splits each dataset into two files: a research file and a linkage file. The research file contains all research fields, and the linkage file contains all identifiers. The linkage files generated by the client side are then submitted to the server side, which generates linkage keys. These linkage keys are then used to link the research files.
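The split into a research file and a linkage file can be sketched as follows. The column names and the generated `record_id` are hypothetical. The one idea the sketch illustrates is that a shared record identifier is the only column the two files have in common, so the research file can be joined back later without carrying any identifiers.

```python
import csv
import io

# Hypothetical column roles, as a user might assign them with the "Field Type" list.
LINKAGE_FIELDS = ["last_name", "first_name", "date_of_birth"]
RESEARCH_FIELDS = ["service_date", "diagnosis_code"]


def split_dataset(rows):
    """Split each input row into a linkage row and a research row.

    The generated record_id (a hypothetical scheme) is the only shared column.
    """
    linkage, research = [], []
    for i, row in enumerate(rows, start=1):
        record_id = f"R{i:06d}"
        linkage.append({"record_id": record_id, **{f: row.get(f, "") for f in LINKAGE_FIELDS}})
        research.append({"record_id": record_id, **{f: row.get(f, "") for f in RESEARCH_FIELDS}})
    return linkage, research


source = io.StringIO(
    "last_name,first_name,date_of_birth,service_date,diagnosis_code\n"
    "Smith,Anna,1990-04-12,2021-03-01,J45\n"
    "Lee,Sam,1985-11-30,2021-05-17,E11\n"
)
linkage_rows, research_rows = split_dataset(csv.DictReader(source))
print(linkage_rows[0])
print(research_rows[0])
```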

Figure 2 below shows a snapshot of the client side of the software:

Figure 2: A Snapshot of LinkWise (Client Side)


The client side of the software runs on the computer where the original data is located. As shown in Figure 2, the software requests the path to the dataset to be linked. It also requires an encryption file, which is used to hash all identifiers for privacy-preserving record linkage. However, if the end-user selects the “Export as plain text” option, the identifiers are left unencrypted and no encryption file is required (for clear-text data linkage).

Clicking the “Read Data” button reads the file and shows its columns in the list box on the left. Clicking a column shows its possible values in the second list. Using the “Field Type” list, the user assigns a type to each column and then adds it to the selection list on the right. Each column can be either a linkage field or a research field.

Clicking the “Create Data” button creates a research file and a linkage file based on the selected columns. The end-user then submits the linkage file to PolicyWise for data linkage. The research file, which contains no identifiers, can be provided to researchers after data validation and, possibly, data anonymization.
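The role of the encryption file and the “Export as plain text” option can be sketched with a keyed hash. This sketch assumes the encryption file holds a secret key and uses HMAC-SHA256 as the keyed one-way function. Both are assumptions for illustration; the source does not specify LinkWise's hashing scheme or file format. Under this assumption, datasets can only be linked if they were hashed with the same key.

```python
import hashlib
import hmac
from typing import Optional


def load_key(path: str) -> bytes:
    """Read secret key bytes from a (hypothetical) encryption file."""
    with open(path, "rb") as fh:
        return fh.read().strip()


def protect(value: str, key: Optional[bytes], plain_text: bool = False) -> str:
    """Return the identifier unchanged for clear-text linkage, else its HMAC-SHA256 hex digest."""
    if plain_text:
        return value  # "Export as plain text": no encryption file needed
    if key is None:
        raise ValueError("an encryption file is required unless exporting as plain text")
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-shared-key"  # in practice this would come from load_key(...)
print(protect("SMITH", key))
print(protect("SMITH", None, plain_text=True))
```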

The server side of the software runs on a server located at PolicyWise. It receives the linkage files created by the client side, links the data, and generates a file of linkage keys. The linkage keys are then provided to researchers, who use them to link the research files.
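How researchers might use the linkage keys can be sketched as a join. The layout of the linkage-keys file assumed here (one row per dataset and record, mapping to a person-level `link_id`) is hypothetical; the source only states that linkage keys are used to link the research files.

```python
from collections import defaultdict

# Hypothetical linkage-keys file: each row maps a (dataset, record_id) pair
# to a person-level link_id assigned on the server side.
linkage_keys = [
    {"dataset": "A", "record_id": "R000001", "link_id": "P1"},
    {"dataset": "A", "record_id": "R000002", "link_id": "P2"},
    {"dataset": "B", "record_id": "R000001", "link_id": "P2"},
    {"dataset": "B", "record_id": "R000002", "link_id": "P3"},
]

research_a = {"R000001": {"diagnosis_code": "J45"}, "R000002": {"diagnosis_code": "E11"}}
research_b = {"R000001": {"school_grade": "7"}, "R000002": {"school_grade": "9"}}


def link_research_files(keys, files):
    """Group research rows from every dataset under their shared link_id.

    If two linked rows have a column with the same name, the later row's value
    overwrites the earlier one; a real analysis would keep both.
    """
    linked = defaultdict(dict)
    for key in keys:
        row = files[key["dataset"]].get(key["record_id"])
        if row is not None:
            linked[key["link_id"]].update(row)
    return dict(linked)


people = link_research_files(linkage_keys, {"A": research_a, "B": research_b})
print(people["P2"])
```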


Frequently Asked Questions (FAQ) 

How does the LinkWise software preserve the privacy of individuals?

The software hashes all identifiers using a one-way hashing technique, and data linkage is then performed on the hashed data.
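Linkage can work on hashed data because identical cleaned inputs always produce identical hashes, so agreement between fields can be checked without revealing the values. The sketch below assumes a keyed HMAC-SHA256 hash and a hypothetical shared key; it is not LinkWise's documented scheme.

```python
import hashlib
import hmac

KEY = b"example-shared-key"  # hypothetical; both sites must hash with the same key


def hash_field(value: str) -> str:
    """Clean the value, then hash it; cleaning must happen before hashing."""
    normalized = value.strip().upper()
    return hmac.new(KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


site_a = {"last_name": hash_field("Smith"), "date_of_birth": hash_field("1990-04-12")}
site_b = {"last_name": hash_field(" SMITH"), "date_of_birth": hash_field("1990-04-21")}

# Compare field by field using only the hashed values.
agreement = {field: site_a[field] == site_b[field] for field in site_a}
print(agreement)  # {'last_name': True, 'date_of_birth': False}
```

Note that hashed values can only be compared for exact equality: the transposed digits in the date of birth make the two hashes entirely different. Tolerance to such errors therefore has to come from cleaning and from the probabilistic weighting across several fields.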

What data fields can be used for data linkage?

Personal Health Number, Social Insurance Number (SIN), phone number, last name, middle name, first name, sex, province, city, address, postal code, and date of birth can be used for data linkage. Except for last name, all of these fields are optional, so the software can still perform data linkage even if your data lacks many of them.
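The rule that last name is required and every other identifier is optional can be expressed as a simple check on the user's selected linkage fields. The snake_case field names below are hypothetical labels for the identifiers listed above.

```python
# Hypothetical labels for the supported identifiers listed above.
SUPPORTED_FIELDS = {
    "personal_health_number", "sin", "phone_number", "last_name", "middle_name",
    "first_name", "sex", "province", "city", "address", "postal_code", "date_of_birth",
}
REQUIRED_FIELDS = {"last_name"}


def validate_linkage_fields(selected):
    """Return a list of problems with the chosen linkage fields (empty if valid)."""
    chosen = set(selected)
    problems = [f"unsupported linkage field: {f}" for f in sorted(chosen - SUPPORTED_FIELDS)]
    problems += [f"missing required linkage field: {f}" for f in sorted(REQUIRED_FIELDS - chosen)]
    return problems


print(validate_linkage_fields(["last_name", "date_of_birth"]))  # []
print(validate_linkage_fields(["first_name", "email"]))
```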

What is being taken offsite?

Only the linkage file is submitted to PolicyWise. For privacy-preserving linkage, the identifiers in it are hashed before they leave your premises.

What are the chances data can be decrypted?

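Hashing is not encryption, so there is no key that decrypts the hashed identifiers. Because the identifiers are one-way hashed, it is nearly impossible to reverse engineer the original values.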
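Why a secret matters can be seen with a low-entropy identifier such as date of birth. There are only about 38,000 possible dates between 1920 and 2024, so an unkeyed hash can be reversed by hashing every candidate and comparing. If the hash depends on a secret key, the same enumeration fails without that key. This sketch assumes the encryption file supplies such a secret and uses HMAC-SHA256 as the keyed hash. Both are illustrative assumptions, not a description of LinkWise internals.

```python
import hashlib
import hmac
from datetime import date, timedelta
from typing import Callable, Optional


def all_birth_dates(start=date(1920, 1, 1), end=date(2024, 12, 31)):
    """Yield every date in the range as an ISO string."""
    day = start
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)


def dictionary_attack(target_hash: str, hash_fn: Callable[[str], str]) -> Optional[str]:
    """Hash every plausible date of birth and return the one matching target_hash."""
    for candidate in all_birth_dates():
        if hash_fn(candidate) == target_hash:
            return candidate
    return None


def plain_sha256(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


SECRET_KEY = b"kept-in-the-encryption-file"  # hypothetical secret


def keyed_hash(value: str) -> str:
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def attacker_hash(value: str) -> str:
    # The attacker does not know SECRET_KEY and has to guess one.
    return hmac.new(b"wrong-guess", value.encode("utf-8"), hashlib.sha256).hexdigest()


print(dictionary_attack(plain_sha256("1990-04-12"), plain_sha256))  # 1990-04-12
print(dictionary_attack(keyed_hash("1990-04-12"), attacker_hash))  # None
```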


For more information or to request linkage services, please contact us at