Duplicate data means holding more than one record for the same individual. For example, Miss Elizabeth Tailor of 123 Albion Street, London may also have been entered into the same database as Mrs Tailor of 123 Albion St, London. 10% of a business database will accumulate duplicate data in a year as people move, change marital status, or change email addresses and phone numbers, to name just a few causes.
Duplicate data can also enter a company’s database through the multiple points of contact customers use to communicate with the organisation. For example, a customer may give a call centre a different set of information than they would provide online, leaving pertinent details either uncollected or held in a separate account that is not linked to the original one. The result is inconsistent information spread across an organisation’s different departments, which can internally affect single customer view and “golden record” initiatives.
Consider all the noise that comes with customer data entering systems, such as:
This lack of data quality can cause costly issues for organisations, which is why it is crucial to deduplicate your data regularly; deduplication is a core part of the data quality life cycle.
Learn more about data quality in our What is Data Quality article.
Data deduplication tools match and merge existing records by generating a match code that determines whether two records should be considered duplicates. Here is a list of fields commonly used for identifying duplicates:
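Whichever fields are chosen, the mechanics are similar: normalise each field, concatenate the results into a key, and group records that share the same key. Below is a minimal sketch in Python, assuming name, address and postcode as the matching fields; the normalisation rules are illustrative assumptions, not a description of any particular tool’s match code.

```python
import re
from collections import defaultdict

# Illustrative abbreviation map; real tools carry much larger reference tables.
ABBREVIATIONS = {"street": "st", "road": "rd", "avenue": "ave"}

def normalise(value: str) -> str:
    """Lowercase, strip punctuation and standardise common abbreviations."""
    value = re.sub(r"[^a-z0-9 ]", "", value.lower())
    words = [ABBREVIATIONS.get(w, w) for w in value.split()]
    return " ".join(words)

def match_code(record: dict) -> str:
    """Build a simple match code from surname, address and postcode."""
    surname = normalise(record["name"]).split()[-1]
    return "|".join([surname,
                     normalise(record["address"]),
                     normalise(record["postcode"])])

def group_duplicates(records: list[dict]) -> list[list[dict]]:
    """Group records that share a match code; groups of 2+ are duplicates."""
    groups = defaultdict(list)
    for record in records:
        groups[match_code(record)].append(record)
    return [g for g in groups.values() if len(g) > 1]

records = [
    {"name": "Miss Elizabeth Tailor", "address": "123 Albion Street", "postcode": "N1 1AA"},
    {"name": "Mrs Tailor", "address": "123 Albion St", "postcode": "N1 1AA"},
]
print(group_duplicates(records))  # both records fall into one duplicate group
```

Because “Street” and “St” normalise to the same token, the two example records produce identical match codes and are flagged as one duplicate group.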
A data deduplication tool can also use "fuzzy matching" algorithms, combined with deep domain knowledge of contact data, to match similar records and quickly dedupe your database.
Here is a list of fuzzy matching algorithms used to identify "non-exact" duplicate records:
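To illustrate what non-exact matching looks like in practice, here is a small sketch that scores two strings with normalised Levenshtein (edit-distance) similarity. Levenshtein is only one of several fuzzy algorithms, and the scoring threshold a tool applies would be its own choice; this example simply prints the scores.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalise the distance to a 0..1 score (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a.lower(), b.lower()) / max(len(a), len(b))

print(round(similarity("Elizabeth Tailor", "Elisabeth Taylor"), 2))  # 0.88
print(round(similarity("123 Albion Street", "123 Albion St"), 2))    # 0.76
```

A score close to 1.0 indicates a likely duplicate even though the strings are not identical, which is exactly the situation exact-key matching alone would miss.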
You can deduplicate your data in two ways. The first is as part of data cleansing, a process in which a company scrubs its data against multiple sources to ensure it is accurate, verified and compliant. The second is through a deduplication API built into an organisation’s systems, which works in the background to keep data deduplicated over time.
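As a rough sketch of the second approach, the snippet below shows how a point-of-entry check might work: before a new record is saved, it is compared against existing records and merged if a duplicate is found. The class, key rule and merge rule here are hypothetical illustrations, not any vendor’s API.

```python
class CustomerStore:
    """Toy in-memory store that deduplicates at the point of entry."""

    def __init__(self):
        self._by_key = {}  # match key -> record

    @staticmethod
    def _key(record: dict) -> str:
        # A deliberately simple key: lower-cased email, or name + postcode.
        email = record.get("email", "").strip().lower()
        if email:
            return f"email:{email}"
        return f"np:{record['name'].lower()}|{record['postcode'].lower()}"

    def upsert(self, record: dict) -> dict:
        """Insert a record, or merge it into an existing duplicate."""
        key = self._key(record)
        existing = self._by_key.get(key)
        if existing is None:
            self._by_key[key] = dict(record)
        else:
            # Hypothetical merge rule: newer non-empty values win.
            existing.update({k: v for k, v in record.items() if v})
        return self._by_key[key]

store = CustomerStore()
store.upsert({"name": "Elizabeth Tailor", "postcode": "N1 1AA", "email": "", "phone": ""})
merged = store.upsert({"name": "elizabeth tailor", "postcode": "n1 1aa",
                       "email": "", "phone": "020 7946 0000"})
print(merged)  # one record, now carrying the newly supplied phone number
```

The same idea applied in batch over an existing table is essentially what the data cleansing route does, just on the whole database at once rather than one record at a time.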
Learn more about data cleansing in our What is Data Cleansing article.
There are 3 ways to achieve data deduplication.
Melissa has been helping businesses improve their data quality for over 35 years with smart solutions that correct, verify, update and enrich customer data. Our full spectrum of data quality solutions gives businesses the tools they need to maintain clean, current and consistent data for more efficient operations and improved marketing and sales efforts. Melissa’s Data Quality Suite instantly verifies contact data at the point of entry for over 240 countries and territories, with flexible tools that are available as on-premise APIs or Web services to meet your specific needs.