‘Large-scale data processing’ is not explicitly defined despite being used within regulations such as the General Data Protection Regulation (GDPR). The ICO itself provides pointers but, again, no clear definition.
This lack of definition becomes more apparent when a Record of Processing Activities (RoPA) is being completed. The RoPA module within the ProvePrivacy platform asks the user early on: ‘Does this activity process personal data on a large scale?’ It is important to ask this to allow the user to start determining where there may be higher risks amongst the activities being undertaken.
In this blog we look at what is used to measure large-scale data processing, why it’s important, and what to be aware of.
What is large scale data processing?
Whilst there is no clear definition of what ‘large-scale’ means within the GDPR, there are some pointers to help us understand. For example, the ICO states that large scale can relate to the:
- Number of data subjects
- Volume of data
- Variety of data
- Duration of processing
- Geographical extent
With five different measures listed, each quite distinct, it becomes clear why there is no single definition. Here are some examples that showcase the variances between those measures:
- Social media: With the volume of data social media platforms collect about individuals across many aspects of their social lives, this is a clear example of large-scale data processing.
- Tracking of individuals: 50 data subjects would not be considered large-scale on its own; however, when the processing collects tracking data for those subjects over hundreds of days, it soon becomes large-scale processing, because high volumes of data are collected across a large geographical extent.
- Population perspective: If personal data is collected in two different organisations, where organisation A has 500 employees and organisation B has 5,000 employees, we might ask which is processing data on a large scale. The answer could be ‘both’ if the processing takes place across 100% of the relevant population of data subjects.
- ICO examples: Examples from the ICO include a hospital processing patient data (but not an individual doctor doing so), a bank processing customer data, and an internet service provider processing customer data.
Many commentators on the subject have attempted to place a specific number of data subjects into the definition, but a number alone does not take account of the other contextual factors. A clearer definition might therefore be twofold:
- the number or proportion of data subjects, or
- the volume or extent of the data items.
Put very simply, ‘are you processing lots of personal data?’
Why is it important?
If large volumes of data are being processed securely, it is sometimes asked why further consideration is needed. Asking whether data is processed on a large scale helps determine the fairness of processing, which in turn allows the risks to the data subject to be assessed.
For example, large-scale processing of sensitive data is clearly defined as a high-risk activity and would therefore require a Data Protection Impact Assessment (DPIA) to be carried out and the identified risks mitigated.
Like many aspects of data protection, large-scale does not necessarily mean stop; it means slow down, consider the risk, and make sure that your processing is fair and any potential risks are reduced.
Things to be aware of when doing large-scale data processing
On its own, large-scale processing is not considered a significant enough risk to mandate a DPIA. Under the GDPR, a DPIA is required when data processing is likely to result in a high risk to the rights and freedoms of individuals.
When additional context is added, there is a higher likelihood that a DPIA will be required, for example when there is large-scale processing of sensitive data or large-scale profiling of individuals.
Whilst not the subject of this article, it should be noted that the purpose of the DPIA is to document the high risk to the data subject and then assess whether the processing is necessary and proportionate. It is not enough to state that an organisation needs to process the data for a particular purpose; the rights of the data subject must also be considered.
The DPIA will identify the risks to the data subjects’ rights and address them by considering how each risk can be lowered. Where any residual risk is still considered high after mitigation, the regulator should be consulted before processing begins.
There are, however, some controls you should consider to ensure that data protection principles are being met, for example:
- Accuracy: Ensuring that the data is clean, accurate, and consistent is paramount. This can be achieved through data cleansing processes that identify and rectify errors in the dataset, reducing the risk of poor decision making.
- Security: With the increase in data breaches, protecting large volumes of information becomes crucial. Implementing robust encryption methods and access controls can safeguard data against unauthorised access and cyber threats.
So whilst there is a reason that large-scale data processing doesn’t have a clear-cut definition, it is important that those handling the processing of data understand the measures and know when and how to apply them to ensure fair and lawful processing of data.
The ProvePrivacy platform has been designed to give Data Protection Officers a central place to view, monitor and manage compliance, while guiding those with less data compliance expertise through a series of questions that help them identify potential risks and then manage or mitigate them. The RoPA module provides a great example of this process.