Digital Data Collection through Data Donation

Financed by: ELKH

Project start: 01/11/2022

Project duration: 24 months


Survey research has dominated quantitative social science for the last 50-70 years. Researchers have always been aware of the method's weaknesses, but nothing has broken its hegemony as the best technique available. Recent changes, however, have challenged its leading role. One factor is the increasing difficulty of fieldwork and declining response rates; another is the emergence of new digital data sources. Some of this digital data is content that users share deliberately, such as tweets, posts from places they like, or other reactions and interactions on social media. Digital data also includes data generated unintentionally, which the various devices we use collect about us and our lives (e.g., mobile phone location data). Digital data can already replace classic survey data in many areas. It is not evident, however, that this should mean a complete paradigm shift in data collection in social science research. The survey method has irreplaceable advantages that make it an invaluable data collection method. Combining the two methods may overcome the weaknesses of each, and the right combination of survey and digital data may even yield new knowledge that is more than the sum of its parts. The main objective of our research is to create and test a methodological framework that allows mixed surveys to be conducted effectively in a changing digital data access environment.


The standard way of accessing digital data used to be APIs, but social networking sites such as Facebook and Instagram have disabled these solutions. This has prompted the development of new digital data access models. One of the most promising new approaches is called data donation. Under the GDPR, large platform providers must give users access to their data, which they do through "data download packages" (DDPs). In the data donation model, researchers invite users to share the digital data that the platform stores about them. The key benefit of partnering with users rather than companies is that it makes the data collection process more transparent for research participants. Because this approach is based on active collaboration with participants, it is easy to link this data collection with survey research. Combining the two data types is an ideal way to exploit their unique strengths and overcome their limitations.
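

To make the linkage between donated data and survey answers concrete, the following is a minimal sketch in Python and not the project's actual pipeline. It assumes that a DDP is a ZIP archive containing JSON files that each hold a list of records, and that survey rows and donations share a pseudonymous participant ID. The file names (`posts.json`, `likes.json`), the `participant_id` field, and all function names are hypothetical; real DDPs differ in structure from platform to platform.

```python
import io
import json
import zipfile


def summarize_ddp(ddp_bytes: bytes) -> dict:
    """Count the records in every JSON list file of a donated package."""
    counts = {}
    with zipfile.ZipFile(io.BytesIO(ddp_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".json"):
                continue  # skip media and other non-JSON content
            records = json.loads(archive.read(name))
            if isinstance(records, list):
                counts[name] = len(records)
    return counts


def link_to_survey(survey_rows: list, donations: dict) -> list:
    """Attach DDP summaries to survey rows via the shared participant ID."""
    linked = []
    for row in survey_rows:
        summary = donations.get(row["participant_id"])
        linked.append({
            **row,
            "donated": summary is not None,
            "ddp_counts": summary or {},
        })
    return linked


def make_toy_ddp(files: dict) -> bytes:
    """Build an in-memory ZIP that stands in for a real DDP (toy data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, records in files.items():
            archive.writestr(name, json.dumps(records))
    return buffer.getvalue()


# Hypothetical example: one participant donated, one answered the survey only.
toy_ddp = make_toy_ddp({
    "posts.json": [{"t": 1}, {"t": 2}],
    "likes.json": [{"t": 3}],
})
donations = {"p001": summarize_ddp(toy_ddp)}
survey = [
    {"participant_id": "p001", "age_group": "18-29"},
    {"participant_id": "p002", "age_group": "30-39"},
]
linked = link_to_survey(survey, donations)
```

In this sketch, participants who did not donate remain in the linked data with an empty summary, so that differences between donors and non-donors can still be examined using the survey variables.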


Our research is based on multi-platform data collection from a representative sample of Hungarian internet users. We plan to involve 500-800 people in the research. Multi-platform here means both combining digital and survey data and collecting digital data from different sources: Facebook, Instagram, TikTok, Twitter, and Google. This data collection design is unique and novel; to our knowledge, no international project uses a multi-platform approach to collect social media data in parallel on a representative sample.


The data collection work for this research, led by TK's CSS-Recens team, is carried out by NRC. You can download the privacy notice of the research here.


The research has been approved by the Research Ethics Committee of the Centre for Social Sciences under the number 1-FOIG/130-37/2022.