Documentation for Researchers

This part of the documentation is addressed to the users of the Data Donation Module (i.e., researchers) and explains:

How to set up a data donation project.
How to monitor an ongoing data donation project.
How to download and work with the collected data.

This documentation assumes that DDM is already installed and running on a server. Readers interested in setting up a server to run DDM for institutional use or their own research should refer to the Documentation for Administrators. For a general introduction to the module, see this page.

Setting Up a Data Donation Project

Creating a Project

To set up a new data donation project, go to the /projects/ page and click on "+ Create New Project". On the project creation page, you can configure the following options:

Project Name: Name of the project. Visible to participants in the browser’s title bar or a page’s tab.
URL Identifier: Identifier that is included in the URL through which participants can access the project (e.g., https://root.url/project-slug).
Contact Information: Contact information of the researcher responsible for the project. It is linked in the footer of the donation interface and can be viewed by data donors at any stage of the data donation process.
Data Protection Statement: Data protection statement that describes how the data is processed. It is linked in the footer of the donation interface and can be viewed by data donors at any stage of the data donation process.
Super Secret: When creating a "super secret" project, you have to provide a project password, which is used to encrypt all collected data donations and survey responses (for more information on how collected data is encrypted, see here). This password is not saved by the application, and the data collected for this project can only be decrypted by entering the password that was set when the project was created. Note that there is no way to recover or reset this password if it is lost. Also note that making a project "super secret" limits the functionality of the module: it is not possible to create follow-up questions based on the data donation (i.e., data points from the data donation cannot be referenced in a question).

The Project Hub

Once a project is created, you will be redirected to the "Project Hub". From here, you can access all settings and information that are relevant for your data donation project. It consists of four main areas:

Project Details
Project Configuration
Data Center
Danger Zone

Editing Project Details

Base Settings

Project Name: Name of the project. Visible to participants in the browser’s title bar or a page’s tab.
External Project Slug: Identifier that is used to expose the project to participants (e.g., https://root.url/project-slug).

Public Project Information

Contact Information: Contact information of the researcher responsible for the project. It is linked in the footer of the donation interface and can be viewed by data donors at any stage of the data donation process.
Data Protection Statement: Data protection statement that describes how the data is processed. It is linked in the footer of the donation interface and can be viewed by data donors at any stage of the data donation process.

URL Parameter Extraction

Optionally, information can be extracted from parameters passed with the URL when a project is accessed by a participant. This can be configured using the following settings:

URL parameter extraction enabled: Enable or disable whether URL parameters should be extracted when participants access the project’s briefing page.
Expected URL parameter: Provide a string containing the parameters that should be extracted. Separate multiple parameters with a semicolon (e.g., "parameter_A;parameter_B"). The extracted values are saved for each participant and included in the data export. If a parameter is not present, it is saved as None. Undefined parameters passed in the URL are ignored.
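The extraction rules described above can be previewed with a short Python sketch. It is not DDM's implementation; it only reproduces the documented behaviour (semicolon-separated parameter names, None for missing parameters, unknown parameters ignored) with the standard library. The function name and the example URL are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_url_parameters(url, expected):
    """Keep only the configured parameters; missing ones become None.

    Hypothetical helper that mirrors the documented rules, not DDM code.
    """
    names = [name.strip() for name in expected.split(";") if name.strip()]
    # parse_qs maps each parameter name to a list of values.
    query = parse_qs(urlparse(url).query)
    # Take the first value of each expected parameter; ignore all others.
    return {name: query[name][0] if name in query else None for name in names}


url = "https://root.url/project-slug/?parameter_A=123&unexpected=x"
print(extract_url_parameters(url, "parameter_A;parameter_B"))
# {'parameter_A': '123', 'parameter_B': None}
```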

If participants are asked to first fill out a questionnaire created in external survey software such as Unipark or Qualtrics, URL parameters can be used to link data donations with the survey data by passing a participant ID in the URL.

Redirect Configuration

Optionally, participants can be redirected to another website from the debriefing page. This can be configured using the following settings:

Redirect enabled: Enable or disable the redirection of your participants when they have completed your project. If enabled, a redirect button will be displayed on the data donation end page that redirects to the URL defined in the Redirect target setting.
Redirect target: URL to which participants will be redirected (only required if redirect is enabled).

The redirect URL can include information about the participant and the project ID through variables that are populated with the respective values. Currently, two variables are supported: {{ participant }} gives access to the participant data, and {{ project_id }} inserts the project ID. For example: https://redirect.url?param={{participant.data.url_param.URLParameter}}&participant={{participant.public_id}}&project={{project_id}}

See this section for more information on the {{ participant }}-variable.
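To check what a redirect target will roughly look like before going live, the placeholder substitution can be approximated in Python. This is a simplified stand-in for the Django template engine that DDM uses: it only follows dotted paths through nested dictionaries, renders missing values as an empty string, and applies no HTML escaping or URL encoding, which the real engine may handle differently. The helper names and the project ID 7 are hypothetical.

```python
import re

# Matches placeholders such as {{ participant.public_id }} or {{project_id}}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(context, path):
    """Follow a dotted path through nested dicts; missing keys yield ""."""
    value = context
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return ""
        value = value[part]
    return value


def render_redirect(template, context):
    """Replace every placeholder with its value (no escaping or encoding)."""
    return PLACEHOLDER.sub(lambda match: str(lookup(context, match.group(1))), template)


context = {
    "participant": {
        "public_id": "xHQVbrYUYXn5lklW3RGSeouX",
        "data": {"url_param": {"URLParameter": "abc"}},
    },
    "project_id": 7,  # hypothetical project ID
}
template = ("https://redirect.url?param={{participant.data.url_param.URLParameter}}"
            "&participant={{participant.public_id}}&project={{project_id}}")
print(render_redirect(template, context))
# https://redirect.url?param=abc&participant=xHQVbrYUYXn5lklW3RGSeouX&project=7
```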

Project Appearance

The following settings are available to customize the appearance of your data donation project:

Header Image Left/Header Image Right: Upload an image that will be displayed in the header of your project (e.g., an institution or project logo).

Editing Project Configuration

The structure of the Project Configuration follows the steps of the prototypical data donation process. It consists of the following sections:

Welcome Page: Define what is displayed to participants when they enter your project.
Data Donation: Define the expected data donations, extraction rules, and donation instructions.
Questionnaire: Define questions that will be shown to participants after they have donated their data.
End Page: Define what is displayed when participants reach the end of the data donation.

Briefing

Briefing Text: Text displayed to participants on the briefing page.

In the briefing text, you can make use of dynamic templates. You can read more about how to use dynamic templates here.

Briefing Consent Mandatory: If briefing consent is enabled, participants have to explicitly indicate their consent at the bottom of the briefing page before they can continue. If a participant indicates that they do not consent, they are redirected to the debriefing page.
Briefing consent label yes/Briefing consent label no: The labels displayed to participants to give consent ("briefing consent label yes") or to decline consent ("briefing consent label no").

Data Donation

The data donation is organized in File Uploaders and Donation Blueprints.

A "File Uploader" corresponds to the file that is expected to be uploaded. This file can either be a single file (e.g., a JSON file) or a ZIP container.

For each File Uploader, a set of Instructions for participants can be defined that describe how they can access and upload the requested file.

Each uploader has one or multiple associated Donation Blueprints (although if a File Uploader expects a single file, only one Donation Blueprint can be associated with it). A Donation Blueprint defines how the data contained in a single file (e.g., the uploaded file in the case of a single file upload, or a file contained in the ZIP container in the case of a ZIP upload) is extracted.

The data donation step can incorporate multiple File Uploaders.

Configure File Uploader

Name: Name of the File Uploader. Visible to participants in the header of the file uploader.
Upload Type: Either "single file" or "zip file".
Index: The position of the file uploader on the data donation page. Only relevant if multiple file uploaders are displayed – file uploaders with a lower index will be displayed closer to the top of the page.
Associated Donation Blueprints: The donation blueprints that apply to the expected file(s) collected with the file uploader.

Configure Instructions

Donation Instructions consist of one or multiple instruction pages. Instruction pages are displayed as a slide show at the top of the file uploader. For each instruction page, the following can be configured:

Text: The instruction text displayed to the participants. By default, researchers can also upload and include images or GIFs in this field to guide participants through the data donation process (currently, video upload is not supported).

The participant’s external ID is available as the template variable {{ participant_id }} and can be included in the instruction text. It is displayed to the participant as a string such as IPI2wHDWrHODDRKuo8zo101S. This helps participants continue the data donation at a later point in time (e.g., because it can take some time between requesting a data takeout and being able to download it); read this section of the documentation to find out how this can be done.

Index: The position of the page in the slideshow.
NOTE: If no instructions are defined for a File Uploader, the instruction section will be hidden in the participation view.

Configure Donation Blueprint

Name: Name of the expected data donation. Visible to participants, so it is important to choose a meaningful name.
Description: Description of what the blueprint will extract. If defined, the description is visible to participants in the data donation step.
Associated File Uploader: The File Uploader for which the blueprint will be applied.
File path: A regular expression describing the path where the file is expected to be located within a ZIP file. Only necessary if the Donation Blueprint is part of a Blueprint Container.

If the regular expression matches more than one file, DDM extracts the first matching file and does not look any further, even if that file does not contain the expected fields. Therefore, make sure to choose regular expressions that only match the expected file.

Examples of regex paths to match files:

^MyActivities\.json: Matches a file named MyActivities.json that is located at the root of the ZIP file.
^SpecificFolder/MyActivities\.json: Matches a file named MyActivities.json that is located in a folder named SpecificFolder at the root of the ZIP file.
.*MyActivities\.json: Matches a file whose path contains MyActivities.json, regardless of the folder it is located in. Warning: This also matches, e.g., BogusMyActivities.json.
(^MyActivities\.json|^MeineAktivitäten\.json|^MieAttivita\.json): Matches a file located at the root of the ZIP file that is named MyActivities.json, MeineAktivitäten.json, or MieAttivita.json. This can be helpful to match the same file in different languages.

You can find out more about regular expressions here. This website also offers tools that help you test regex patterns.
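The first-match behaviour can be tried out locally with Python's re and zipfile modules before configuring a blueprint. This sketch assumes that files are checked in the order in which they are stored in the archive and that the pattern may match anywhere in the path (like re.search); DDM performs the matching in the participant's browser, so details may differ. The helper name and the archive contents are hypothetical.

```python
import io
import re
import zipfile


def find_first_match(zip_bytes, pattern):
    """Return the first member path in the ZIP file matching pattern, or None."""
    regex = re.compile(pattern)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():  # members in stored order
            if regex.search(name):
                return name  # stop at the first match, like DDM
    return None


# Build a small in-memory ZIP file for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Other/BogusMyActivities.json", "[]")
    archive.writestr("SpecificFolder/MyActivities.json", "[]")
data = buffer.getvalue()

# The broad pattern picks the wrong file because it is stored first.
print(find_first_match(data, r".*MyActivities\.json"))
# Other/BogusMyActivities.json
print(find_first_match(data, r"^SpecificFolder/MyActivities\.json"))
# SpecificFolder/MyActivities.json
```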

Expected fields: The fields that must be contained in the donated file. If a file does not contain one or more of the fields defined here, it will not be accepted as a donation. Put the field names in double quotes (") and separate them with commas (e.g., "Field A", "Field B").
Expected File Format: The file format of the expected data donation. Currently, only JSON and CSV are supported.
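To double-check the expected fields setting against a sample file, the quoted, comma-separated list can be parsed with Python's csv module. The check below assumes, for illustration only, that every entry of the file must contain every expected field; DDM's actual validation may inspect the file differently. Both function names are hypothetical.

```python
import csv


def parse_expected_fields(setting):
    """Split a setting such as '"Field A", "Field B"' into field names."""
    # skipinitialspace lets the reader recognise the quote after ", ".
    return next(csv.reader([setting], skipinitialspace=True))


def contains_expected_fields(entries, expected):
    """True if every entry (a dict) provides every expected field (assumption)."""
    return all(set(expected) <= set(entry) for entry in entries)


fields = parse_expected_fields('"Field A", "Field B"')
print(fields)  # ['Field A', 'Field B']
print(contains_expected_fields([{"Field A": 1, "Field B": 2}], fields))  # True
```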

JSON specific settings

Extraction Root: Indicates the level of the file's data structure from which information should be extracted. If you want to extract information contained on the first level (e.g., {"field to be extracted": value}), you can leave this field empty. If the data you want to extract is nested deeper, provide the path to the parent field of that data, with levels separated by dots. For example, if your JSON file is structured like {"friends": {"real_friends": [{"name": name, "date": date}], "fake_friends": [{"name": name, "date": date}]}} and you want to extract the names and dates of real_friends, set the extraction root to friends.real_friends.
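The effect of an extraction root can be illustrated with a few lines of Python that walk the dot-separated path through a parsed JSON document. This is a simplified sketch, not DDM's code: it assumes field names do not themselves contain dots and raises a KeyError if the path does not exist. The function name and sample data are hypothetical.

```python
def apply_extraction_root(document, root):
    """Follow a dot-separated extraction root such as 'friends.real_friends'.

    An empty root returns the document unchanged (first-level extraction).
    """
    node = document
    if root:
        for key in root.split("."):
            node = node[key]  # raises KeyError if the path does not exist
    return node


donation = {
    "friends": {
        "real_friends": [{"name": "Anna", "date": "2022-01-01"}],
        "fake_friends": [{"name": "Bot", "date": "2022-02-02"}],
    }
}
print(apply_extraction_root(donation, "friends.real_friends"))
# [{'name': 'Anna', 'date': '2022-01-01'}]
```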

CSV specific settings

CSV Delimiter: The delimiter used in the expected file (e.g., , or ;). If left empty, the JavaScript function that parses the file tries to infer the delimiter from the file structure.
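If you are unsure which delimiter a platform uses, you can inspect a sample export with Python's csv.Sniffer before configuring the blueprint. This is only an offline analog: DDM infers the delimiter in the browser with a JavaScript function, which may reach a different result for ambiguous files. The helper name and sample are hypothetical.

```python
import csv


def guess_delimiter(sample, candidates=",;"):
    """Guess the delimiter of a CSV sample, restricted to the candidate characters."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


sample = "name;age\nAnna;31\nBen;27\n"
print(guess_delimiter(sample))  # ;
```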

Extraction Rules

By default, DDM assumes that you do not want to keep any of the data contained in an uploaded file. This means that when you configure your extraction rules, you have to add a rule for each field that you want to keep in your data.

Execution Order: The order in which the extraction rules are applied to a file.
Name: The name of an extraction rule. For internal organisation only.
Field: The field to which the rule will be applied.
Extraction Operator: Defines the main logic of the extraction step. If empty, the field is kept in the donated data. For all non-regex operations, if an operation evaluates to True, the row is deleted from the donated data (further explanations of the individual operators follow).
Comparison Value: The value against which the data contained in the indicated field will be compared according to the selected comparison logic.
Replacement Value: Only required for operation "Replace match (regex)". The value that will be used as a replacement if the regex pattern matches.
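The interplay of these settings can be sketched in Python. The operator names "equal" and "replace_regex" are hypothetical stand-ins (the actual operator labels in DDM may differ), and the sketch assumes that every field referenced by any rule is kept. It reproduces the documented logic: only fields with a rule survive, rules run in execution order, a non-regex rule that evaluates to True deletes the row, and the regex rule substitutes matches with the replacement value.

```python
import re


def apply_rules(rows, rules):
    """Apply hypothetical extraction rules to a list of rows (dicts)."""
    kept = {rule["field"] for rule in rules}  # fields without a rule are dropped
    ordered = sorted(rules, key=lambda rule: rule["execution_order"])
    result = []
    for row in rows:
        row = {key: value for key, value in row.items() if key in kept}
        keep_row = True
        for rule in ordered:
            field, op = rule["field"], rule["operator"]
            if field not in row or op is None:
                continue  # operator None only keeps the field
            if op == "replace_regex":
                row[field] = re.sub(rule["comparison_value"],
                                    rule["replacement_value"], str(row[field]))
            elif op == "equal" and str(row[field]) == rule["comparison_value"]:
                keep_row = False  # a non-regex rule evaluating to True deletes the row
                break
        if keep_row:
            result.append(row)
    return result


rules = [
    {"execution_order": 1, "field": "title", "operator": None},
    {"execution_order": 2, "field": "url", "operator": "replace_regex",
     "comparison_value": r"v=\w+", "replacement_value": "v=REDACTED"},
    {"execution_order": 3, "field": "title", "operator": "equal",
     "comparison_value": "Private video"},
]
rows = [
    {"title": "Some video", "url": "https://youtube.com/watch?v=abc123", "ip": "1.2.3.4"},
    {"title": "Private video", "url": "https://youtube.com/watch?v=xyz", "ip": "5.6.7.8"},
]
print(apply_rules(rows, rules))
# [{'title': 'Some video', 'url': 'https://youtube.com/watch?v=REDACTED'}]
```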

Questionnaire

Researchers can optionally define a questionnaire that is displayed after the data donation.

How the Questionnaire Works

The questionnaire is displayed after the data donation step and consists of one or more pages, each consisting of one or more questions.

A question can either be general or be related to a file blueprint.
General questions are displayed to all participants, regardless of whether they successfully donated any data.
Questions related to a file blueprint are only displayed to those participants that successfully uploaded some data to the related file blueprint. This means that if the data extraction for a specific file blueprint either fails, is not attempted or zero data entries are extracted because all entries were filtered out, the question will not be displayed.
For questions related to a file blueprint, the data extracted by the related blueprint for a given participant is available to be included in the question text (see below for more information on how to include data in question texts).
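The display rule above can be summarised as a small decision function. This sketch uses the donation status values from the data export ("success", "pending", "failed", "nothing extracted") and assumes that "success" implies at least one extracted entry; it is an illustration, not DDM's internal logic. The function and blueprint names are hypothetical.

```python
def question_is_shown(question_blueprint, donation_status):
    """Decide whether a question is displayed to a participant.

    question_blueprint: None for a general question, otherwise a blueprint name.
    donation_status: maps blueprint names to the participant's donation status.
    """
    if question_blueprint is None:
        return True  # general questions are shown to everyone
    # Failed, pending, or empty donations hide blueprint-related questions.
    return donation_status.get(question_blueprint) == "success"


status = {"watch_history": "success", "search_history": "nothing extracted"}
print(question_is_shown("watch_history", status))   # True
print(question_is_shown("search_history", status))  # False
```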

The questionnaire responses are only submitted to the server after all questions have been answered. This means that if a participant aborts the questionnaire after filling out, e.g., 2 out of 4 questions, no responses are collected and saved on the server.

Question Configuration Settings

Currently, the following question types are implemented:

Single Choice Question
Multi Choice Question
Matrix Question
Semantic Differential
Open Question
Transition Block (plain text, without any response options for the participant)

Depending on the question type, the following attributes can be configured:

Name: Question name - only used for internal organisation.
Blueprint (optional): If associated with a blueprint, the data extracted by this blueprint for a given participant is available to be included in the question text (see below for more information on how to include data in question texts). If the associated blueprint did not extract any data, the question will not be displayed. If a question should always be displayed, select the option "General Question" here.
Page: Number of the page on which the question should be displayed.
Index: Order in which questions on the same page should be displayed.
Variable Name: The variable name associated with this question. It is included in the data export. For items belonging to a question, the variable name is constructed as follows: "question_variable_name-{item-value}".
Text: The question text that is displayed to participants.
Required: If a question is marked as required, the application will show a hint to the participant if they forgot to answer this question. This hint will only be shown once. This means that if a participant chooses to ignore the hint and clicks on 'continue', they are able to skip a required question.
Randomize items: Enable or disable randomization of all items.

Question Items

Here, the items belonging to this question are listed. Click on edit to add and configure the following attributes of question items.

Label: The label/text of the item that is displayed to participants. For semantic differential questions, this is the label displayed on the left-hand side of the scale.
Label alt: Only for semantic differential questions: the label displayed on the right-hand side of the scale.
Index: Defines the order in which the items are displayed.
Value: Serves (a) as the identifier of an item and (b) to indicate in the data export which items were selected (only for Single and Multi Choice Questions).
Randomize: Instead of randomizing the order of all items with the randomize setting on the question level, this setting allows randomizing only certain items while those for which this option is not ticked stay in their place (i.e., according to their index).
Delete: Tick this box if the item should be deleted. The item will be deleted as soon as you click on Update Items.
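One way to implement the item-level Randomize option is sketched below in Python: items are first sorted by index, then only the flagged items are shuffled among the positions that flagged items occupy, while unflagged items keep their place. This is an assumption about the behaviour for illustration; DDM's actual algorithm may differ. The function name is hypothetical.

```python
import random


def order_items(items, rng=random):
    """Sort items by index, then shuffle only those with randomize=True."""
    ordered = sorted(items, key=lambda item: item["index"])
    # Positions held by items that may move.
    slots = [pos for pos, item in enumerate(ordered) if item["randomize"]]
    shuffled = [ordered[pos] for pos in slots]
    rng.shuffle(shuffled)
    for pos, item in zip(slots, shuffled):
        ordered[pos] = item  # unflagged items never leave their position
    return ordered


items = [
    {"index": 1, "label": "A", "randomize": False},
    {"index": 2, "label": "B", "randomize": True},
    {"index": 3, "label": "C", "randomize": True},
    {"index": 4, "label": "D", "randomize": False},
]
print([item["label"] for item in order_items(items)])  # A and D stay first and last
```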

Scale Configuration

Configure the scale displayed to participants on which they will rate the items related to this question (only for Matrix Question and Semantic Differential). Click on edit to add and configure the following attributes of a scale.

Index: Defines the order in which the scale points are displayed.
Label: The label/text displayed to participants for the scale point.
Value: Used to indicate which scale point has been selected in the data export.
Add border: Setting currently has no effect - still to be implemented.
Delete: Tick this box if the scale point should be deleted. The scale point will be deleted as soon as you click on Update Scale Points.

Including Data Dynamically in a Question

Depending on the question configuration it is possible to dynamically include both participant-related and donation-related data points in a question text.

For this, DDM uses the Django template engine to render question texts dynamically for each participant. This means that researchers can use Django’s built-in template tags and filters to customize question texts and to access data through predefined variables. This also works for the briefing and debriefing pages.

DDM provides two custom variables to be used:

A variable called participant that contains information about the current participant (available in all questions).
A variable called data that contains the data donated by the current participant (only available for questions that are related to a file blueprint).

Participant-Related Data

The participant related data contains the following information:

{
    "public_id": "S0meLonGCh4rSeQuence",
    "data": {
        "url_param": {
            "parameter_a": "value a",
            "parameter_b": "value b"
        },
        "briefing_consent": "1"   # or "0" if participant did not consent to take part in the study
    },
    "donation_info": {
        "n_success": 1,           # number of successful donations by this participant
        "n_pending": 1,           # number of pending (i.e., not attempted) donations by this participant
        "n_failed": 0,            # number of attempted but failed donations by this participant
        "n_no_data_extracted": 0  # number of donations by this participant where all entries were filtered out
    }
}

This information can be embedded in any question (i.e., general or blueprint related), any item label, in donation instructions, as well as in the briefing and debriefing text as follows:

Dear participant, I would like to tell you something about yourself.

{% if participant.data.briefing_consent == "1" %}
You did consent to take part in this study and for this we are really grateful.
{% else %}
Unfortunately, you did not consent to take part in the study but we respect your
decision and completely understand! :')
{% endif %}

Your public ID is as follows: "{{ participant.public_id }}". Please take a photo
of this ID. If you want to request the deletion of your data at some point in the
future you can send an e-mail with your personal public ID to someone@mail.com
and your data will be deleted from our servers.

A participant who consented to take part in the study will see this question as follows:

Dear participant, I would like to tell you something about yourself.

You did consent to take part in this study and for this we are really grateful.

Your public ID is as follows: "xHQVbrYUYXn5lklW3RGSeouX". Please take a photo
of this ID. If you want to request the deletion of your data at some point in the
future you can send an e-mail with your personal public ID to someone@mail.com
and your data will be deleted from our servers.

Donation-Related Data

If a question is related to a file blueprint, the data that was extracted from a participant’s data donation through this file blueprint is available as a template variable called data. As in the case of the participant-related data, the donation-related data can be included in a question text definition following the same logic:

This is the title of the last video you watched on YouTube: "{{ data.0.title }}"

Please indicate below, why you watched this video.

This will be rendered as follows:

This is the title of the last video you watched on YouTube: "Data Brokers: Last Week Tonight with John Oliver (HBO)"

Please indicate below, why you watched this video.

The example above assumes that the donated data is structured as follows: [{'title': 'Data Brokers: Last Week Tonight with John Oliver (HBO)', 'timestamp': '2022-12-19T08:49:18'}, {'title': 'Title of second video', 'timestamp': '2022-12-16T11:43:02'}, …].
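The expression data.0.title works because the Django template language resolves each dot-separated part in turn, trying a dictionary key lookup and, among other lookups, a list index. The Python sketch below imitates only those two lookups (Django additionally tries attribute lookups and method calls) and, like Django's default behaviour, returns an empty string when a part cannot be resolved. The function name is hypothetical.

```python
def resolve(context, path):
    """Resolve a dotted template path via dict keys and list indices only."""
    value = context
    for part in path.split("."):
        if isinstance(value, dict) and part in value:
            value = value[part]
        elif isinstance(value, (list, tuple)) and part.isdigit() and int(part) < len(value):
            value = value[int(part)]
        else:
            return ""  # unresolvable parts render as an empty string
    return value


context = {"data": [
    {"title": "Data Brokers: Last Week Tonight with John Oliver (HBO)",
     "timestamp": "2022-12-19T08:49:18"},
    {"title": "Title of second video", "timestamp": "2022-12-16T11:43:02"},
]}
print(resolve(context, "data.0.title"))
# Data Brokers: Last Week Tonight with John Oliver (HBO)
```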

When you start constructing a dynamic question text, first include the complete data objects stored in the variables in your question text (e.g., {{ participant }}, {{ data }}).

Next, open the link to your study in a private (incognito) browser window and go through the steps until you reach the questionnaire. This way, you can see how the data object is structured and figure out how to access information on deeper levels of the data structure. You can then adjust the variables and reload the private window every time you make a change to the question definition to see how your new specification is rendered.

Debriefing

Debriefing text: Text displayed to participants on the debriefing page.

Sometimes you might want to display different debriefing texts depending on the previous actions of your participant (e.g., if a participant did indicate that they do not want to take part in the study, or if a participant did not attempt to donate any data).

For this, you can use the template engine and, for example, define the following debriefing page, which displays a different text to a participant who did not attempt to donate any data than to a participant who donated at least some data (the example assumes that two donations were expected in this study):

{% if participant.donation_info.n_success > 0 %}
Dear participant,

Thank you very much for participating in our study. With your data donation, you
made a great contribution towards advancing our understanding of algorithmic
selection on the internet.
{% endif %}

{% if participant.donation_info.n_pending == 2 %}
Dear participant,

Thank you for your time. Because you did not attempt to donate any data, you are
unfortunately not eligible to receive the participation reward.
{% endif %}
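The two conditions in this example do not cover every participant: someone whose donations all failed, or who attempted only one of the two donations without success, sees neither block. A short Python check (not part of DDM) can enumerate the status combinations for two expected donations and list those that would produce an empty debriefing text. The function name is hypothetical; the statuses are the ones used in the data export.

```python
from itertools import product


def debriefing_branches(info):
    """Return which of the two example blocks a participant would see."""
    shown = []
    if info["n_success"] > 0:
        shown.append("thank you")
    if info["n_pending"] == 2:
        shown.append("not eligible")
    return shown


# Enumerate every status combination for two expected donations.
statuses = ["success", "pending", "failed", "nothing extracted"]
uncovered = []
for pair in product(statuses, repeat=2):
    info = {"n_success": pair.count("success"), "n_pending": pair.count("pending")}
    if not debriefing_branches(info):
        uncovered.append(pair)
print(len(uncovered), "combinations see no debriefing text")
```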

You can read more about how to use dynamic templates here.

Data Center

In the Data Center, you can (a) access the collected data, (b) access the project logs, and (c) view general statistics about the progress of your project.

Data Download: Accessing Collected Data Donations

There are two options to download your data:

Internal Download: When you are logged in, click on Download Data as JSON. This gathers your data from the database, and you can then download the JSON file in your browser.
External Download via API (advanced option): There is also the possibility to download your data through an API endpoint remotely. For this, an API token has to be created which will need to be supplied when sending the request to the API (see below for an example).

Through the admin interface, a project can only be accessed by the user who created it. This means that the internal download is only available to the project owner. The API token, on the other hand, can be used to share data access with colleagues working on the same project. However, be careful with whom you share this token, as it exposes the sensitive data collected from participants. We recommend choosing a short expiration date for tokens that you create.
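The external download can be scripted with Python's standard library. This is a hedged sketch: the endpoint URL and the "Authorization: Token <token>" header format are assumptions made for illustration, so check the API documentation of your DDM instance for the actual endpoint and authentication scheme. The response body is saved unchanged, since its exact format is not described here. All names are hypothetical.

```python
import urllib.request

# Hypothetical endpoint: replace with the URL documented for your DDM instance.
API_URL = "https://root.url/api/project/1/data"


def build_download_request(url, token):
    """Prepare a GET request that sends the API token in a header.

    The 'Authorization: Token <token>' scheme is an assumption.
    """
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def download_data(url, token, target_path):
    """Send the request and save the raw response body to target_path."""
    with urllib.request.urlopen(build_download_request(url, token)) as response:
        with open(target_path, "wb") as handle:
            handle.write(response.read())


# Usage (requires a running DDM instance and a valid token):
# download_data(API_URL, "your-api-token", "ddm_export.json")
```

Keep the token out of scripts you share or commit to version control, for example by reading it from an environment variable.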

Data Structure

When downloading your data, you will receive a .json file with the following structure:

{
    "project": {
        "pk": 1,
        "name": "project name",
        "date_created": "2022-12-19T08:49:18.363880+01:00"
    },
    "donations": {
        "blueprint name 1": [
            {
                "participant": 1,
                "project": 1,
                "time_submitted": "2022-12-19T08:49:18.363880+01:00",
                "consent": true,
                "status": "success",  # One of "success" (donation successful); "pending" (donation not attempted); "failed" (donation failed due to an error); "nothing extracted" (all data filtered out)
                "data": [
                    {
                        "extracted_field_1":  "value1_entry1",
                        "extracted_field_2":  "value2_entry1",
                        # ...
                    },
                    {
                        "extracted_field_1":  "value1_entry2",
                        "extracted_field_2":  "value2_entry2",
                        # ...
                    },
                    # etc.
                ]
            }
        ],
        "blueprint name 2": [
            {
                "participant": 1,
                "project": 1,
                "time_submitted": "...",
                # etc.
            }
        ]
    },
    "questionnaire": [
        {
            "participant":  1,
            "project":  1,
            "time_submitted":  "2023-01-19T08:49:18.363880+01:00",
            "responses": {
                "variable_name": "answer to question",
                "variable_name-item-value": "answer to item"
            },
            "meta_data":  {  # For data validation purposes: contains one entry per question consisting of meta information about how the question was presented to the participant:
                "question-id": {
                    "response": "1",  # Can be a dict of form {"item-id":  "item-response", ...} for question types with item responses
                    "question": "question text in html format as displayed to participant",
                    "items": [
                        {
                            "id": "33",
                            "label": "label text in html format as displayed to participant",
                            "label_alt": "alternative label text in html format as displayed to participant",  # only applies to semantic differential
                            "index": "1",
                            "value": "1",
                            "randomize": false
                        }
                    ]
                }
            }
        }
    ],
    "participants": [
        {
            "pk":  1,
            "project":  1,
            "external_id":  "DMXdpfVyksagfqql2cTgp8kF",
            "start_time":  "2023-01-18T08:49:18.363880+01:00",
            "end_time":  "2023-01-20T08:49:18.363880+01:00",
            "completed":  true,
            "extra_data":  {
                "url_param": {
                    "some_url_parameter": "some value extracted from this parameter when the briefing page was called.",
                    # ...
                }
            }
        }
    ]
}
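For analysis, it is often convenient to flatten the nested export into one row per extracted entry. The Python sketch below assumes the structure shown above (without the explanatory # comments, which are not part of a real export) and links each donation to the participant's external_id. Donations without data, such as pending ones, produce no rows. The function names are hypothetical.

```python
import json


def load_export(path):
    """Load a downloaded DDM export file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def donation_rows(export):
    """Yield one flat dict per extracted entry, tagged with its context."""
    external_ids = {p["pk"]: p["external_id"] for p in export["participants"]}
    for blueprint, donations in export["donations"].items():
        for donation in donations:
            for entry in donation.get("data") or []:  # None or [] yields nothing
                yield {
                    "blueprint": blueprint,
                    "participant": external_ids.get(donation["participant"]),
                    "status": donation["status"],
                    **entry,
                }


export = {
    "participants": [{"pk": 1, "external_id": "DMXdpfVyksagfqql2cTgp8kF"}],
    "donations": {"blueprint name 1": [
        {"participant": 1, "status": "success", "data": [{"extracted_field_1": "a"}]},
        {"participant": 1, "status": "pending", "data": None},
    ]},
}
for row in donation_rows(export):
    print(row)
```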

Project Log: Monitoring an Ongoing Project

The Project Log should help researchers to monitor their ongoing data collection and to identify potential problems occurring during the data donation.

Two types of logs exist for each project: An Exception Log and an Event Log.

Exception Log

The Exception Log lists all exceptions that participants encountered during the study. The log provides the following information:

The date and time when the exception occurred
The type of the exception (for a description of the type-codes, see here)
Which participant encountered the exception (if applicable)
For which file blueprint the exception occurred (if applicable)
If the exception was raised on the client-side (i.e., the participant’s browser) or the server side
A message describing the exception

Event Log

The Event Log currently registers the following events:

When an access token is created or deleted
When the API endpoint for the project was called but authentication failed or permission was denied
When a data download attempt was successful, failed, or denied.
When a data delete attempt was successful, failed, or denied.
When a participant delete request was successful, failed, or denied.

Participation Statistics

The Participation Statistics currently display the following information:

Donated Files: The number of successfully donated files.
Started: The number of participants that started the study.
Completed: The number of participants that reached the debriefing page of the study.
Completion Rate: The number of completed participations divided by the number of started participations.
Avrg. to complete: The average time it took a participant from starting the study to reaching the debriefing page.
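Similar figures can be recomputed from the participants list of the data export, for example to report them for a subset of participants. The sketch assumes that every exported participant counts as started and that the average duration is taken over completed participations only; the dashboard may compute its statistics differently. The function name and sample records are hypothetical.

```python
from datetime import datetime


def participation_stats(participants):
    """Compute started, completed, completion rate, and mean duration (seconds)."""
    started = len(participants)
    finished = [p for p in participants if p["completed"]]
    rate = len(finished) / started if started else 0.0  # completed / started
    durations = [
        (datetime.fromisoformat(p["end_time"])
         - datetime.fromisoformat(p["start_time"])).total_seconds()
        for p in finished
    ]
    average = sum(durations) / len(durations) if durations else None
    return {"started": started, "completed": len(finished),
            "completion_rate": rate, "avg_seconds": average}


participants = [
    {"completed": True, "start_time": "2023-01-18T08:00:00+01:00",
     "end_time": "2023-01-18T09:00:00+01:00"},
    {"completed": False, "start_time": "2023-01-18T10:00:00+01:00", "end_time": None},
]
print(participation_stats(participants))
# {'started': 2, 'completed': 1, 'completion_rate': 0.5, 'avg_seconds': 3600.0}
```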

Danger Zone

Here, you can find all options that affect the data collected in the course of your project:

Reset Project Data

With this option, you can delete all data collected for a given project.

Delete Participant

You can delete the data of a given participant by providing their external participation ID.

You can show your participants their external participation ID (also referred to as public_id) during the study (e.g., embedded in a transition block as part of the questionnaire) and inform them that by providing their public ID they can request the deletion of their data in the future if they change their mind.

This function can then help you to delete a participant from the database (if your study is still ongoing at the time a participant requests the deletion of their data).

See here for an example of how this could be done.

Delete Project

With this option you can delete the current project. This will also delete all associated data.