Data Donation Configuration
The data donation can be configured using so-called Uploaders, Instructions, and Blueprints.
An Uploader essentially represents an upload form through which one file can be uploaded (either a ZIP container
or a single file).
For each Uploader, a set of Instructions for participants can be created that show how they can access and
upload the requested file.
A Blueprint is used to define what data will be extracted from the file
that participants upload through the Uploader. Each Uploader has one or multiple associated Blueprints (although
if an Uploader expects a single file, only one Blueprint can be associated with it).
To configure the data donation step, go to the Project Hub and click on Data Donation in the Project Configuration section. You can then configure the Uploaders, Instructions, and Blueprints on the page that opens.
Configure Uploader
When creating an Uploader, you have the following configuration options:
- Name: Name of the Uploader. It is publicly visible to participants in the header of the Uploader on the data donation page.
- Upload Type: Either "single file" or "zip file", depending on whether your participants are expected to upload a single file (e.g., a CSV or JSON file) or a ZIP container.
- Index: The position of the Uploader on the data donation page. Uploaders with a lower index are displayed closer to the top of the page. This setting only has an effect if you use multiple Uploaders in the same project.
- All-in-one consent: By default, participants are asked to consent to the donation of the data associated with each Blueprint separately. If all-in-one consent is enabled, participants are instead asked once to consent to submitting all uploaded data. The all-in-one consent question is displayed at the bottom of the Uploader.
Figure 2. Default Consent
Figure 3. All-in-One Consent
- Associated Blueprints: The Blueprints associated with this Uploader. Only associated Blueprints are applied to the files uploaded through a particular Uploader.
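The following minimal Python sketch illustrates the relationship between Uploaders and Blueprints. The class names, field names and the validate/display_order helpers are hypothetical and do not reflect the tool's actual data model. The sketch encodes only two rules stated above: a single-file Uploader can have at most one Blueprint, and Uploaders are shown in ascending order of their index.

```python
from dataclasses import dataclass, field


@dataclass
class Blueprint:
    name: str


@dataclass
class Uploader:
    # Hypothetical model: field names are illustrative, not the tool's schema.
    name: str
    upload_type: str  # "single file" or "zip file"
    index: int
    all_in_one_consent: bool = False
    blueprints: list[Blueprint] = field(default_factory=list)

    def validate(self) -> None:
        """Reject configurations that the documentation says are not allowed."""
        if self.upload_type not in ("single file", "zip file"):
            raise ValueError(f"unknown upload type: {self.upload_type!r}")
        if self.upload_type == "single file" and len(self.blueprints) > 1:
            raise ValueError("a single-file Uploader can have only one Blueprint")


def display_order(uploaders: list[Uploader]) -> list[Uploader]:
    """Lower index means closer to the top of the data donation page."""
    return sorted(uploaders, key=lambda u: u.index)
```

A ZIP Uploader can therefore hold several Blueprints (for example "Watch History" and "Search History"). Calling validate() on a single-file Uploader with two Blueprints raises a ValueError.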
Configure Instructions
Once an Uploader is created, you can add Instructions to it. Donation Instructions consist of one or multiple instruction pages. Instruction pages are displayed as a slide show at the top of the Uploader on the donation page (see the figures on the data donation overview page). For each instruction page, the following can be configured:
- Text: The instruction text displayed to participants. In this field, researchers can also upload and include images or GIFs to guide participants through the data donation process (video upload is currently not supported).
The participant’s external ID is available as a template variable that can be included in the instruction text as follows: {{ participant_id }}. It is displayed to the participant as something like IPI2wHDWrHODDRKuo8zo101S. This helps participants continue the data donation at a later point in time (e.g., because it can take some time between requesting a data takeout and being able to download it); read this section of the documentation to find out how this can be done.
- Index: The order of the page in the slideshow.
If no instructions are defined for an Uploader, the instruction section will be hidden in the participation view.
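The tool renders the {{ participant_id }} placeholder itself. The following Python sketch shows only the effect of that substitution. The render_instruction helper and its whitespace tolerance are assumptions made for the illustration, not the tool's template engine.

```python
import re

# Matches {{ participant_id }}, tolerating optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*participant_id\s*\}\}")


def render_instruction(text: str, participant_id: str) -> str:
    """Insert the participant's external ID wherever the placeholder appears."""
    # A function replacement inserts the ID literally, without interpreting backslashes.
    return PLACEHOLDER.sub(lambda _m: participant_id, text)


print(render_instruction("Your ID: {{ participant_id }}", "IPI2wHDWrHODDRKuo8zo101S"))
# prints: Your ID: IPI2wHDWrHODDRKuo8zo101S
```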
Configure Blueprints
When creating a Blueprint, you have the following configuration options:
- Name: Name of the Blueprint. It is publicly visible to participants, so it is important to choose a meaningful name (e.g., "Watch History", "Liked Posts", or similar).
- Description: Description of what information the Blueprint extracts from the uploaded file. If defined, the description is visible to participants in the data donation step.
- Display Position: Sets the display order for this Blueprint in the participation interface. Blueprints are shown in ascending order by this value. If multiple Blueprints have the same position value, they are ordered by creation date (oldest first).
- Associated Uploader: The Uploader with which the Blueprint is associated.
- File path: The path where the file is expected to be located within the ZIP file. This is only necessary if the Blueprint is associated with an Uploader that expects a ZIP file. The path can include regular expressions (regex) for flexible file matching (see below).
If a regex matches two files, DDM extracts the first one that matches the expression. It then stops looking, even if the matched file does not contain the expected fields (see below). Therefore, we recommend being as specific as possible when setting the file path. Examples of regex paths to match files:
  - ^MyActivities\.json: Matches a file named MyActivities.json that is located at the root of the ZIP file.
  - ^SpecificFolder/MyActivities\.json: Matches a file named MyActivities.json that is located in a folder named SpecificFolder at the root of the ZIP file.
  - .*/MyActivities\.json: Matches the first file named MyActivities.json, which can be located anywhere in the ZIP file.
  - (^MyActivities\.json|^MeineAktivitäten\.json|^MieAttivita\.json): Matches a file that is located at the root of the ZIP file and is named MyActivities.json, MeineAktivitäten.json, or MieAttivita.json. This can help match the same file in different languages.
You can find out more about regex here. On this website, you will also find some tools that can help you test regex patterns.
- Expected fields: The fields that must be contained in the file from which information is extracted. If a file does not contain all fields defined here, no information is extracted.
Put the field names in double quotes (") and separate them with commas ("Field A", "Field B"). You can also use regular expressions (regex) to match expected fields. To do this, you must enable the Expected field regex matching option (see below).
- Expected field regex matching: Select this option if you use regular expressions in the Expected fields setting.
- Expected File Format: The format of the file from which information is extracted. Currently, only JSON and CSV are supported.
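File identification consists of two steps: finding the first ZIP member that matches the File path regex, and then checking that it contains the expected fields. The following Python sketch shows both steps. The function names are hypothetical, and the sketch makes several assumptions: it uses re.search, it treats "first" as archive order, and it checks the keys of a single record. The tool's own matching details may differ.

```python
import io
import json
import re
import zipfile
from typing import Iterable, Optional


def find_file(archive: zipfile.ZipFile, path_pattern: str) -> Optional[str]:
    """Return the first archive member whose path matches the File path regex, else None."""
    pattern = re.compile(path_pattern)
    for name in archive.namelist():  # assumption: "first" means archive order
        if pattern.search(name):
            return name  # stop here, even if this file later fails the field check
    return None


def has_expected_fields(keys: Iterable[str], expected: Iterable[str],
                        use_regex: bool = False) -> bool:
    """True if every expected field is present (or, with regex matching, matched by some key)."""
    keys = list(keys)
    if use_regex:
        return all(any(re.search(p, k) for k in keys) for p in expected)
    return set(expected).issubset(keys)


# Usage: build a small ZIP in memory and validate the matched file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Takeout/MyActivities.json",
                json.dumps([{"title": "Watched video XY", "time": "2024-01-01"}]))
with zipfile.ZipFile(buf) as zf:
    name = find_file(zf, r".*/MyActivities\.json")
    records = json.loads(zf.read(name))
print(name, has_expected_fields(records[0].keys(), ["title", "time"]))
# prints: Takeout/MyActivities.json True
```

Because find_file returns as soon as it finds a match, a vague pattern can select the wrong file, and that file then fails the field check. This is why specific paths are recommended.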
JSON specific settings
- Extraction Root: Indicates the level of the file's data structure from which information is extracted. If you want to extract information contained on the first level (e.g., {'field to be extracted': value}), you can leave this field empty. If you want to extract data located on a deeper, nested level, provide the path to the parent field of the data you want to extract. For example, if your JSON file is structured like {'friends': {'real_friends': [{'name to extract': name, 'date to extract': date}], 'fake friends': [{'name': name, 'date': date}]}} and you want to extract the names and dates of real_friends, set the extraction root to friends.real_friends.
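The following Python sketch shows what a dot-separated extraction root resolves to. The resolve_extraction_root helper is hypothetical, and it assumes that field names in the path contain no dots themselves.

```python
from typing import Any


def resolve_extraction_root(data: Any, root: str) -> Any:
    """Walk a dot-separated extraction root (e.g. "friends.real_friends") into parsed JSON."""
    if not root:
        return data  # empty root: extract from the first level
    for key in root.split("."):
        data = data[key]  # raises KeyError if the path does not exist
    return data


example = {
    "friends": {
        "real_friends": [{"name to extract": "Ana", "date to extract": "2023-05-01"}],
        "fake friends": [{"name": "Bob", "date": "2023-06-01"}],
    }
}
print(resolve_extraction_root(example, "friends.real_friends"))
# prints: [{'name to extract': 'Ana', 'date to extract': '2023-05-01'}]
```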
CSV specific settings
- CSV Delimiter: The character that separates values in the expected CSV file (e.g., ",", ";", or "\t"). If left empty, DDM tries to infer the delimiter from the file structure.
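This section does not say how the delimiter is inferred. The following Python sketch substitutes the standard library's csv.Sniffer to illustrate the idea. The read_csv_rows helper and the candidate set ",", ";" and tab are assumptions.

```python
import csv
import io
from typing import Optional


def read_csv_rows(text: str, delimiter: Optional[str] = None) -> list:
    """Parse CSV text into dicts, inferring the delimiter when none is configured."""
    if not delimiter:
        # Stand-in for the tool's inference: let csv.Sniffer choose among common delimiters.
        # Sniffer raises csv.Error if it cannot decide.
        delimiter = csv.Sniffer().sniff(text[:4096], delimiters=",;\t").delimiter
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


print(read_csv_rows("video;time\nXY;2024-01-01\nAB;2024-02-01\n"))
```

Configuring the delimiter explicitly avoids the guess altogether, which matters for small or irregular files where inference can fail.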
Extraction Rules
The settings above are used to identify and validate the correct file from which data should be extracted.
When the file validation is successful, the Blueprint will start to extract information from the file.
For this, it uses Extraction Rules which will be applied to the file one after another.
The base assumption for the extraction of the data contained in a file is
that you do not want any data. This means that when you configure your extraction rules,
you first have to add a "Keep Field" rule for each field that you want to keep in your data (see the setting
Extraction Operator below).
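Under this "keep nothing by default" principle, the Keep Field rules act as a whitelist. The following Python sketch shows the effect. The apply_keep_fields helper is hypothetical and covers only plain field names, not regex fields.

```python
def apply_keep_fields(rows: list, keep_fields: list) -> list:
    """Keep only fields named by a "Keep Field" rule; everything else is dropped."""
    keep = set(keep_fields)
    return [{k: v for k, v in row.items() if k in keep} for row in rows]


rows = [{"title": "Watched video XY", "time": "2024-01-01", "titleUrl": "https://example.com"}]
print(apply_keep_fields(rows, ["title", "time"]))
# prints: [{'title': 'Watched video XY', 'time': '2024-01-01'}]
```

With no Keep Field rules at all, every entry ends up empty.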
- Execution Order: The order in which the extraction rules are applied to a file.
- Name: The name of an extraction rule. For internal organisation only.
- Field: The field to which the rule is applied. This can be either a "normal" string or a regular expression (regex). In the latter case, you must also select Regex field (below).
- Regex field: Select this option if you use a regular expression in the Field setting of this rule.
- Extraction Operator: Defines the main logic of the extraction step. The available extraction operators are:
  - Keep Field: Keep this field in the uploaded data.
  - Equal (==): Delete the row/entry if the value contained in the given field equals the comparison value. Works for strings, integers, and dates¹.
  - Not Equal (!=): Delete the row/entry if the value contained in the given field does not equal the comparison value. Works for strings, integers, and dates¹.
  - Greater than (>): Delete the row/entry if the value contained in the given field is greater than the comparison value. Works for integers and dates¹. String values are skipped and the row is kept in the data.
  - Smaller than (<): Delete the row/entry if the value contained in the given field is smaller than the comparison value. Works for integers and dates¹. String values are skipped and the row is kept in the data.
  - Greater than or equal (>=): Delete the row/entry if the value contained in the given field is greater than or equal to the comparison value. Works for integers and dates¹. String values are skipped and the row is kept in the data.
  - Smaller than or equal (<=): Delete the row/entry if the value contained in the given field is smaller than or equal to the comparison value. Works for integers and dates¹. String values are skipped and the row is kept in the data.
  - Delete match (regex): Delete the parts of the value contained in the given field that match the given regular expression (regex). For example, if the regular expression is "^Watched " and a field contains the value "Watched video XY", the value "video XY" is kept in the uploaded data. All field values are converted to strings before this operation is applied.
  - Replace match (regex): Replace the parts of the value contained in the given field that match the given regular expression (regex) with the replacement value. For example, if the regular expression is "[\w.-]+@([\w-]+\.)+[\w-]{2,4}", the replacement value is "anonymized", and a field contains the value "some text email@address.com", the value "some text anonymized" is kept in the uploaded data. All field values are converted to strings before this operation is applied.
  - Delete row when match (regex): Delete the row/entry if the value contained in the given field matches the given regular expression (regex). For example, if the regular expression is "^Watched " and a field contains the value "Watched video XY", the row/entry is deleted from the uploaded data. All field values are converted to strings before this operation is applied.
¹ Dates are inferred from string values if they are formatted according to the ISO, RFC 2822, or HTTP standards, and only if both the field value and the comparison value follow the same format. Otherwise, the value is treated as a regular string.
- Comparison Value: The value against which the data contained in the indicated field is compared, according to the selected Extraction Operator.
- Replacement Value: Only required for the "Replace match (regex)" operator. The value that replaces the parts of the field value that match the regex pattern.
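The following Python sketch illustrates the operator semantics listed above. Comparison rules delete a row when their condition holds, ordering comparisons skip plain strings, and regex rules work on the string form of a value. The function names and operator keys are hypothetical. Date handling is simplified to ISO 8601 only, and integers are compared only when both sides are Python ints, so this is not the tool's implementation.

```python
import re
from datetime import datetime
from typing import Any, Optional


def _as_date(value: Any) -> Optional[datetime]:
    # Simplification: only ISO 8601 strings are recognised (RFC 2822/HTTP dates are not).
    if isinstance(value, str):
        try:
            return datetime.fromisoformat(value)
        except ValueError:
            return None
    return None


def _comparable(value: Any, comparison: Any) -> Optional[tuple]:
    """Return an orderable (value, comparison) pair, or None if the pair cannot be ordered."""
    a, b = _as_date(value), _as_date(comparison)
    if a is not None and b is not None:
        return a, b
    if isinstance(value, int) and isinstance(comparison, int):
        return value, comparison
    return None


ORDERING = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
}


def should_delete(row: dict, field: str, operator: str, comparison: Any) -> bool:
    """True if a comparison rule deletes this row/entry."""
    value = row.get(field)
    pair = _comparable(value, comparison)
    if operator in ("==", "!="):
        equal = (pair[0] == pair[1]) if pair else (value == comparison)
        return equal if operator == "==" else not equal
    if pair is None:
        return False  # e.g. plain strings: skipped, so the row is kept
    return ORDERING[operator](*pair)


def apply_regex_rule(row: dict, field: str, operator: str, pattern: str,
                     replacement: str = "") -> Optional[dict]:
    """Apply one regex rule; return the (possibly modified) row, or None to delete it."""
    if field not in row:
        return row
    text = str(row[field])  # values are converted to strings before the regex is applied
    if operator == "delete_match":
        return {**row, field: re.sub(pattern, "", text)}
    if operator == "replace_match":
        # A function replacement inserts the value literally (no backslash escapes).
        return {**row, field: re.sub(pattern, lambda _m: replacement, text)}
    if operator == "delete_row_when_match":
        return None if re.search(pattern, text) else row
    raise ValueError(f"unknown operator: {operator!r}")
```

Applying the rules in their configured execution order amounts to calling these functions rule by rule and dropping any row for which should_delete returns True or apply_regex_rule returns None.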