How can we help you?

The Research Data Service has data publishing experts and we're here to help!

Contact Us

You can contact us at databank@library.illinois.edu



Journal Supplemental Materials


Some publishers offer the option of contributing data as "supplemental materials" in the form of a PDF.

The Illinois Data Bank can provide a more flexible alternative, as detailed below:

| Criteria | Supplemental Material | Illinois Data Bank |
|---|---|---|
| Size limitations on files | Often lower than 100 MB | 2 Terabytes |
| Format limitations on files | Restrictions and requirements are possible; often PDF only. | Any format is accepted. |
| Digital Object Identifier (DOI) | Unlikely to be available | Automatically provided |
| Metadata available, exportable, and searchable | Unlikely to be available | Automatically provided |
| Access restriction / findability | Can be hidden by paywalls or publisher-chosen access controls. | Data will be freely available (after release from any embargoes you choose to assign). |
| Download statistics | Unlikely to be available | Automatically provided |
| Storage infrastructure | Stability and suitability for long-term storage are usually not guaranteed. | Stable preservation environment that complies with many funder and publisher requirements |
| Cost to publish | Sometimes fee-based | No cost for University of Illinois researchers |
| Guaranteed availability | Unlikely to be guaranteed for multiple years or regularly reviewed | Every dataset is guaranteed to be available for a minimum of 5 years, with longer storage likely. Regular review and curation will ensure continued availability and preservation best practices as time passes. |
| Connections to additional papers that use the data and other materials | Sometimes available | We link your dataset to articles, code, theses, other data sources, etc. |
| Standard or custom licensing statements | Unlikely to be available | CC0 and CC BY are standard offerings; you can also upload a customized license statement. |


Getting Started Guide


Welcome to the Illinois Data Bank! This guide will cover the essential steps for publishing your data. You can contact us at any time to get additional help.

Log in with NetID

The first step to depositing data within the Illinois Data Bank is logging in with your Illinois NetID and Active Directory password. The Illinois Data Bank will check your NetID and confirm that you belong to a group eligible for self-deposit. Illinois faculty, graduate students, and most staff are eligible for self-deposit, but others (including undergraduates) will see a restriction notice after logging in. Contact the Research Data Service staff if you run into trouble or need to request authorization for self-deposit.

Describe

The information you provide here will be attached to your dataset as metadata associated with your DOI. Providing detailed descriptive information is important because it will be transmitted to search engines such as Google Scholar and other aggregators that index data. In short, the better you describe your data, the easier it will be for others to find and use it!

In a hurry to deposit? You can always edit and expand your metadata after publication. Research Data Service staff also regularly review the metadata and add information to increase the visibility of your dataset.

There are three sections to add information about your dataset:

  1. Description

  2. Funder

  3. Related Materials

The description area allows you to indicate the authorship information, title, basic information about the contents of your dataset, and an optional release date. The funder information section allows you to provide grant or other information about financial support of your dataset. The related materials area is where you can provide information about scholarly works that use or have contributed to your dataset. Research Data Service staff review this section and create links between these works and your dataset record. This allows publishers and altmetrics tools to make connections to your published data.

Upload

This is when you provide us with a copy of the files you'd like to be published and saved within the Illinois Data Bank. You may either upload directly from your computer or import files you already have stored in Box. Importing from Box is recommended for larger files. Depending on file size, there are several options for uploading your individual data files:

  • Up to 15 GB: Use the standard file selection tool, drag-and-drop upload, or Box import

  • Over 15 GB: Use the command line tools with a token, or contact the Research Data Service (RDS) for assistance
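The size thresholds above, together with the limits given for the command line tools later in this guide (4 GB for cURL, 2 TB for the sample Python client), can be summarized as a small decision helper. This is a hypothetical sketch, not part of the Illinois Data Bank; treating GB and TB as decimal units (10**9 and 10**12 bytes) is an assumption.

```python
# Hypothetical helper summarizing the upload guidance in this guide.
# Assumption: GB and TB are decimal units (10**9 and 10**12 bytes).
GB = 10**9
TB = 10**12

WEB_UPLOAD_LIMIT = 15 * GB     # standard file selection, drag-and-drop, or Box import
PYTHON_CLIENT_LIMIT = 2 * TB   # sample Python client; cURL (4 GB max) never applies above 15 GB


def recommended_upload(size_bytes: int) -> str:
    """Return the upload route this guide suggests for a file of the given size."""
    if size_bytes < 0:
        raise ValueError("file size cannot be negative")
    if size_bytes <= WEB_UPLOAD_LIMIT:
        return "web form or Box import"
    if size_bytes <= PYTHON_CLIENT_LIMIT:
        return "Python command line client"
    return "contact the Research Data Service"


if __name__ == "__main__":
    for size in (2 * GB, 40 * GB, 3 * TB):
        print(size, "->", recommended_upload(size))
```

For files over 15 GB, contacting the RDS remains an option even when the Python client would work.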

Unlike the metadata, you will not be allowed to edit or change your data files after publication without curator assistance. We are happy to accommodate updates and changes when necessary, but you will need to contact us if you need to make a change.

Need a bit more time to prepare your data files? No worries. You may save and exit your draft at any time. Let us know if we can help you get your data prepared and documented.

Things you are required to do before publishing:

  • Make sure you have permission from the creator of the dataset to deposit it if you are depositing on behalf of someone else.

  • Remove any sensitive data from each file to be deposited.

Good data behaviors:

  • Include a readme file, data dictionary, and/or codebook as appropriate.

  • Include versions of your files in open formats for long-term preservation.

Unsure how to handle any of these requirements and suggestions? Contact us and a Research Data Service staff member will help guide you through it.

Research Data Service staff regularly review the contents of dataset deposits. You can read more about what to expect as a depositor in our Curation Procedures help section.

Review

This phase of publication gives you a chance to look over all the information you've provided and confirm that everything is ready to go. You have the option of saving your work and exiting from the publication process if you need more time to gather information from a colleague, Research Data Service staff, or other sources.

In addition, you will have the option to choose when you would like curators to review your dataset:

  • Review and then publish (recommended) option: lets curators preview your submission before it goes live on the Illinois Data Bank to eliminate errors before publication. With this option, you receive a reserved DOI that works after the review (2-5 business days).

  • Publish then review option: you receive a working DOI today. However, because curation occurs after publication, any errors that are found will result in having to republish the dataset, which will require a new DOI.



Curation Procedures


Curious about what happens to your data after you deposit it? Research Data Service staff monitor deposits and perform several levels of curation checks. The depth of the dataset review will depend on the size of the dataset deposited, how well documented it is, and general staff availability. These reviews are not peer review and do not judge the core scientific analysis, methodologies, or conclusions behind the data. Instead, the purpose of review is to ensure metadata completeness and dataset discoverability.

The curator reviews the metadata provided and basic file information, which may include the following activities:

  • Metadata formatting and light copyediting
  • Adding metadata links between related resources, such as the paper a dataset supports
  • Identification of areas in need of major additions or changes within the metadata and/or files

In addition to reviewing for metadata completeness and dataset discoverability, staff may also note, where detectable:

  • Deposits that appear spurious
  • Metadata or data files that appear to include erroneous or sensitive content (The Illinois Data Bank Withdrawal Guidelines have more information on how curators handle datasets with errors or sensitive data.)

Depositors and authors can expect:

  • An automated email confirming the deposit with some basic information, including the DOI for the dataset.
  • When necessary, a personal email from Research Data Service staff, for example:
    • to report any minor metadata changes that were made such as keyword additions or small spelling corrections
    • to suggest major metadata and/or data file(s) changes
    • to explain concerns related to the dataset
  • Automated reminders when a publication delay you set is about to end and your dataset is about to be released.

Research Data Service staff are actively working on tools and guidance to assist researchers with documentation and data cleaning. These features will be integrated into the Illinois Data Bank as they become available. In the meantime, contact us to request a consultation.



What is a DOI?


A DOI, or digital object identifier, is a unique ID assigned to a digital resource and used to identify it on the Web over long periods of time. DOIs also promote resource discovery through the publication of descriptive information about the associated online resources.

Access: Each DOI is associated with a redirect link, usually to a landing page from which the resource can be accessed. When the resource moves to a new location on the web, the change can be sent to the DOI resolution database, and the DOI will automatically begin forwarding to the new page. This means that you are free to use the DOI link in static publications without fear that a resource migration or other URL change will break the link.
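The redirect behavior described above can be illustrated with a short sketch that turns a DOI into its https://doi.org/ resolver link and, optionally, follows the redirects to see where the DOI currently points. The function names are hypothetical, and current_location needs network access.

```python
import re
import urllib.parse
import urllib.request

RESOLVER = "https://doi.org/"


def normalize_doi(value: str) -> str:
    """Strip common prefixes ('doi:', resolver URLs) so only the bare DOI remains."""
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", value.strip(), flags=re.IGNORECASE)
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {value!r}")
    return doi


def doi_url(value: str) -> str:
    """Build the persistent resolver link to cite in a publication."""
    return RESOLVER + urllib.parse.quote(normalize_doi(value), safe="/")


def current_location(value: str, timeout: float = 30.0) -> str:
    """Follow the resolver's redirects and return the page the DOI points to today."""
    with urllib.request.urlopen(doi_url(value), timeout=timeout) as response:
        return response.geturl()
```

Citing the doi_url result rather than the landing page address is what keeps the link stable if the resource moves.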

Discovery: DataCite, the DOI registration agency that the Illinois Data Bank uses for minting DOIs, also accepts descriptive information about the registered data objects. This metadata is indexed by several search engines and promotes discovery of the resources by future users.

Publishing your dataset within the Illinois Data Bank grants you a DOI for your dataset so you can benefit from a stable URL, increased visibility, and more formalized citation practices. Research Data Service staff members review and edit your descriptive metadata to ensure accuracy and maximize visibility. Curators also add formal links between associated publications so that aggregation and altmetrics services can recognize and include datasets within their reports.

Questions or concerns about how DOIs work? Contact us.



Metrics


  • Download metrics

    The Illinois Data Bank tracks download counts for datasets. To mitigate possible over- or underestimation of download counts, a dataset's download counter increments by one when one or more of its associated files are downloaded or viewed. However, only one download instance is counted per IP address per calendar day. This means that a single computer downloading a dataset's files multiple times in the same day is counted only once. IP addresses of downloaders are used only for this purpose and are deleted regularly in compliance with our privacy policy.

  • Other metrics

    Research Data Service team members also collect data on DOI access, individual file downloads, and other citation information about deposits held within the Illinois Data Bank. This information is gathered from a variety of sources on a manual basis and stored outside of the Illinois Data Bank. Depositors are welcome to send in feedback to ask for updated metrics data for their deposits, but please allow several working days for us to collect the information. Some citation information will also be added into dataset metadata areas. Depositors can expect to receive updates about notable citations or changes in access traffic.
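The counting rule described under Download metrics (at most one count per dataset, per IP address, per calendar day, however many files are fetched) can be sketched as follows. This is an illustration, not the Illinois Data Bank's actual code: the event tuple layout is hypothetical, and the sketch assumes all timestamps share one time zone so that timestamp.date() gives the intended calendar day.

```python
from datetime import date, datetime
from typing import Iterable

# Hypothetical log record: (dataset_id, ip_address, timestamp) for each file
# download or view. Timestamps are assumed to share one time zone.
Event = tuple[str, str, datetime]


def download_counts(events: Iterable[Event]) -> dict[str, int]:
    """Count at most one download per dataset, per IP address, per calendar day."""
    seen: set[tuple[str, str, date]] = set()
    counts: dict[str, int] = {}
    for dataset_id, ip_address, timestamp in events:
        key = (dataset_id, ip_address, timestamp.date())
        if key in seen:
            continue  # same computer, same dataset, same day: already counted
        seen.add(key)
        counts[dataset_id] = counts.get(dataset_id, 0) + 1
    return counts
```

Because the key ignores which file was fetched, downloading several files of one dataset in a single day adds only one to that dataset's count.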



Illinois Data Bank Metadata


  • Full documentation, including current and previous versions, about the metadata that the Illinois Data Bank stores and transmits can be found here: https://www.ideals.illinois.edu/handle/2142/91019.

    Metadata describing the datasets housed within the Illinois Data Bank are published in a variety of ways. Individual metadata records may be accessed directly via our internal API: appending “.json” to a record’s internal URL returns the Illinois Data Bank’s original metadata, and appending “.xml” returns the version transmitted to DataCite. An index of all available dataset records can be found at https://databank.illinois.edu/datasets.json. External entities are welcome to harvest our metadata records, waiting at least 2 seconds between requests. The DataCite API (https://mds.datacite.org/static/apidoc) can also serve individual records for the Illinois Data Bank when provided with our prefix: 10.13012.
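A polite harvester following the guidance above might look like this minimal sketch, which builds the ".json" or ".xml" URL for each record and waits at least 2 seconds between requests. It assumes you already have a list of record URLs (for example, extracted from https://databank.illinois.edu/datasets.json); the layout of that index is not described here, so parsing it is left out. The function names are hypothetical, and the fetcher and sleep parameters exist only so the loop can be tested without network access.

```python
import time
import urllib.request
from typing import Callable, Iterable, Iterator

INDEX_URL = "https://databank.illinois.edu/datasets.json"  # index of all dataset records
MIN_DELAY_SECONDS = 2.0  # requested minimum pause between harvesting requests


def metadata_url(record_url: str, fmt: str = "json") -> str:
    """Append '.json' (Illinois Data Bank metadata) or '.xml' (DataCite version)."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    return record_url.rstrip("/") + "." + fmt


def fetch(url: str) -> bytes:
    """Download one URL (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def harvest(
    record_urls: Iterable[str],
    fmt: str = "json",
    fetcher: Callable[[str], bytes] = fetch,
    delay: float = MIN_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[tuple[str, bytes]]:
    """Yield (url, body) pairs, pausing at least 2 seconds between requests."""
    pause = max(delay, MIN_DELAY_SECONDS)  # never go below the requested minimum
    for i, record_url in enumerate(record_urls):
        if i > 0:
            sleep(pause)
        url = metadata_url(record_url, fmt)
        yield url, fetcher(url)
```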



File Naming Best Practices


| File naming best practice | Examples |
|---|---|
| Use YYYY-MM-DD format for dates | project01_2019-01-01 |
| Use a combination of letters, numbers, underscores, and hyphens | project01_raw-data.json |
| Use standard file extensions to indicate file type | myproject.txt |
| Use leading zeroes for version numbers | name001.csv, name010.csv, name101.csv |
| Keep file names short | not_too_long.xml |
| Use alphanumeric characters | data_champaign_il_2019-01-01.csv |
| Use underscores between words | data_location_time.csv |
| Use lowercase (some systems are case sensitive) | all_lowercase_would-be-safer.tiff |
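If you want to check names before uploading, the practices in the table can be approximated with a short script. This is a hypothetical sketch: the 64-character length limit is an assumption (the guide only says to keep names short), and the date check only flags common DD-MM-YYYY or MM-DD-YYYY patterns.

```python
import re

MAX_LENGTH = 64  # hypothetical limit; the guide only says "keep file names short"
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)+$")  # name plus one or more extensions
NON_ISO_DATE = re.compile(r"(?<!\d)\d{2}[-_.]\d{2}[-_.]\d{4}(?!\d)")  # e.g. 01-02-2019


def check_file_name(name: str) -> list[str]:
    """Return the naming practices a file name does not follow (empty list if none)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"keep names to {MAX_LENGTH} characters or fewer")
    if not ALLOWED.match(name):
        problems.append("use only letters, numbers, underscores, and hyphens, plus an extension")
    if name != name.lower():
        problems.append("use lowercase")
    if NON_ISO_DATE.search(name):
        problems.append("write dates as YYYY-MM-DD")
    return problems


def suggest_file_name(name: str) -> str:
    """Lowercase the name and replace runs of other characters with underscores."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    stem = re.sub(r"[^a-z0-9_-]+", "_", stem.lower()).strip("_")
    return stem + ("." + ext.lower() if ext else "")
```

suggest_file_name only fixes case and separators; it does not reorder dates, so review its output by hand.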



File Grouping


Uploading lots and lots of files in a single dataset can make it more difficult to curate, upload/download, understand, and reuse your data. A long list of files may overwhelm potential users and long lists of files can also cause technical issues such as slow or failed page loading. Grouping files together may make your dataset easier to deposit and use.

File grouping strategies to consider:

  • Organize your files into logical groups
    (e.g. by time, location, or project: whatever would be most helpful to someone reusing your data)

  • Bundle each group into a zip or tar file
    (try to keep each group under 10 GB before you compress it so it's easier to download)

  • Add a note to your data description about the groupings you chose and why
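The bundling step above can be scripted with Python's standard zipfile module. In this sketch, the grouping key (here, the top-level folder) and the function names are hypothetical; the 10 GB check applies to each group's uncompressed size, and decimal gigabytes are assumed. Write the archives to a folder outside the one being bundled so that earlier archives are not swept into new ones.

```python
import zipfile
from collections import defaultdict
from pathlib import Path
from typing import Callable

GROUP_LIMIT_BYTES = 10 * 10**9  # "under 10 GB before you compress it" (decimal GB assumed)


def by_top_folder(rel: Path) -> str:
    """Hypothetical grouping key: the first folder in the path, or 'misc' for top-level files."""
    return rel.parts[0] if len(rel.parts) > 1 else "misc"


def group_files(root: Path, key: Callable[[Path], str]) -> dict[str, list[Path]]:
    """Collect every file under root into groups named by key(relative_path)."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[key(path.relative_to(root))].append(path)
    return dict(groups)


def bundle_groups(
    root: Path,
    key: Callable[[Path], str],
    out_dir: Path,
    limit: int = GROUP_LIMIT_BYTES,
) -> list[Path]:
    """Write one zip archive per group; refuse groups whose uncompressed size exceeds limit."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for name, files in group_files(root, key).items():
        total = sum(f.stat().st_size for f in files)
        if total > limit:
            raise ValueError(f"group {name!r} holds {total} bytes; split it into smaller groups")
        archive = out_dir / f"{name}.zip"
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for f in files:
                zf.write(f, arcname=str(f.relative_to(root)))  # keep folder structure inside
        archives.append(archive)
    return archives
```

Swapping by_top_folder for another key function lets you group by any of the categories listed below, such as year, species, or file type.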

Example 1: Temporal organization by seasons and years

Example 2: Publication-supporting organization by figures and tables

Example 3: Topical organization by species name and treatment types

Other categories to consider include:

  • topical (e.g. by species, treatment, etc)
  • temporal (e.g. by year, month, season, etc)
  • spatial (e.g. by county, state, country, etc)
  • file type (e.g. by CSV, by TXT, code, etc)
  • publication supporting (e.g. by figure, table, etc)
  • workflow (e.g. by inputs, outputs, etc)

We're happy to consult with you on grouping strategies for your dataset; just Contact Us! Or take a look at these good examples in the Illinois Data Bank:

Willson, James; Roddur, Mrinmoy Saha; Baqiao, Liu; Zaharias, Paul; Warnow, Tandy (2021): Data from: "Inferring Species Trees from Gene-Family with Duplication and Loss using Multi-Copy Gene-Family Tree Decomposition". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4050038_V1

Schroeder, Nathan (2020): Burton Endo electron micrograph library. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2692533_V1

Donovan, Brian; Work, Dan (2016): New York City Taxi Trip Data (2010-2013). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/J8PN93H8



Dataset Documentation


We encourage dataset authors to develop documentation that will make it possible for you and others to understand and interpret your dataset in the future. Through the process of depositing in the Illinois Data Bank, you will provide high-level documentation for your dataset, like its title, who created it, etc. It is likely that there are many additional details about your dataset that cannot easily be included as a part of this basic description. To provide customized, detailed information about your dataset, we recommend that you include a documentation file as a part of your dataset deposit.

For more information about how to develop documentation files, please check out these excellent resources:

Readme files

Simple text file that accounts for all files and folders in a dataset

Cornell University. 'Guide to writing 'readme' style metadata.' http://data.research.cornell.edu/content/readme

Codebooks

Contains study-level information and descriptions of each variable/data item

Agency for Healthcare Research and Quality. 'What is a codebook?' https://www.icpsr.umich.edu/icpsrweb/content/shared/ICPSR/faqs/what-is-a-codebook.html

Data Dictionaries

Provides a detailed description for each element or variable in your dataset and data model.

University of Wisconsin Data Services. 'Data Management: Data Dictionaries.' https://www.youtube.com/watch?v=Fe3i9qyqPjo (video)
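As a starting point for a readme that accounts for every file and folder, you could generate an inventory of your dataset folder and then replace each placeholder with a real description. This is a minimal sketch; the output layout and the function name are hypothetical, and it does not replace the guidance in the resources above.

```python
from pathlib import Path


def readme_inventory(root: Path) -> str:
    """List every folder and file under root, one per line, with a placeholder description."""
    lines = ["File and folder inventory", ""]
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_dir():
            lines.append(f"{rel}/  (folder): [describe contents]")
        else:
            lines.append(f"{rel}  ({path.stat().st_size} bytes): [describe contents]")
    return "\n".join(lines) + "\n"
```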



When Should I Publish My Dataset?





Delaying Publication (Embargoing)


The primary purpose of the Illinois Data Bank is to provide University of Illinois researchers a space to make their research data openly available immediately to anyone in the world. We do recognize that there are sometimes cases where, due to publisher or other requirements, researchers may need to deposit their dataset but make it temporarily unavailable for download. To meet this need, we provide the following two options for temporarily delaying publication of datasets in the Illinois Data Bank:

| | File Only Publication Delay | Metadata and File Publication Delay |
|---|---|---|
| DOI | You receive an active DOI. You will receive a DOI, and the link will forward to the Illinois Data Bank page for your dataset. | Your DOI is saved, but the link will fail. You will receive a DOI link to place in your publication, but the link will fail until the release date you selected. |
| Dataset record | Your dataset record is discoverable. Information for your dataset in the Illinois Data Bank will be publicly visible through several search engines and other sources. | Your dataset record is not discoverable. Your dataset will be stored in the Illinois Data Bank, but it is not discoverable or visible until the release date you selected. |
| Dataset files | Dataset files cannot be accessed or seen. Although the record for your dataset is publicly visible, your data files will not be made available until the release date you selected. | Dataset files cannot be accessed or seen. The record for your dataset is not visible, nor are your data files available until the release date you selected. |

The maximum amount of time that data can be delayed for publication is 1 year.
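To check a planned release date against the 1-year maximum, a sketch like the following could help. It assumes the year is counted from the date you deposit (the guide does not say exactly which date starts the clock), and it maps February 29 to February 28 of the following year; both choices are assumptions, and the function names are hypothetical.

```python
from datetime import date


def latest_release_date(deposit: date) -> date:
    """One year after the deposit date (assumed starting point for the 1-year limit)."""
    try:
        return deposit.replace(year=deposit.year + 1)
    except ValueError:  # Feb 29 has no counterpart next year; use Feb 28 (an assumption)
        return deposit.replace(year=deposit.year + 1, day=28)


def check_release_date(deposit: date, release: date) -> None:
    """Raise ValueError if the planned release date falls outside the allowed window."""
    if release < deposit:
        raise ValueError("release date is before the deposit date")
    if release > latest_release_date(deposit):
        raise ValueError("publication can be delayed by at most 1 year")
```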

Not sure if you should delay publication of your data, what your release date should be, or have any other questions? Contact us and we will advise you on publication delay based on your specific needs.



Research Data Copyright and Licensing


| CC0 | CC BY | Other License |
|---|---|---|
| CC0 1.0 Universal public domain dedication | Creative Commons Attribution 4.0 International license | A license.txt file must be uploaded as part of the dataset. |
| Derivative works are allowed: the most open license, with no restrictions. | Attribution is a legal requirement: a partially open license with restrictions. | Other CC licenses may create reuse difficulties: CC NC, CC SA, and CC ND impose restrictions that may create incompatibilities and licensing difficulties for the reuse of research data. More restrictions apply, depending on which license is chosen. |
| Request for attribution: attribution is not required, but creators can ask for it by including citation requests or other attribution information in the documentation of the dataset. | May create reuse difficulties: combining many attributed datasets can lead to "attribution stacking", an unwieldy accumulation of citations and authors. | Custom licensing considerations: writing a custom license requires legal expertise, and non-standard licenses complicate reuse. |

What is copyright?

Copyright is a property right in an original work fixed in any tangible medium of expression giving the holder the exclusive right to reproduce, adapt, distribute, perform, and display the work. 1

What is a license?

A license is a legal instrument for a rights holder to permit a second party to do things that would otherwise infringe on the rights held. 2

How does copyright law apply to research data?

Datasets are complex objects, and understanding how copyright law applies to datasets is similarly complex. Copyright protection does not extend to "facts", and so what researchers often view as their "raw" data are in the public domain. Therefore, dataset authors typically cannot claim copyright for raw data. Copyright can apply to aspects of a dataset for which an author made creative or editorial decisions about how the raw data is expressed. For example, the manner in which data are selected and arranged may be copyrightable. Creators may be able to claim copyright over any visualizations, figures, charts, graphs, and other forms of "processing" of research data as well.

When researchers publish datasets in the Illinois Data Bank, the license they assign to their dataset applies only to the copyrightable content of the submission. The raw data and any other part of the dataset submission that is a part of the public domain cannot be licensed.

Why license research data?

  • To mitigate legal uncertainty for downstream users.

  • To control the way you participate in data sharing.

For more information

Ball, Alex. 2014. 'How To License Research Data'. Digital Curation Centre. http://www.dcc.ac.uk/resources/how-guides/license-research-data.

Carroll, Michael W. 2015. 'Sharing Research Data And Intellectual Property Law: A Primer'. PLOS Biology 13 (8): e1002235. doi:10.1371/journal.pbio.1002235.

RDA-CODATA Legal Interoperability Interest Group. 'Legal Interoperability of Research Data: Principles and Implementation Guidelines.' http://www.rd-alliance.org/sites/default/files/attachment/Legal%20Interoperability%20Principles%20and%20Implementation%20Guidelines_Final.pdf.

Urbana Campus of the University of Illinois. 'Illinois Copyright Policy.' http://copyright.illinois.edu/.

University of Illinois. 'The General Rules Concerning University Organization and Procedure,' in particular Articles II and III. https://www.bot.uillinois.edu/governance/general_rules.


1 Black's Law Dictionary 10th Edition, 2009.

2 Ball, Alex. 2014. 'How To License Research Data'. Digital Curation Centre. http://www.dcc.ac.uk/resources/how-guides/license-research-data



Changing, Updating, or Adding Files


You can add and remove files in draft datasets. A draft dataset does not have a DOI because it has not been published or scheduled for publication.

What do you do if you realize you need to correct an error in a dataset you've already published or scheduled for publication? Or what if you have created a helpful documentation file that you want to add to make it easier for others to understand your dataset? Even though the Illinois Data Bank is designed for publication-ready data, we are available to discuss any situation you may be in where you would like to create a new version of your dataset. Please contact us so that we can discuss your situation and needs.

If you expect that your dataset might evolve over time (for example, additional data will be added each year), consider setting, defining, and documenting a "release" cycle. For example, if you plan to deposit rainfall data, appropriate release cycles could include a seasonal or annual release, which should be clearly indicated in any documentation files that you deposit along with your data files. To determine the granularity of the release cycle, consider what span of data is most likely to be needed for reuse, how quickly you'd like to make the data public, and how often you can commit to collecting, preparing, and documenting datasets for deposit. Consider requesting a consultation from the Research Data Service staff to discuss these options.



Accessibility


Research Data Service staff are committed to making the Illinois Data Bank service available to everyone. We are continuously improving the Illinois Data Bank for accessibility but understand that there may be areas and tasks that are not optimized for all access methods.

Research Data Service staff are happy to assist with the data deposit or access process if you encounter technical or navigation trouble. Do not hesitate to contact us by using our help form, emailing databank@library.illinois.edu, or by calling 217-244-1331.



Command Line tools


Overview

These tools can be used to upload files to an existing draft dataset in the Illinois Data Bank.

What do we mean by a draft dataset?

A dataset is in a draft state in the Illinois Data Bank after the deposit agreement has been accepted and before the dataset is published or scheduled for publication. Before uploading a file using any of these options, create or find your draft dataset, and navigate to the edit form for that dataset.

How do I get started?

At the bottom of the Files section of any draft dataset is a set of upload option buttons.
Click the Get token for command line tools button to display the elements required by the command line tools.

Notes:

OPTIONS: Python, cURL, or custom script

Download our sample Python client, databank_api_client_v2.py.

Requires a recent version of Python 2 or 3; works on files up to 2 TB.

Required Modules

pip install tuspy
pip install requests
pip install "urllib3[secure]"

When you click the Get token for command line tools button while editing a draft dataset, a version of the following template command appears, pre-populated with your dataset identifier and token. The only part you need to change is the file name at the end: the on-screen example shows myfile.csv where this template shows [FILE_TO_UPLOAD].

python databank_api_client_v2.py [DATASET_IDENTIFIER] [TOKEN] [SYSTEM] [FILE_TO_UPLOAD] 

Arguments:

DATASET_IDENTIFIER: a code that uniquely and persistently identifies a dataset within the Illinois Data Bank; shown on the screen that opens when you click the Get token for command line tools button on the edit page of your draft dataset
TOKEN: authentication token; shown on the same Get token for command line tools screen
SYSTEM: optional system indicator (local | development | production), default is production
FILE_TO_UPLOAD: name of your datafile to be uploaded

Options: -h --help


This python script and accompanying documentation can be found on GitHub
at https://github.com/medusa-project/databank-client.

For more help, Contact Us.

Requires cURL, works on files up to 4 GB.

Copy and paste this command example. Where it says "binary=@my_datafile.csv", replace my_datafile.csv with your data file's name. Keep "binary=@" as it is.

curl -F "binary=@my_datafile.csv" -H "Authorization: Token token=authentication_token" -H "Transfer-Encoding: chunked" -X POST https://databank.illinois.edu/api/dataset/dataset_key/datafile -o output.txt

For more help, see the Command Line Tools / API Reference, or Contact Us.
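If you prefer Python for a file that fits the cURL route, the cURL command above translates roughly to the following sketch using the third-party requests library. It mirrors the same endpoint, Authorization header, and binary form field; the function names are hypothetical. Unlike cURL's chunked upload, requests builds the multipart body in memory, so this is only suited to modest file sizes; for large files, use the sample Python client described above.

```python
import os

API_BASE = "https://databank.illinois.edu/api/dataset"  # production host from the cURL example


def datafile_endpoint(dataset_key: str) -> str:
    """URL that receives uploads for one draft dataset, as in the cURL command."""
    return f"{API_BASE}/{dataset_key}/datafile"


def auth_headers(token: str) -> dict[str, str]:
    """Same Authorization header format as the cURL example."""
    return {"Authorization": f"Token token={token}"}


def upload_datafile(dataset_key: str, token: str, path: str, timeout: float = 300) -> str:
    """POST one file in the 'binary' form field and return the server's response text."""
    import requests  # third-party: pip install requests

    with open(path, "rb") as fh:
        response = requests.post(
            datafile_endpoint(dataset_key),
            headers=auth_headers(token),
            files={"binary": (os.path.basename(path), fh)},
            timeout=timeout,
        )
    response.raise_for_status()  # surface HTTP errors instead of failing silently
    return response.text
```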



Using Globus


Globus is a nonprofit platform created by the University of Chicago and Argonne National Laboratory that enables the transfer of digital files between established endpoints, one of which can be your work or personal computer. Globus also offers additional services related to sharing data with other researchers or parties directly.

The datasets in Illinois Data Bank's repository hold files of varying sizes. While files can be downloaded directly from datasets through a browser, Globus can offer a faster alternative, which can be especially noticeable on datasets larger than a few gigabytes.

To set up your systems to use Globus to transfer files to your computer, refer to the getting started guide from Globus for detailed guidance on setting up an account and installing Globus Connect Personal.

Once you configure and select an endpoint to send the files to (on your personal computer or other system) you can click on the "Open in Globus File Manager" button on a dataset download page.



For Developers


Databank is the Ruby on Rails web application component of Illinois Data Bank. Within the repository system at the University Library at Illinois, the web application integrates with the Medusa digital preservation repository, also developed at the University Library at Illinois. Medusa provides long-term retention and accessibility of its digital collection. In development of Medusa, the Library’s goal was to closely integrate the Library’s digital production, preservation, and access services in order to establish an efficient, sensible, and sustainable digital library program. Externally, Illinois Data Bank integrates with the DataCite Metadata Store through DataCite’s EZ API.

When a depositor confirms an intention to publish, the web application requests a DOI. Within a pre-registered prefix, the DataCite EZ API supports generation of either a random DOI or a specified DOI. Illinois Data Bank specifies a DOI that is largely opaque, but encodes version information. After DataCite returns a DOI, Databank sends a message to Medusa to initiate ingestion into the digital preservation system. Building on an approach Medusa uses for other functionality, the messages are sent using Advanced Message Queuing Protocol, specifically using a RabbitMQ server. Messaging supports effective integration, while allowing independent development. Custom selection of files for zipping and downloading files in published datasets, which are stored in Medusa, is supported with a distinct web service called Medusa Downloader.
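Since the DOI encodes version information, a report or script may want to separate a DOI into its parts. The sketch below assumes the _V<n> ending seen in example DOIs such as 10.13012/B2IDB-4050038_V1 marks the version; that pattern is inferred from examples in this guide, not a documented format, and older DOIs such as 10.13012/J8PN93H8 have no version part. The names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

PREFIX = "10.13012"  # Illinois Data Bank DOI prefix (see the Metadata section)

# Assumption inferred from example DOIs: a trailing "_V<number>" marks the version.
VERSION_SUFFIX = re.compile(r"^(?P<base>.+)_V(?P<version>\d+)$", re.IGNORECASE)


class DataBankDoi(NamedTuple):
    prefix: str
    base: str
    version: Optional[int]  # None when the suffix has no "_V<n>" ending


def parse_doi(doi: str) -> DataBankDoi:
    """Split an Illinois Data Bank DOI into prefix, base identifier, and version."""
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or prefix != PREFIX or not suffix:
        raise ValueError(f"not an Illinois Data Bank DOI: {doi!r}")
    match = VERSION_SUFFIX.match(suffix)
    if match is None:
        return DataBankDoi(prefix, suffix, None)
    return DataBankDoi(prefix, match.group("base"), int(match.group("version")))
```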

For more information on development of the Illinois Data Bank, please see our Code4Lib paper, view our most recent code in GitHub, or contact us.
