
Getting Started Guide

Welcome to the Illinois Data Bank! This guide will cover the essential steps for publishing your data. You can contact us at any time to get additional help.


Log in with NetID

The first step to depositing data within the Illinois Data Bank is logging in with your Illinois NetID and Active Directory password. The Illinois Data Bank will check your NetID and confirm that you belong to a group eligible for self-deposit. Illinois faculty, graduate students, and most staff are eligible for self-deposit, but others (including undergraduates) will see a restriction notice after logging in. Contact the Research Data Service staff if you run into trouble or need to request authorization for self-deposit.

Describe

The information you provide here will be attached to your dataset as metadata associated with your DOI. Providing detailed descriptive information is important because it will be transmitted to search engines such as Google Scholar and other aggregators that look at data. In short, the better you can describe your data the easier it will be for others to find and make use of it!

In a hurry to deposit? You can always edit and expand your metadata after publication. Research Data Service staff also regularly review the metadata and add information to increase the visibility of your dataset.

There are three sections to add information about your dataset:

  1. Description

  2. Funder

  3. Related Materials

The description area allows you to indicate the authorship information, title, basic information about the contents of your dataset, and an optional release date. The funder information section allows you to provide grant or other information about financial support of your dataset. The related materials area is where you can provide information about scholarly works that use or have contributed to your dataset. Research Data Service staff review this section and create links between these works and your dataset record. This allows publishers and altmetrics tools to make connections to your published data.

Upload

This is when you provide us with a copy of the files you'd like to be published and saved within the Illinois Data Bank. You may either upload directly from your computer or import files you already have stored in Box. Importing files from Box is recommended for larger files. Depending on your file size, there are several options for uploading your individual data files:

  • Under 2GB: Use the standard file selection tool, drag-and-drop upload options, or Box import

  • Between 2GB and 15GB: Use the Box import tool

  • Over 15 GB: Contact the RDS for assistance
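The size thresholds above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and it assumes that 1 GB means 10^9 bytes and that files of exactly 2 GB or 15 GB fall in the Box import range, neither of which this guide specifies.

```python
import os

GB = 10**9  # assumption: decimal gigabytes; the guide does not say which unit is meant


def suggest_upload_method(size_bytes):
    """Return the upload route the guide recommends for a file of this size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 2 * GB:
        return "standard file selection, drag-and-drop, or Box import"
    if size_bytes <= 15 * GB:
        return "Box import"
    return "contact the Research Data Service"


def suggest_for_path(path):
    """Convenience wrapper that reads the size of a local file."""
    return suggest_upload_method(os.path.getsize(path))
```

Calling suggest_for_path on each file before starting a deposit is one way to spot files that need Box or staff assistance.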

Unlike the metadata, you will not be allowed to edit or change your data files after publication without curator assistance. We are happy to accommodate updates and changes when necessary, but you will need to contact us if you need to make a change.

Need a bit more time to prepare your data files? No worries. You may save and exit your draft at any time. Let us know if we can help you get your data prepared and documented.

Things you are required to do before publishing:

  • Make sure you have permission from the creator of the dataset to deposit it if you are depositing on behalf of someone else.

  • Remove any sensitive data from each file to be deposited.

Good data behaviors:

  • Include a readme file, data dictionary, and/or codebook as appropriate.

  • Include versions in open formats for long-term preservation.

Unsure how to handle any of these requirements and suggestions? Contact us and a Research Data Service staff member will help guide you through it.

Research Data Service staff regularly review the contents of dataset deposits. You can read more about what to expect as a depositor in our Curation Procedures help section.

Review

This phase of publication gives you a chance to look over all the information you've provided and confirm that everything is ready to go. You have the option of saving your work and exiting from the publication process if you need more time to gather information, confer with your colleagues, ask the Research Data Service staff a question, etc.

Remember that you will not be able to edit your files after publication, so review your file section with some extra care! Research Data Service staff can assist with changes, but you will need to contact us to get help.

Just realized you need some help? Save your draft and let us know that you'd like a dataset consultation. A Research Data Service staff member will look at your draft deposit and offer suggestions.

Publish

Everything ready?

Click on the "confirm" button at the top of the review page to begin the publication process. Once you have published your dataset, the metadata will be public and the files will be immediately available for download, unless you have selected a release date for your metadata or files. You may update your metadata after publication, but you may not change your data files without the help of a Research Data Service staff member. You and all authors will receive a confirmation email with your DOI and other information about your dataset.



Curation Procedures


Curious about what happens to your data after you deposit it? Research Data Service staff monitor deposits and perform several levels of curation checks. The depth of the dataset review will depend on the size of the dataset deposited, how well documented it is, and general staff availability. These reviews are not peer review and do not judge the core scientific analysis, methodologies, or conclusions behind the data. Instead, the purpose of review is to ensure metadata completeness and dataset discoverability.

An initial scan of the dataset should occur within one week of the initial deposit. The curator reviews the metadata provided and basic file information, which may include the following activities:

  • Metadata formatting and light copyediting
  • Adding metadata links between related resources, such as the paper a dataset supports
  • Identification of areas in need of major additions or changes within the metadata and/or files

In addition to reviewing for metadata completeness and dataset discoverability, staff may also note, where it can be detected:

  • Deposits that appear spurious
  • Metadata or data files that appear to potentially include erroneous or sensitive content (The Illinois Data Bank Withdrawal Guidelines has more information on how curators handle datasets with errors or sensitive data.)

Depositors and authors can expect:

  • An automated email confirming the deposit with some basic information, including the DOI for the dataset.
  • When necessary, a personal email from Research Data Service staff, for example:
    • to report any minor metadata changes that were made such as keyword additions or small spelling corrections
    • to suggest major metadata changes
    • to explain concerns related to the dataset
  • Automated reminders when a dataset under a publication delay is about to be released.

Research Data Service staff are actively working on tools and guidance to assist researchers with documentation and data cleaning. These features will be implemented with the Illinois Data Bank as they become available. In the meantime, contact us to request a consultation.



What is a DOI?


A DOI, or digital object identifier, is a unique ID assigned to a digital resource and used to identify that resource on the web over long periods of time. DOIs also promote resource discovery through the publication of descriptive information about the associated online resources.

Access: DOIs have associated redirect links, usually a page pointing to where the resource can be accessed. Any changes to where the resource is located on the web can be sent to the DOI resolution database and the DOI will automatically begin forwarding to that new page. This means that you are free to use this DOI link within static publications without fear that a resource migration or other URL change may break the link.

Discovery: DataCite, the resolution service that the Illinois Data Bank uses for minting DOIs, also accepts descriptive information about the registered data objects. This metadata is indexed by several search engines and promotes the discovery of the resources for future users.
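As a rough illustration of the access behavior described above, a DOI can be turned into a stable link by prefixing it with the doi.org resolver address. The helper names below are hypothetical, and the suffix used in the usage note is a made-up placeholder, not a real dataset; 10.13012 is the Illinois Data Bank prefix listed elsewhere on this page.

```python
DATABANK_PREFIX = "10.13012"  # Illinois Data Bank DOI prefix, as given on this help page


def doi_to_url(doi):
    """Turn a bare DOI into a resolver link that follows the resource if it moves."""
    doi = doi.strip()
    # Accept the common 'doi:' label in front of the identifier.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError("not a DOI: %r" % doi)
    return "https://doi.org/" + doi


def is_databank_doi(doi):
    """True when the DOI sits under the Illinois Data Bank prefix."""
    return doi_to_url(doi).startswith("https://doi.org/" + DATABANK_PREFIX + "/")
```

For example, doi_to_url("doi:10.13012/EXAMPLE") produces a link under https://doi.org/ that can be placed in a static publication.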

Publishing your dataset within the Illinois Data Bank grants you a DOI for your dataset so you can benefit from a stable URL, increased visibility, and more formalized citation practices. Research Data Service staff members review and edit your descriptive metadata to ensure accuracy and maximize visibility. Curators also add formal links between associated publications so that aggregation and altmetrics services can recognize and include datasets within their reports.

Questions or concerns about how DOIs work? Contact us.



Metrics


  • Download metrics

    The Illinois Data Bank tracks download counts for datasets. To mitigate possible over- or under-estimation of download counts, a dataset's download counter will increment by one when one or more of its associated files are downloaded. However, only one download instance will be counted per IP address per calendar day. This means that a single computer downloading a dataset's files multiple times in the same day will only be counted once. IP addresses of downloaders are only used for this purpose and are deleted regularly in compliance with our privacy policy.

  • Other metrics

    Research Data Service team members also collect data on DOI access, individual file downloads, and other citation information about deposits held within the Illinois Data Bank. This information is gathered from a variety of sources on a manual basis and stored outside of the Illinois Data Bank. Depositors are welcome to send in feedback to ask for updated metrics data for their deposits, but please allow several working days for us to collect the information. Some citation information will also be added into dataset metadata areas. Depositors can expect to receive updates about notable citations or changes in access traffic.
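The download counting rule described under Download metrics can be pictured with a short sketch. This is not the Illinois Data Bank's actual implementation; the class and method names are hypothetical, and the purge step only stands in for the regular deletion of IP addresses mentioned above.

```python
from datetime import date


class DownloadCounter:
    """Count at most one download per dataset, per IP address, per calendar day."""

    def __init__(self):
        self.counts = {}    # dataset id -> total counted downloads
        self._seen = set()  # (dataset id, ip, day) combinations already counted

    def record(self, dataset_id, ip_address, day):
        """Register a file download; return True if it incremented the counter."""
        key = (dataset_id, ip_address, day)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.counts[dataset_id] = self.counts.get(dataset_id, 0) + 1
        return True

    def purge_before(self, cutoff_day):
        """Forget IP records older than cutoff_day; the totals are kept."""
        self._seen = {key for key in self._seen if key[2] >= cutoff_day}
```

Under this sketch, several file downloads from one address on the same day add one to the dataset's count, while a download from the same address on the next day adds another.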



Illinois Data Bank Metadata


  • Full documentation, including current and previous versions, about the metadata that the Illinois Data Bank stores and transmits can be found here: https://www.ideals.illinois.edu/handle/2142/91019.

    Metadata describing the datasets housed within the Illinois Data Bank are published in a variety of ways. Individual metadata records may be directly accessed via our internal API. Appending “.json” to a record’s internal URL will provide the Illinois Data Bank’s original metadata. Using “.xml” will provide the version transmitted to DataCite. An index of all available dataset records can be found at https://databank.illinois.edu/datasets.json. External entities are welcome to harvest our metadata records, provided they wait at least 2 seconds between requests. The DataCite API (https://mds.datacite.org/static/apidoc) can also serve individual records for the Illinois Data Bank when provided with our prefix: 10.13012.
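A polite harvester along the lines described above might look like the following sketch. It uses only the Python 3 standard library; the function names are hypothetical, the caller supplies the record URLs (the structure of the index file is not documented here, so the sketch does not parse it), and the fetch and sleep functions are parameters so they can be swapped out.

```python
import time
import urllib.request

INDEX_URL = "https://databank.illinois.edu/datasets.json"  # index location from this help page
MIN_DELAY_SECONDS = 2  # minimum access delay requested of harvesters


def metadata_url(record_url, fmt="json"):
    """Append .json (Data Bank metadata) or .xml (DataCite version) to a record URL."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    return record_url.rstrip("/") + "." + fmt


def fetch(url):
    """Download one URL and return the response body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def harvest(record_urls, fmt="json", fetch=fetch, sleep=time.sleep):
    """Yield (url, body) pairs, pausing between consecutive requests."""
    for i, record_url in enumerate(record_urls):
        if i > 0:
            sleep(MIN_DELAY_SECONDS)
        url = metadata_url(record_url, fmt)
        yield url, fetch(url)
```

Because harvest is a generator, records are fetched one at a time as the caller iterates, so the delay applies between every pair of requests.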



Dataset Documentation


We encourage dataset authors to develop documentation that will make it possible for you and others to understand and interpret your dataset in the future. Through the process of depositing in the Illinois Data Bank, you will provide high-level documentation for your dataset, like its title, who created it, etc. It is likely that there are many additional details about your dataset that cannot easily be included as a part of this basic description. To provide customized, detailed information about your dataset, we recommend that you include a documentation file as a part of your dataset deposit.

For more information about how to develop documentation files, please check out these excellent resources:

Readme files

Simple text file that accounts for all files and folders in a dataset

Cornell University. 'Guide to writing 'readme' style metadata.' http://data.research.cornell.edu/content/readme

Codebooks

Contains study-level information and descriptions of each variable/data item

Agency for Healthcare Research and Quality. 'What is a codebook?' http://www.icpsr.umich.edu/icpsrweb/AHRQMCC/support/faqs/2006/01/what-is-codebook

Data Dictionaries

Provides a detailed description for each element or variable in your dataset and data model.

University of Wisconsin Data Services. 'Data Management: Data Dictionaries.' https://www.youtube.com/watch?v=Fe3i9qyqPjo (video)



When Should I Publish My Dataset?


  • We normally recommend deposit of datasets associated with a publication in the window after peer review (in case reviewers suggest revisions that would affect the data), but before final publication so the DOI for the dataset can be provided and published along with the article text. Research Data Service staff are happy to work through other scenarios with you if you would like to request a consultation.



Delaying Publication (Embargoing)


The primary purpose of the Illinois Data Bank is to provide University of Illinois researchers a space to make their research data openly available immediately to anyone in the world. We do recognize that there are sometimes cases where, due to publisher or other requirements, researchers may need to deposit their dataset but make it temporarily unavailable for download. To meet this need, we provide the following two options for temporarily delaying publication of datasets in the Illinois Data Bank:

File Only Publication Delay

  • You receive an active DOI. You will receive a DOI, and the link will forward to the Illinois Data Bank page for your dataset.

  • Your dataset record is discoverable. Information for your dataset in the Illinois Data Bank will be publicly visible through several search engines and other sources.

  • Dataset files cannot be accessed or seen. Although the record for your dataset is publicly visible, your data files will not be made available until the release date you selected.

Metadata and File Publication Delay

  • Your DOI is saved, but the link will fail. You will receive a DOI link to place in your publication, but the link will fail until the release date you selected.

  • Your dataset record is not discoverable. Your dataset will be stored in the Illinois Data Bank, but is not discoverable or visible until the release date you selected.

  • Dataset files cannot be accessed or seen. The record for your dataset is not visible, nor are your data files available until the release date you selected.

The maximum amount of time that data can be delayed for publication is 1 year.

Not sure if you should delay publication of your data, what your release date should be, or have any other questions? Contact us and we will advise you on publication delay based on your specific needs.
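When planning a release date, the one-year limit can be checked with a few lines of date arithmetic. The sketch below assumes the year is counted from the date of deposit, which this page does not state explicitly, and the function names are hypothetical.

```python
from datetime import date


def latest_release_date(start):
    """Return the date one year after start (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        # start is Feb 29 and the following year is not a leap year
        return start.replace(year=start.year + 1, day=28)


def is_allowed_release_date(release, start):
    """True when release is not before start and at most one year later."""
    return start <= release <= latest_release_date(start)
```

For instance, with a start date of 1 March 2024, a release on 1 March 2025 passes the check and 2 March 2025 does not.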



Research Data Copyright and Licensing


CC0

CC0 1.0 Universal public domain dedication

  • Best for reuse: Lets others distribute, remix, tweak, and build upon your work without any restrictions or requirements.

  • You can still request attribution: Doesn't let others ignore community citation practices; just doesn't legally require attribution for reuse. You can include a request for citation or other attribution information in the readme.txt file of your dataset.

CC BY

Creative Commons Attribution 4.0 International license

  • Attribution a legal requirement: Requires that others attribute you for any reuse of your data in perpetuity.

  • May create reuse difficulties: Can result in unwieldy accumulation of citations and authors, known as "attribution stacking".

Other License

A license.txt file must be uploaded as part of the dataset.

  • Other CC licenses may create reuse difficulties: CC NC, CC SA, and CC ND impose restrictions that may create incompatibilities and licensing difficulties for the reuse of research data.

  • Custom licensing considerations: Writing a custom license requires legal expertise and non-standard licenses complicate reuse.

What is copyright?

Copyright is a property right in an original work fixed in any tangible medium of expression giving the holder the exclusive right to reproduce, adapt, distribute, perform, and display the work. [1]

What is a license?

A license is a legal instrument for a rights holder to permit a second party to do things that would otherwise infringe on the rights held. [2]

How does copyright law apply to research data?

Datasets are complex objects, and understanding how copyright law applies to datasets is similarly complex. Copyright protection does not extend to "facts", and so what researchers often view as their "raw" data are in the public domain. Therefore, dataset authors typically cannot claim copyright for raw data. Copyright can apply to aspects of a dataset for which an author made creative or editorial decisions about how the raw data is expressed. For example, the manner in which data are selected and arranged may be copyrightable. Creators may be able to claim copyright over any visualizations, figures, charts, graphs, and other forms of "processing" of research data as well.

When researchers publish datasets in the Illinois Data Bank the license they assign to their dataset applies only to the copyrightable content of the submission. The raw data and any other part of the dataset submission that is a part of the public domain cannot be licensed.

Why license research data?

  • To mitigate legal uncertainty for downstream users.

  • To control the way you participate in data sharing.

For more information

Ball, Alex. 2014. 'How To License Research Data'. Digital Curation Centre. http://www.dcc.ac.uk/resources/how-guides/license-research-data.

Carroll, Michael W. 2015. 'Sharing Research Data And Intellectual Property Law: A Primer'. PLOS Biology 13 (8): e1002235. doi:10.1371/journal.pbio.1002235.

RDA-CODATA Legal Interoperability Interest Group. 'Legal Interoperability of Research Data: Principles and Implementation Guidelines.' http://www.rd-alliance.org/sites/default/files/attachment/Legal%20Interoperability%20Principles%20and%20Implementation%20Guidelines_Final.pdf.

Urbana Campus of the University of Illinois. 'Illinois Copyright Policy.' http://copyright.illinois.edu/.

University of Illinois. 'The General Rules Concerning University Organization and Procedure,' in particular Articles II and III. http://www.bot.uillinois.edu/general-rules.


[1] Black's Law Dictionary 10th Edition, 2009.

[2] Ball, Alex. 2014. 'How To License Research Data'. Digital Curation Centre. http://www.dcc.ac.uk/resources/how-guides/license-research-data.



Changing, Updating, or Adding Files


You can add and remove files in draft datasets. A draft dataset does not have a DOI, because it has not been published or scheduled for publication.

What do you do if you realize you need to correct an error in a dataset you've already published or scheduled for publication? Or what if you have created a helpful documentation file that you want to add to make it easier for others to understand your dataset? Even though the Illinois Data Bank is designed for publication-ready data, we are available to discuss any situation you may be in where you would like to create a new version of your dataset. Please contact us so that we can discuss your situation and needs.

If you expect that your dataset might evolve over time (for example, additional data will be added each year), consider setting, defining, and documenting a "release" cycle. For example, if you plan to deposit rainfall data, appropriate release cycles could include a calendar-based season or annual release, which should be clearly indicated in any documentation files that you deposit along with your data files. To determine the granularity of the release cycle you can consider what span of data is most likely to be needed for reuse, how quickly you'd like to make the data public, and how often you can commit to collect, prepare, and document datasets for deposit. Consider requesting a consultation from the Research Data Service staff to discuss these options.



Accessibility


Research Data Service staff are committed to making the Illinois Data Bank service available to everyone. We are continuously improving the Illinois Data Bank for accessibility but understand that there may be areas and tasks that are not optimized for all access methods.

Research Data Service staff are happy to assist with the data deposit or access process if you encounter technical or navigation trouble. Do not hesitate to contact us by using our help form, emailing databank@library.illinois.edu, or by calling 217-244-1331.


Command Line tools

Overview

These tools can be used to upload files to an existing draft dataset in the Illinois Data Bank.

What do we mean by a draft dataset?

A dataset is in a draft state in the Illinois Data Bank after the deposit agreement has been accepted and before the dataset is published or scheduled for publication. Before uploading a file using any of these options, create or find your draft dataset, and navigate to the edit form for that dataset.

How do I get started?

At the bottom of the Files section of any draft dataset is a matrix of upload option buttons.
Click the Get token for command line tools button to display the elements required for use with the command line tools.

Notes:

  • A token expires in 3 days, but a new one can be requested using the same method.

  • Anyone can use a token to upload a file to this dataset, so keep it secure.

  • A distinct token is required for each dataset.


OPTIONS: Python, cURL, or custom script

Python

Download our sample Python client, illinois_data_bank_datafile.py.

Requires a recent version of Python 2 or 3; works on files up to 2 TB.

Required Modules

pip install docopt
pip install requests
pip install "urllib3[secure]"

A version of the following template command, pre-populated with your dataset identifier and token, comes up in response to clicking on the Get token for command line tools button when editing a draft dataset. The only part that would need to change from that example would be the name of your file at the end where that example has myfile.csv and this template has [FILE_TO_UPLOAD].

python illinois_data_bank_datafile.py [DATASET_IDENTIFIER] [TOKEN] [SYSTEM] [FILE_TO_UPLOAD]

Arguments:

DATASET_IDENTIFIER: a code that uniquely and persistently identifies a dataset within the Illinois Data Bank, shown on the screen opened by the Get token for command line tools button on the edit screen for a draft dataset
TOKEN: authentication token, shown on the screen opened by the Get token for command line tools button on the edit screen for a draft dataset
SYSTEM: optional system indicator (local | development | production); the default is production
FILE_TO_UPLOAD: name of the data file to be uploaded

Options: -h --help
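To upload several files with the sample client, the template command can be repeated once per file. The wrapper below is a hypothetical sketch, not part of the sample client: it assumes illinois_data_bank_datafile.py is in the current directory, follows the argument order of the template above, and leaves out the optional SYSTEM argument unless one is given.

```python
import subprocess
import sys

CLIENT_SCRIPT = "illinois_data_bank_datafile.py"  # sample client named in this guide


def build_command(dataset_identifier, token, file_to_upload, system=None):
    """Build the argument list for one upload, following the template order."""
    command = [sys.executable, CLIENT_SCRIPT, dataset_identifier, token]
    if system is not None:
        if system not in ("local", "development", "production"):
            raise ValueError("unknown system: %r" % system)
        command.append(system)
    command.append(file_to_upload)
    return command


def upload_all(dataset_identifier, token, paths, run=subprocess.check_call):
    """Upload several files one after another, stopping at the first failure."""
    for path in paths:
        run(build_command(dataset_identifier, token, path))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.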


This Python script and accompanying documentation can be found on GitHub at https://github.com/medusa-project/databank-client.

For more help, Contact Us.

cURL

Requires cURL; works on files up to 4 GB.

A version of the following example command, pre-populated with your dataset identifier and token, comes up in response to clicking on the Get token for command line tools button when editing a draft dataset. The only part that would need to change from that example is the name of your file in the "binary=@my_datafile.csv" section. The at symbol (@) is required just before the file name.

curl -F "binary=@my_datafile.csv" -H "Authorization: Token token=[TOKEN]" -H "Transfer-Encoding: chunked" -X POST https://databank.illinois.edu/api/dataset/[DATASET_IDENTIFIER]/datafile -o output.txt

The basic endpoint URL pattern is https://databank.illinois.edu/api/dataset/[DATASET_IDENTIFIER]/datafile

The request method is POST.

The authorization token must be sent in a header.

A header setting the Transfer-Encoding to chunked is recommended.

The file must be sent in a form in an element named binary. In cURL, that can be done with the -F option and an element like "binary=@my_datafile.csv".

The -o option must be used to send response output to a file to see the progress meter.

Even after the cURL progress meter reaches 100%, additional processing is done, which may take as long as it took to reach 100%.

After upload is complete, refresh the dataset page to see the new datafile listing.
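The same request can be sketched in Python with the requests module listed among the required modules for the sample client. This is an illustrative sketch rather than an official client: the function names are hypothetical, the chunked Transfer-Encoding header is left out because requests sets the body framing itself for form uploads, and only the HTTP status is returned because the response format is not described here. For large files, requests may hold the whole form body in memory, so cURL or the sample client is likely the better choice.

```python
API_ROOT = "https://databank.illinois.edu/api/dataset"  # endpoint pattern from this guide


def build_upload_request(dataset_identifier, token):
    """Return the (url, headers) pair for a simple one-call upload."""
    url = "%s/%s/datafile" % (API_ROOT, dataset_identifier)
    headers = {"Authorization": "Token token=%s" % token}
    return url, headers


def upload_file(dataset_identifier, token, path):
    """POST one file in a form element named 'binary' and return the HTTP status."""
    import requests  # third-party module; install with pip before calling this

    url, headers = build_upload_request(dataset_identifier, token)
    with open(path, "rb") as handle:
        response = requests.post(url, headers=headers, files={"binary": handle})
    response.raise_for_status()
    return response.status_code
```

As with the cURL command, refresh the dataset page after the call returns to confirm that the new file is listed.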

For more help, Contact Us.

Custom script

Size constraints depend on implementation details.

Simple Protocol

The simple one-call protocol supports files up to 4 GB.

The curl example above uses the simple protocol.

The basic endpoint URL pattern is https://databank.illinois.edu/api/dataset/[DATASET_IDENTIFIER]/datafile

The request method is POST.

The authorization token must be sent in a header.

A header setting the Transfer-Encoding to chunked is recommended.

The file must be sent in a form in an element named binary.

After upload is complete, refresh the dataset page to see the new datafile listing.

Complex Protocol

The complex protocol, used by the sample Python client above, requires several coordinated calls but supports uploading files up to 2 TB in size to an existing draft dataset, using an authentication token and dataset key as described above.

An example of using the complex protocol in a python script can be found on GitHub at https://github.com/medusa-project/databank-client.

Detailed documentation is available in Illinois_Data_Bank_upload.pdf.

For more help, Contact Us.



For Developers


Databank is the Ruby on Rails web application component of Illinois Data Bank. Within the repository system at the University Library at Illinois, the web application integrates with the Medusa digital preservation repository, also developed at the University Library at Illinois. Medusa provides long-term retention and accessibility of its digital collection. In development of Medusa, the Library’s goal was to closely integrate the Library’s digital production, preservation, and access services in order to establish an efficient, sensible, and sustainable digital library program. Externally, Illinois Data Bank integrates with the DataCite Metadata Store through Purdue’s EZID service.

When a depositor confirms an intention to publish, the web application requests a DOI. Within a pre-registered prefix, the EZID API supports generation of a random DOI or a specified DOI. Illinois Data Bank specifies a DOI that is largely opaque, but encodes version information. After EZID returns a DataCite DOI, Databank sends a message to Medusa to initiate ingestion into the digital preservation system. Building on an approach Medusa uses for other functionality, the messages are sent using Advanced Message Queuing Protocol, specifically using a RabbitMQ server. Messaging supports effective integration, while allowing independent development. Custom selection of files for zipping and downloading from published datasets, which are stored in Medusa, is supported with a distinct web service called Medusa Downloader.

For more information on development of the Illinois Data Bank, please see our Code4Lib paper, view our most recent code in GitHub, or contact us.
