Instructions
DataFinisher is part of a pipeline for getting data from an i2b2 query into a standardized tabular form compatible with Excel, SAS, R, and almost any other analysis software. The overall pipeline looks like this:
The KUMC-developed DataBuilder app extracts the visits selected by an i2b2 query, along with user-specified data elements, as a SQLite database file. This file is like a miniature version of the i2b2 database containing only the requested patients, visits, and observations. The job of DataFinisher is to 'finish the job' by taking this file and turning it into a plain-text (comma-delimited, tab-delimited, etc.) spreadsheet in which each visit is a row, sorted by patient number and date, and each i2b2 data element is a column. The result is an ordinary spreadsheet with additional embedded metadata that DataFinisher uses to let you change how the data is represented and at what granularity. For example, you can break out one i2b2 main variable into multiple output columns, transform those columns into more convenient formats, and filter which observations show up in a given column. You can even create your own custom rules for transforming data.
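The reshaping described above can be illustrated with a minimal sketch. It is not DataFinisher's actual code: it assumes a heavily simplified `observation_fact` table with only four columns, treats each patient-and-date pair as one visit, and uses made-up concept codes (`HEIGHT`, `WEIGHT`). The function names `visits_to_csv` and `demo_database` are hypothetical, and a real DataBuilder file contains more tables and columns than shown here.

```python
import csv
import io
import sqlite3


def visits_to_csv(conn):
    """Pivot long-format observations into one CSV row per visit.

    Assumes a simplified observation_fact table; a visit is approximated
    here as a (patient_num, start_date) pair.
    """
    rows = conn.execute(
        "SELECT patient_num, start_date, concept_cd, nval_num "
        "FROM observation_fact"
    ).fetchall()

    # Every distinct data element becomes one output column.
    concepts = sorted({row[2] for row in rows})

    # Collect the observations belonging to each visit. If a concept
    # occurs more than once in a visit, the last value read wins.
    visits = {}
    for patient, date, concept, value in rows:
        visits.setdefault((patient, date), {})[concept] = value

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["patient_num", "start_date"] + concepts)
    # Rows are sorted by patient number, then by date.
    for patient, date in sorted(visits):
        cells = visits[(patient, date)]
        writer.writerow([patient, date] + [cells.get(c, "") for c in concepts])
    return out.getvalue()


def demo_database():
    """Build a tiny in-memory database with invented, non-PHI values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation_fact ("
        "patient_num INTEGER, start_date TEXT, "
        "concept_cd TEXT, nval_num REAL)"
    )
    conn.executemany(
        "INSERT INTO observation_fact VALUES (?, ?, ?, ?)",
        [
            (2, "2015-03-01", "HEIGHT", 170.0),
            (1, "2015-02-10", "WEIGHT", 80.5),
            (1, "2015-01-05", "HEIGHT", 165.0),
            (1, "2015-01-05", "WEIGHT", 81.0),
        ],
    )
    return conn


if __name__ == "__main__":
    print(visits_to_csv(demo_database()))
```

In this sketch a visit with no observation for a given data element gets an empty cell, which is one common way to represent missing values in a delimited file.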
To use DataFinisher, you can upload either a SQLite db file created by DataBuilder or a spreadsheet previously created by DataFinisher. You can keep coming back and amending how your data is represented at your convenience without having to submit a new i2b2 data-extraction request.
If you do not have a file to try this on, here is some simulated data in .csv format and in SQLite .db format (adapted from http://i2b2.org/). These demo datasets contain no PHI; download one and then upload it here to try out this app.
A site can also be configured to automatically deposit files into a trusted directory that is not accessible directly from the web but is accessible to DataFinisher. Under such a configuration, the users are sent unique, non-guessable links to their individual files. When a unique link is provided the user does not have to upload anything. Instead DataFinisher takes them directly to their file. Here is an example.
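As a rough illustration of the unique-link configuration, the sketch below shows one way a site might generate non-guessable tokens and map them to files in a trusted directory. It is an assumption-laden example, not a description of how DataFinisher is implemented: the functions `new_link_token` and `resolve_token`, the in-memory `registry` dictionary, and the directory path are all hypothetical.

```python
import secrets
from pathlib import Path


def new_link_token():
    """Return a URL-safe random token built from 32 random bytes."""
    return secrets.token_urlsafe(32)


def resolve_token(token, registry, trusted_dir):
    """Map a token to a file inside trusted_dir, or return None.

    registry is a hypothetical dict of token -> file name; a real
    deployment would more likely keep this mapping in a database.
    """
    filename = registry.get(token)
    if filename is None:
        return None  # Unknown token: reveal nothing about stored files.
    base = trusted_dir.resolve()
    candidate = (base / filename).resolve()
    # Refuse any path that escapes the trusted directory.
    if base not in candidate.parents:
        return None
    return candidate


if __name__ == "__main__":
    trusted = Path("/srv/datafinisher/trusted")  # hypothetical location
    token = new_link_token()
    registry = {token: "extract_0001.db"}
    print(resolve_token(token, registry, trusted))
```

The token comes from Python's `secrets` module rather than `random` because `secrets` is intended for security-sensitive values such as links that must not be guessable.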
About
Written by Alex F. Bokov, Ph.D. at the Clinical Informatics Research Division of the Department of Epidemiology and Biostatistics of UTHealth San Antonio, under the mentorship of Ronald Rodriguez, M.D. Ph.D. (UTHealth Department of Urology), Joel Michalek, Ph.D. (UTHealth Department of Epidemiology and Biostatistics), and Shawn N. Murphy, M.D. Ph.D. (Partners Healthcare and Harvard Medical School). This work was made possible by support from:
- Institute for Integration of Medicine and Science
- Long School of Medicine KL2 Award
- NIH/NCATS: UL1TR001120
- PCORI CDRN: 1306-04631 & 1501-26643
The latest version of this open-source software is freely available from the repository on GitHub, https://github.com/bokov/datafinisher_webapp/. If you deploy DataFinisher at your i2b2 site, we recommend running it inside your own firewall. Do not under any circumstances use this public instance with data that contains HIPAA identifiers (there are no genuine identifiers in the demo data provided here). We are grateful to our colleagues at the Greater Plains Collaborative for developing DataBuilder, the app whose functionality DataFinisher extends.
User Agreement
This web app is provided for free, as-is, without guarantee of suitability for any purpose whatsoever. By uploading a data file to this app you agree to the following: 1) that you have sole responsibility for ensuring that you are permitted by law and by your institution's policies to process your data through this app; 2) that the author or deployer of this app may track and analyze your usage patterns in order to improve the usability of this app; and 3) that you will hold the author and deployer of this app harmless in the event of any adverse consequences of your use of it.