{"ID":78057,"post_author":"9208550","post_date":"2018-12-14 14:10:52","post_date_gmt":"0000-00-00 00:00:00","post_content":"","post_title":"LIMSjournal - Spring 2017","post_excerpt":"","post_status":"draft","comment_status":"closed","ping_status":"closed","post_password":"","post_name":"","to_ping":"","pinged":"","post_modified":"2018-12-14 14:10:52","post_modified_gmt":"2018-12-14 19:10:52","post_content_filtered":"","post_parent":0,"guid":"https:\/\/www.limsforum.com\/?post_type=ebook&p=78057","menu_order":0,"post_type":"ebook","post_mime_type":"","comment_count":"0","filter":"","_ebook_metadata":{"enabled":"on","private":"0","guid":"11E35BB7-429F-499D-94D7-A3DFBF78CA5E","title":"LIMSjournal - Spring 2017","subtitle":"Volume 3, Issue 1","cover_theme":"nico_7","cover_image":"https:\/\/www.limsforum.com\/wp-content\/plugins\/rdp-ebook-builder\/pl\/cover.php?cover_style=nico_7&subtitle=Volume+3%2C+Issue+1&editor=Shawn+Douglas&title=LIMSjournal+-+Spring+2017&title_image=https%3A%2F%2Fs3.limsforum.com%2Fwww.limsforum.com%2Fwp-content%2Fuploads%2FFig1_Boland_PLOSCompBio2017_13-1.png&publisher=LabLynx+Press","editor":"Shawn Douglas","publisher":"LabLynx Press","author_id":"26","image_url":"","items":{"35171859a8e80fe1a0d916059f4fdd3e_type":"article","35171859a8e80fe1a0d916059f4fdd3e_title":"The effect of the General Data Protection Regulation on medical research (Rumbold and Pierscionek 2017)","35171859a8e80fe1a0d916059f4fdd3e_url":"https:\/\/www.limswiki.org\/index.php\/Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research","35171859a8e80fe1a0d916059f4fdd3e_plaintext":"\n\nJournal:The effect of the General Data Protection Regulation on medical research\n\nFrom LIMSWiki\n\nFull article title\n \nThe effect of the General Data Protection Regulation on medical research\nJournal\n 
\nJournal of Medical Internet Research\nAuthor(s)\n \nRumbold, John Mark Michael; Pierscionek, Barbara\nAuthor affiliation(s)\n \nKingston University London, Nottingham Trent University\nPrimary contact\n \nEmail: J dot Rumbold [at] Kingston dot ac dot uk\nEditors\n \nEysenbach, G.\nYear published\n \n2017\nVolume and issue\n \n19 (2)\nPage(s)\n \ne47\nDOI\n \n10.2196\/jmir.7108\nISSN\n \n1438-8871\nDistribution license\n \nCreative Commons Attribution 2.0\nWebsite\n \nhttp:\/\/www.jmir.org\/2017\/2\/e47\/\nDownload\n \nhttp:\/\/www.jmir.org\/2017\/2\/e47\/pdf (PDF)\n\nContents\n\n1 Abstract \n2 Overview \n3 The Data Protection Directive \n4 The law as it will be from 2018: The General Data Protection Regulation \n5 Anonymization \n6 Consent \n7 Data sharing \n8 Conclusions \n9 Acknowledgements \n10 Authors' contributions \n11 Conflicts of interest \n12 Abbreviations \n13 References \n14 Notes \n\n\n\nAbstract \nBackground: The enactment of the General Data Protection Regulation (GDPR) will have an impact on European data science. Particular concerns have been raised about consent requirements that would severely restrict medical data research.\nObjective: Our objective is to explain the changes in data protection laws that apply to medical research and to discuss their potential impact.\nMethods: Analysis of the ethicolegal requirements imposed by the GDPR.\nResults: The GDPR makes the classification of pseudonymized data as personal data clearer, although the issue has not been entirely resolved. Biomedical research on personal data where consent has not been obtained must be of substantial public interest.\nConclusions: The GDPR introduces protections for data subjects that aim for consistency across the E.U.
The proposed changes will have little impact on biomedical data research.\nKeywords: pseudonymity, anonymity, untraceability, privacy-preserving protocols, informatics, data reporting, data protection, research ethics\n\nOverview \nThere have been significant recent developments in European Union (E.U.) data protection law that will affect health care professionals, particularly those engaged in research and audit. The General Data Protection Regulation (GDPR), which will replace the current legislation, comes into full effect in 2018.[1] This paper discusses the implications of the GDPR for the handling of health care data. Despite the recent referendum vote in the United Kingdom to leave the E.U., the GDPR will continue to be relevant to the United Kingdom, whether through cooperation in European projects or through continued membership of the European Economic Area (EEA).\n\nThe Data Protection Directive \nCurrently the relevant law in the United Kingdom is the Data Protection Act 1998, which is the United Kingdom\u2019s transposition of the Data Protection Directive (DPD). European directives are not directly enforceable; member states must pass legislation to implement their requirements. The DPD permits derogations (legal exemptions) for research, and the United Kingdom's derogations have been criticized for being too broad. The LRDP Kantor report for the European Commission criticizes the United Kingdom for disregarding the Directive's limits, stating that the Data Protection Act blatantly violates the Directive by adding \"medical research\" to the list of medical purposes.[2] The DPD allows member states to add derogations for the processing of sensitive personal data only for reasons of \"substantial public interest\" (Article 8.4).\nDifferences between E.U.
member states can result in research ethics committees in the United Kingdom denying permission for National Health Service (NHS) data to be transferred to other E.U. countries (the reverse may also occur in some circumstances).[3] These differences also contributed to the passage of the GDPR as part of the Digital Single Market strategy.[4]\n\nThe law as it will be from 2018: The General Data Protection Regulation \nThe text of the GDPR has recently been agreed after a prolonged trilogue between the European Commission, the European Parliament, and the Council of Ministers.[5] This legislation will replace the national transpositions of the DPD. Unlike directives, regulations are directly enforceable across the E.U. The GDPR comes into full effect on May 25, 2018, although member states are permitted minor differences in interpretation (the European Court of Justice is the ultimate arbiter). This legislation has the potential to affect projects using research data banks and Big Data.[6][7] There had been concerns that a clause inserted by the European Parliament requiring specific consent would prevent significant long-term epidemiological research from taking place in the future[8], but this clause was rejected, and the agreed text permits broad consent to \"certain areas of research when in keeping with recognized ethical standards\" (Recital 33).[9] Broad consent is not blanket or open consent[10], although some commentators argue that blanket or open consent is acceptable for biobank and databank research because the risks are minimal and do not vary between projects.[11] Another possibility is consent to a form of governance.[12] Open consent without any ongoing regulation or communication about proposed projects could be problematic.
Dynamic consent offers advantages for an engaged community of participants, but some individuals might not consider it beneficial.[13]\nThe derogations for research without consent have been expanded to include specifically medical research that is \"in the public interest\" (Recital 51). How public interest will be defined has not been elaborated, but European jurisprudence requires member states to satisfy a high threshold where human rights are involved (e.g., a \"pressing social need\"[14]). That standard would not be required for the conduct of medical research using databanks, but the public interest requirement might exclude all commercial research for \"me too\" drug development (drugs that offer no advantages over drugs already on the market) and arrangements with no evidence of benefit sharing, or it might simply require that projects address issues of public importance, regardless of the profits made.[15] This requirement reflects public attitudes in the United Kingdom to the use of health care data: there is resistance to the use of public data for commercial ventures unless the research could not happen without commercial involvement.[16][17]\n\nAnonymization \nData protection law applies only to personal data, that is, data that can identify an individual directly or indirectly.[18][19][20] Simply deleting names and addresses is usually insufficient to anonymize data: it has been demonstrated that a combination of just three items (5-digit zip code, birth date, and sex) could uniquely identify 87 percent of U.S. residents.[21] The United Kingdom Information Commissioner\u2019s Office currently treats pseudonymized data as anonymous where it is used by a third party who does not possess the requisite key code. Truly anonymized data cannot be linked back to an individual (which also means that the data cannot be verified by any means).
Pseudonymized data typically has identifiers removed and replaced with a unique key code (pseudonymization can also use two-way cryptography, whereas data processed with one-way cryptography is considered anonymized). This key code can be used to trace the data back to an individual, enabling safety concerns to be acted upon and data to be verified. This is the approach that the United Kingdom's care.data project on the use of NHS electronic health records for data research had been taking.[22] The GDPR will require changes in practice, as it confirms in Recital 26 that pseudonymized data must be treated as personal data (in line with the previous Article 29 Working Party opinion).[18] That position reflects the greater vulnerability of data subjects who could potentially be re-identified, compared with the protection afforded by true anonymization: if the key code is hacked, all the data can be linked to individuals once more.\n\nConsent \nConsent presumed from a failure to opt out, or to change pre-ticked boxes, will no longer be permitted (unless covered by the derogations); consent will need to be given by a \"clear affirmative action\" (Article 4.11). These changes would arguably have made the abandoned care.data project[23] illegal, despite the passage of enabling legislation that exempted general practitioners from the common law duty of confidentiality when fulfilling their contractual duties to pass on health care data. The care.data program relied on an opt-out for legitimacy.[22] Exercising this opt-out was not straightforward. The numbers opting out far exceeded estimates and the capacity of the Health and Social Care Information Centre (now NHS Digital) to process them in a timely manner. The problems included the omission of those who had opted out from invitations to NHS screening programs, even though this was not the intention of those exercising this right.
NHS Digital currently relies on pseudonymization, but the GDPR classifies pseudonymized data, as a matter of law, as personal data. It is not entirely clear whether third parties without access to the key code could treat pseudonymized data as anonymized (as is currently the case in the United Kingdom). Key codes are a potential vulnerability because they may be disclosed accidentally or maliciously, which is one of the justifications for classifying pseudonymized data as personal data. There is no clear indication that plans to use NHS patient data for research have been abandoned.\nDame Fiona Caldicott reviewed these arrangements in response to widespread concerns about consent[22], and her report led to the cancellation of the care.data project.[23] The issues identified included the lack of information about care.data, which made exercising an opt-out an opaque process; the inadequate mechanisms for opting out; and the failure to protect the rights of, and access to the NHS for, those who opt out.\nThe future risk of re-identification is impossible to quantify precisely because it cannot be predicted what information will become public.[24] However, as with biobanks, the risks to individuals are lower than in studies of medical interventions.[25] Therefore, authorization by research ethics committees is acceptable practice, provided that opt-outs are respected except in exceptional circumstances.\nAlthough the GDPR comes into full effect only in mid-2018, researchers need to prepare now for the changes it will bring to long-term epidemiological studies.
In particular, the categorization of pseudonymized data as personal data will require action in some jurisdictions, such as the United Kingdom and Greece.[26] The necessary accommodations will require an investment of resources, but they should help ensure that data subjects continue to trust the integrity of their health care data and of the medical research community.[27] The GDPR may still apply should the United Kingdom cease to be a member state of the E.U., either because the United Kingdom remains a member of the EEA or because it retains these instruments as law, at least in the short term.[28]\nAlthough audit and research are treated differently in law, the boundary between the two activities is blurred.[29] Audit is directly relevant to monitoring and improving the quality of health care; therefore, it is treated as a primary use of data, as Recitals 52-54 and Article 9.2(h) and (i) of the GDPR make clear. Audit and health care management are primary uses of health care data, whereas research is a secondary use, that is, a use different from the originally declared purpose (although the GDPR designates research a compatible purpose, but only for nonsensitive data). If an audit compares health care systems to discover which is most effective, it can also be categorized as research, because the practices are not compared with a gold standard and a hypothesis is being generated, or even tested, by finding associations. The recent furor surrounding the Royal Free Trust's project with Google DeepMind illustrates the debate over how to distinguish audit from research.[30][31][32]\n\nData sharing \nDame Fiona Caldicott affirmed in her 2013 report on information governance that \"The duty to share can be as important as the duty to protect patient confidentiality.\"[33] Data sharing within the E.U.
should not be obstructed by differences in data protection law, under the principles of the Digital Single Market and Article 1(2) of the Data Protection Directive. Data portability and data sharing are issues for health care data[34], which the European Patients Smart Open Services (epSOS) project attempted to address.[35] The GDPR addresses data portability in Article 20, which gives data subjects the right to receive their data in an appropriate format without hindrance and to have their data transferred between data controllers where technically feasible. The Bundestag is currently considering an eHealth bill with the same aim of improving data portability.[36] Improved portability will make it easier for patients to move between health care providers without unnecessary duplication of tests.\n\nConclusions \nThe Digital Single Market aims to improve data sharing across the E.U., which will facilitate cross-border health care and research. Harmonization will be improved under the GDPR, with a concomitant raising of standards in some countries, although there is still room for national differences reflecting the reasonable expectations of different publics. This advance makes cross-border projects easier to justify ethically and more feasible.[37] The requirements for anonymization have not changed, except for the clarification that pseudonymized data must still be considered personal data. The GDPR will facilitate medical research, except for research that is not considered to be in the public interest. For such research, the more demanding requirements will entail either true anonymization or consent. More projects are likely to require either consent or authorization, since many projects currently use pseudonymization.
The position of third parties who hold pseudonymized data without access to the key code remains unresolved.\n\nAcknowledgements \nThis work was funded by the AEGLE project, Horizon 2020 grant ICT\/2014\/1.\n\nAuthors' contributions \nBoth authors contributed to the analysis of legal issues and the writing of the manuscript.\n\nConflicts of interest \nNone declared.\n\nAbbreviations \nDPD: Data Protection Directive\nEEA: European Economic Area\nepSOS: European Patients Smart Open Services\nE.U.: European Union\nGDPR: General Data Protection Regulation\nNHS: National Health Service\n\nReferences \n\n[1] \"EUR-Lex - 32016R0679 - EN\". EUR-Lex. European Union. 27 April 2016. http:\/\/eur-lex.europa.eu\/eli\/reg\/2016\/679\/oj . Retrieved 04 February 2017.\n\n[2] LRDP Kantor (20 January 2010). \"Comparative study on different approaches to new privacy challenges in particular in the light of technology developments\" (PDF). European Commission. http:\/\/ec.europa.eu\/justice\/policies\/privacy\/docs\/studies\/new_privacy_challenges\/final_report_en.pdf . Retrieved 04 February 2017.\n\n[3] Veerus, P.; Lexchin, J.; Hemminki, E. (2014). \"Legislative regulation and ethical governance of medical research in different European Union countries\". Journal of Medical Ethics 40 (6): 409\u2013413. doi:10.1136\/medethics-2012-101282.\n\n[4] DG Justice (18 January 2016). \"Reform of EU data protection rules\". European Commission. http:\/\/ec.europa.eu\/justice\/data-protection\/reform\/index_en.htm . Retrieved 04 February 2017.\n\n[5] Ansip, A. (06 May 2015). \"Statement by Vice-President Andrus Ansip at the press conference on the adoption of the Digital Single Market Strategy\". European Commission. http:\/\/europa.eu\/rapid\/press-release_SPEECH-15-4926_en.htm . Retrieved 04 February 2017.\n\n[6] Marr, B. (09 April 2015). \"The 5 V's of Big Data by Bernard Marr\". Data Science Central.
http:\/\/www.datasciencecentral.com\/profiles\/blogs\/the-5-v-s-of-big-data-by-bernard-marr . Retrieved 04 February 2017.\n\n[7] Thompson, B. (July 2016). \"Analysis: Research and the General Data Protection Regulation - 2012\/0011(COD)\" (PDF). Wellcome Trust. https:\/\/wellcome.ac.uk\/sites\/default\/files\/new-data-protection-regulation-key-clauses-wellcome-jul16.pdf . Retrieved 04 February 2017.\n\n[8] Stevens, L. (2015). \"The Proposed Data Protection Regulation and Its Potential Impact on Social Sciences Research in the UK\". European Data Protection Law Review 1 (2): 97\u2013112. doi:10.21552\/EDPL\/2015\/2\/4.\n\n[9] Simon, C.M.; L'heureux, J.; Murray, J.C. (2011). \"Active choice but not too active: Public perspectives on biobank consent models\". Genetics in Medicine 13 (9): 821\u201331. doi:10.1097\/GIM.0b013e31821d2f88. PMC3658114. PMID 21555942. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3658114\n\n[10] Hofmann, B. (2009). \"Broadening consent\u2014and diluting ethics?\". Journal of Medical Ethics 35 (2): 125\u2013129. doi:10.1136\/jme.2008.024851.\n\n[11] Sheehan, M. (2011). \"Can broad consent be informed consent?\". Public Health Ethics 4 (3): 226\u2013235. doi:10.1093\/phe\/phr020.\n\n[12] Laurie, G. (2013). \"Governing the spaces in-between: Law and legitimacy in new health technologies\". In Flear, M.L.; Farrell, A.; Hervey, T.K.; Murphy, T. (eds.). European Law and New Health Technologies. Oxford University Press. p. 193. ISBN 9780199659210.\n\n[13] Steinsbekk, K.S.; K\u00e5re Myskja, B.; Solberg, B. (2013). \"Broad consent versus dynamic consent in biobank research: is passive participation an ethical problem?\". European Journal of Human Genetics 21 (9): 897\u2013902. doi:10.1038\/ejhg.2012.282. PMC3746258. PMID 23299918. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3746258\n\n[14] \"Case of Handyside v. The United Kingdom\".
European Court of Human Rights. 07 December 1976. http:\/\/hudoc.echr.coe.int\/eng?i=001-57499 . Retrieved 15 February 2017.\n\n[15] Haddow, G.; Laurie, G.; Cunningham-Burley, S.; Hunter, K.G. (2007). \"Tackling community concerns about commercialisation and genetic research: A modest interdisciplinary proposal\". Social Science & Medicine 64 (2): 272\u201382. doi:10.1016\/j.socscimed.2006.08.028. PMID 17050056.\n\n[16] Aiken, M. (03 August 2011). \"SHIP Public Engagement: Summary of Focus Group Findings\". Wellcome Trust. http:\/\/www.academia.edu\/2702142\/SHIP_Public_Engagement_Summary_of_Focus_Groups . Retrieved 04 February 2017.\n\n[17] Ipsos MORI (09 March 2016). \"The One-Way Mirror: Public attitudes to commercial access to health data\". pp. 154. https:\/\/www.ipsos-mori.com\/researchpublications\/publications\/1803\/Commercial-access-to-health-data.aspx . Retrieved 04 February 2017.\n\n[18] Data Protection Working Party (April 2007). \"Opinion 4\/2007 on the concept of personal data\" (PDF). European Commission. http:\/\/ec.europa.eu\/justice\/data-protection\/article-29\/documentation\/opinion-recommendation\/files\/2007\/wp136_en.pdf . Retrieved 04 February 2017.\n\n[19] Grubb, A. (2000). \"Breach of confidence: Anonymised information. R. v. Department of Health ex parte Source Informatics Ltd.\". Medical Law Review 8 (1): 115\u201320. PMID 11787501.\n\n[20] House of Lords (09 July 2008). \"Judgments - Common Services Agency (Appellants) v Scottish Information Commissioner (Respondent) (Scotland)\". www.parliament.uk. https:\/\/www.publications.parliament.uk\/pa\/ld200708\/ldjudgmt\/jd080709\/comm-1.htm . Retrieved 04 February 2017.\n\n[21] Sweeney, L. (2000). \"Simple Demographics Often Identify People Uniquely\" (PDF). pp. 34. http:\/\/dataprivacylab.org\/projects\/identifiability\/paper1.pdf . Retrieved 04 February 2017.\n\n[22] Meek, T. (28 October 2015).
\"Caldicott: care.data hangs on engagement\". Digital Health. Digital Health Intelligence Limited. http:\/\/www.digitalhealth.net\/2015\/10\/caldicott-care-data-hangs-on-engagement\/ . Retrieved 04 February 2017.\n\n[23] \"NHS England to close care.data programme following Caldicott Review\". National Health Executive. Cognitive Publishing Ltd. 07 July 2016. http:\/\/www.nationalhealthexecutive.com\/Health-Care-News\/nhs-england-to-close-caredata-programme-following-caldicott-review . Retrieved 04 February 2017.\n\n[24] \"What is anonymisation?\". Guide to Data Protection. Information Commissioner\u2019s Office. https:\/\/ico.org.uk\/for-organisations\/guide-to-data-protection\/anonymisation\/ . Retrieved 05 February 2017.\n\n[25] Laurie, G.; Stevens, L.; Jones, K.H.; Dobbs, C. (30 June 2014). \"A Review of Evidence Relating to Harm Resulting from Uses of Health and Biomedical Data\" (PDF). Nuffield Council on Bioethics. http:\/\/nuffieldbioethics.org\/wp-content\/uploads\/A-Review-of-Evidence-Relating-to-Harms-Resulting-from-Uses-of-Health-and-Biomedical-Data-FINAL.pdf . Retrieved 05 February 2017.\n\n[26] \"Data protection and research in the European Union\" (PDF). European Forum for Good Clinical Practice. 06 October 2015. http:\/\/www.efgcp.eu\/downloads\/DP%20and%20Research%20in%20EU_HD_Final_06%2010%2015.pdf . Retrieved 05 February 2017.\n\n[27] Carter, P.; Laurie, G.T.; Dixon-Woods, M. (2015). \"The social licence for research: why care.data ran into trouble\". Journal of Medical Ethics 41 (5): 404\u2013409. doi:10.1136\/medethics-2014-102374.\n\n[28] Mason, R. (02 October 2016). \"Theresa May's 'great repeal bill': What's going to happen and when?\". The Guardian. Guardian News & Media Limited. https:\/\/www.theguardian.com\/politics\/2016\/oct\/02\/theresa-may-great-repeal-bill-eu-british-law . Retrieved 05 February 2017.\n\n[29] Wade, D.T. (2005). \"Ethics, audit, and research: All shades of grey\".
BMJ 330 (7489): 468\u201371. doi:10.1136\/bmj.330.7489.468. PMC549663. PMID 15731146. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC549663\n\n[30] Hodson, H. (2016). \"Google knows your ills\". New Scientist 230 (3072): 22\u201323. doi:10.1016\/S0262-4079(16)30809-0.\n\n[31] Shah, N.R.; Seger, A.C.; Seger, D.L. et al. (2006). \"Improving acceptance of computerized prescribing alerts in ambulatory care\". JAMIA 13 (1): 5\u201311. doi:10.1197\/jamia.M1868. PMC1380196. PMID 16221941. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1380196\n\n[32] Donnelly, C. (12 May 2016). \"ICO probes Google DeepMind patient data-sharing deal with NHS Hospital Trust\". Computer Weekly. TechTarget, Inc. http:\/\/www.computerweekly.com\/news\/450296175\/ICO-probes-Google-DeepMind-patient-data-sharing-deal-with-NHS-Hospital-Trust . Retrieved 05 February 2017.\n\n[33] Caldicott, F. (March 2013). \"Information: To Share or Not to Share? The Information Governance Review\" (PDF). National Information Governance Board. https:\/\/www.gov.uk\/government\/uploads\/system\/uploads\/attachment_data\/file\/192572\/2900774_InfoGovernance_accv2.pdf . Retrieved 05 February 2017.\n\n[34] Kish, L.J.; Topol, E.J. (2015). \"Unpatients: Why patients should own their medical data\". Nature Biotechnology 33 (9): 921\u20134. doi:10.1038\/nbt.3340. PMID 26348958.\n\n[35] \"Cross-border health project epSOS: What has it achieved?\". Digital Single Market. European Commission. 07 July 2014. https:\/\/ec.europa.eu\/digital-single-market\/en\/news\/cross-border-health-project-epsos-what-has-it-achieved . Retrieved 05 February 2017.\n\n[36] \"Act on secure digital communication and applications in the health care system (E-Health Act)\". Federal Ministry of Health. 29 September 2015. http:\/\/www.bundesgesundheitsministerium.de\/en\/health\/e-health-act.html . Retrieved 05 February 2017.
\n\n[37] Dove, E.S.; Townend, D.; Meslin, E.M. et al. (2016). \"Research Ethics: Ethics review for international data-intensive research\". Science 351 (6280): 1399\u2013400. doi:10.1126\/science.aad5269. PMC4838154. PMID 27013718. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4838154\n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In several cases the PubMed ID was missing and was added to make the reference more useful. \nPer the distribution agreement, the following copyright information is also being added: \n\u00a9John Mark Michael Rumbold, Barbara Pierscionek. Originally published in the Journal of Medical Internet Research (http:\/\/www.jmir.org), 24.02.2017.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\">https:\/\/www.limswiki.org\/index.php\/Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research<\/a>\nCategories: LIMSwiki journal articles (added in 2017) | LIMSwiki journal articles (all) | LIMSwiki journal articles on data regulations | LIMSwiki journal articles on health informatics\n\nThis page was last modified on 25 April 2017, at 15:04.\nThis page has been accessed 1,145 times.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","35171859a8e80fe1a0d916059f4fdd3e_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_The_effect_of_the_General_Data_Protection_Regulation_on_medical_research skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:The effect of the General Data Protection Regulation on medical
research<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><b>Background<\/b>: The enactment of the General Data Protection Regulation (GDPR) will have an impact on European data science. Particular concerns have been raised about consent requirements that would severely restrict medical data research.\n<\/p><p><b>Objective<\/b>: Our objective is to explain the changes in data protection laws that apply to medical research and to discuss their potential impact.\n<\/p><p><b>Methods<\/b>: Analysis of the ethicolegal requirements imposed by the GDPR.\n<\/p><p><b>Results<\/b>: The GDPR makes the classification of pseudonymized data as personal data clearer, although the issue has not been entirely resolved. Biomedical research on personal data where consent has not been obtained must be of substantial public interest.\n<\/p><p><b>Conclusions<\/b>: The GDPR introduces protections for data subjects that aim for consistency across the E.U. The proposed changes will have little impact on biomedical data research.\n<\/p><p><b>Keywords<\/b>: pseudonymity, anonymity, untraceability, privacy-preserving protocols, informatics, data reporting, data protection, research ethics\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Overview\">Overview<\/span><\/h2>\n<p>There have been significant recent developments in European Union (E.U.) data protection law that will affect health care professionals, particularly those engaged in research and audit.
The General Data Protection Regulation (GDPR), which will replace the current legislation, comes into full effect in 2018.<sup id=\"rdp-ebb-cite_ref-EUR-LexGDPR_1-0\" class=\"reference\"><a href=\"#cite_note-EUR-LexGDPR-1\" rel=\"external_link\">[1]<\/a><\/sup> This paper discusses the implications of the GDPR for the handling of health care data. Despite the recent referendum vote in the United Kingdom to leave the E.U., the GDPR will continue to be relevant to the United Kingdom, whether through cooperation in European projects or through continued membership of the European Economic Area (EEA).\n<\/p>\n<h2><span class=\"mw-headline\" id=\"The_Data_Protection_Directive\">The Data Protection Directive<\/span><\/h2>\n<p>Currently the relevant law in the United Kingdom is the Data Protection Act 1998, which is the United Kingdom\u2019s transposition of the Data Protection Directive (DPD). European directives are not directly enforceable; member states must pass legislation to implement their requirements. The DPD permits derogations (legal exemptions) for research, and the United Kingdom's derogations have been criticized for being too broad. The LRDP Kantor report for the European Commission criticizes the United Kingdom for disregarding the Directive's limits, stating that the Data Protection Act blatantly violates the Directive by adding \"medical research\" to the list of medical purposes.<sup id=\"rdp-ebb-cite_ref-LRDPKantorComp10_2-0\" class=\"reference\"><a href=\"#cite_note-LRDPKantorComp10-2\" rel=\"external_link\">[2]<\/a><\/sup> The DPD allows member states to add derogations for the processing of sensitive personal data only for reasons of \"substantial public interest\" (Article 8.4).\n<\/p><p>Differences between E.U. member states can result in research ethics committees in the United Kingdom denying permission for National Health Service (NHS) data to be transferred to other E.U.
countries (and the reverse may also occur in some circumstances).<sup id=\"rdp-ebb-cite_ref-VeerusLegi14_3-0\" class=\"reference\"><a href=\"#cite_note-VeerusLegi14-3\" rel=\"external_link\">[3]<\/a><\/sup> These differences have also contributed to the passage of the GDPR as part of the Digital Single Market strategy.<sup id=\"rdp-ebb-cite_ref-EUReform16_4-0\" class=\"reference\"><a href=\"#cite_note-EUReform16-4\" rel=\"external_link\">[4]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"The_law_as_it_will_be_from_2018:_The_General_Data_Protection_Regulation\">The law as it will be from 2018: The General Data Protection Regulation<\/span><\/h2>\n<p>The text of the GDPR has recently been agreed after a prolonged trilogue between the European Commission, the European Parliament, and the Council of Ministers.<sup id=\"rdp-ebb-cite_ref-AnsipState15_5-0\" class=\"reference\"><a href=\"#cite_note-AnsipState15-5\" rel=\"external_link\">[5]<\/a><\/sup> This legislation will replace the national transpositions of the DPD. Unlike directives, regulations are directly enforceable across the E.U. The GDPR comes into full effect on May 25, 2018, although member states are permitted minor differences in interpretation (the European Court of Justice is the ultimate arbiter). 
This legislation has the potential to affect projects using research data banks and Big Data.<sup id=\"rdp-ebb-cite_ref-MarrTheFive15_6-0\" class=\"reference\"><a href=\"#cite_note-MarrTheFive15-6\" rel=\"external_link\">[6]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-ThompsonAnal16_7-0\" class=\"reference\"><a href=\"#cite_note-ThompsonAnal16-7\" rel=\"external_link\">[7]<\/a><\/sup> There had been concerns that a clause inserted by the European Parliament requiring specific consent would prevent significant long-term epidemiological research from taking place in the future,<sup id=\"rdp-ebb-cite_ref-StevensTheProp15_8-0\" class=\"reference\"><a href=\"#cite_note-StevensTheProp15-8\" rel=\"external_link\">[8]<\/a><\/sup> but this clause was rejected, and the agreed text permits broad consent to \"certain areas of research when in keeping with recognized ethical standards\" (Recital 33).<sup id=\"rdp-ebb-cite_ref-SimonActive11_9-0\" class=\"reference\"><a href=\"#cite_note-SimonActive11-9\" rel=\"external_link\">[9]<\/a><\/sup> Broad consent is not blanket or open consent,<sup id=\"rdp-ebb-cite_ref-HofmannBroad09_10-0\" class=\"reference\"><a href=\"#cite_note-HofmannBroad09-10\" rel=\"external_link\">[10]<\/a><\/sup> although some commentators argue that blanket or open consent is acceptable for biobank and databank research because the risks are minimal and do not vary between projects.<sup id=\"rdp-ebb-cite_ref-SheehanCanBroad11_11-0\" class=\"reference\"><a href=\"#cite_note-SheehanCanBroad11-11\" rel=\"external_link\">[11]<\/a><\/sup> Another possibility is consent to a form of governance.<sup id=\"rdp-ebb-cite_ref-LaurieGovern13_12-0\" class=\"reference\"><a href=\"#cite_note-LaurieGovern13-12\" rel=\"external_link\">[12]<\/a><\/sup> Open consent without any ongoing regulation of, or communication about, proposed projects could be problematic. 
Dynamic consent offers advantages for an engaged community of participants, but some individuals might not consider it beneficial.<sup id=\"rdp-ebb-cite_ref-SteinsbekkBroad13_13-0\" class=\"reference\"><a href=\"#cite_note-SteinsbekkBroad13-13\" rel=\"external_link\">[13]<\/a><\/sup>\n<\/p><p>The derogations for research without consent have been expanded to include medical research specifically, where it is \"in the public interest\" (Recital 51). How public interest will be defined has not been elaborated, but European jurisprudence requires member states to satisfy a high threshold where human rights are involved (e.g., a \"pressing social need\"<sup id=\"rdp-ebb-cite_ref-ECHRCaseOf76_14-0\" class=\"reference\"><a href=\"#cite_note-ECHRCaseOf76-14\" rel=\"external_link\">[14]<\/a><\/sup>). This standard would not be required for the conduct of medical research using databanks, but the public interest requirement might exclude all commercial research for \"me too\" drug development (drugs that offer no advantages over drugs already on the market) and arrangements that show no evidence of benefit sharing, or it might simply require that projects address issues of public importance regardless of the profits made.<sup id=\"rdp-ebb-cite_ref-HaddowTackling07_15-0\" class=\"reference\"><a href=\"#cite_note-HaddowTackling07-15\" rel=\"external_link\">[15]<\/a><\/sup> This requirement reflects public attitudes in the United Kingdom to the use of health care data, where there is resistance to the use of public data for commercial ventures unless the research could not happen without commercial involvement.<sup id=\"rdp-ebb-cite_ref-AitkenSHIP11_16-0\" class=\"reference\"><a href=\"#cite_note-AitkenSHIP11-16\" rel=\"external_link\">[16]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-IpsosMORICommercial16_17-0\" class=\"reference\"><a href=\"#cite_note-IpsosMORICommercial16-17\" rel=\"external_link\">[17]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Anonymization\">Anonymization<\/span><\/h2>\n<p>Data protection law only 
applies to personal data, that is, data that can directly or indirectly identify an individual.<sup id=\"rdp-ebb-cite_ref-DPWPOpinion07_18-0\" class=\"reference\"><a href=\"#cite_note-DPWPOpinion07-18\" rel=\"external_link\">[18]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-GrubbBreach00_19-0\" class=\"reference\"><a href=\"#cite_note-GrubbBreach00-19\" rel=\"external_link\">[19]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HLJudgments08_20-0\" class=\"reference\"><a href=\"#cite_note-HLJudgments08-20\" rel=\"external_link\">[20]<\/a><\/sup> Simply deleting names and addresses is usually insufficient to achieve anonymization (it has been demonstrated that just three data items, namely 5-digit zip code, birth date, and sex, could together identify 87 percent of U.S. residents).<sup id=\"rdp-ebb-cite_ref-SweeneySimple00_21-0\" class=\"reference\"><a href=\"#cite_note-SweeneySimple00-21\" rel=\"external_link\">[21]<\/a><\/sup> The United Kingdom Information Commissioner\u2019s Office currently treats pseudonymized data as anonymous where it is used by a third party who does not possess the requisite key code. Truly anonymized data cannot be linked back to an individual (which also means the data cannot be verified by any means). Pseudonymized data typically has identifiers removed and replaced with a unique key code; reversible (two-way) cryptography can also be used, whereas one-way cryptography is considered to produce anonymized data. The key code can be used to trace the data back to an individual, enabling safety concerns to be acted upon and data to be verified. 
This is the approach that the United Kingdom's care.data project on the use of NHS electronic health records for data research had been taking.<sup id=\"rdp-ebb-cite_ref-MeekCaldicott15_22-0\" class=\"reference\"><a href=\"#cite_note-MeekCaldicott15-22\" rel=\"external_link\">[22]<\/a><\/sup> The GDPR will require changes in practice, as it confirms in Recital 26 that pseudonymized data must be treated as personal data (in line with the previous Article 29 Working Party opinion).<sup id=\"rdp-ebb-cite_ref-DPWPOpinion07_18-1\" class=\"reference\"><a href=\"#cite_note-DPWPOpinion07-18\" rel=\"external_link\">[18]<\/a><\/sup> That position reflects the greater vulnerability of data subjects who could potentially be re-identified, compared with the protection afforded by true anonymization: if the key code is hacked, all the data can be linked back to individuals once more.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Consent\">Consent<\/span><\/h2>\n<p>Consent presumed from a failure to opt out, or from a failure to change preticked boxes, will no longer be permitted (unless covered by the derogations); consent will need to be given by a \"clear, affirmative action\" (Article 4.11). These changes would arguably have made the abandoned care.data project<sup id=\"rdp-ebb-cite_ref-NHENHS16_23-0\" class=\"reference\"><a href=\"#cite_note-NHENHS16-23\" rel=\"external_link\">[23]<\/a><\/sup> illegal, despite the passage of enabling legislation that exempted general practitioners from the common law duty of confidentiality when fulfilling their contractual duties to pass on health care data. The care.data program relied on an opt-out for legitimacy.<sup id=\"rdp-ebb-cite_ref-MeekCaldicott15_22-1\" class=\"reference\"><a href=\"#cite_note-MeekCaldicott15-22\" rel=\"external_link\">[22]<\/a><\/sup> Exercising this opt-out was not straightforward. 
The numbers opting out far exceeded estimates and the capacity of the Health and Social Care Information Centre (now NHS Digital) to process them in a timely manner. The problems included the omission of those who had opted out from calls for NHS screening programs, even though this was not the intention of those exercising this right. NHS Digital currently relies on pseudonymization, but the GDPR categorizes pseudonymized data, as a matter of law, as personal data. It is not entirely clear whether third parties without access to the key code could treat pseudonymized data as anonymized (as is currently the case in the United Kingdom). Key codes are a potential vulnerability because of the risk of accidental or malicious disclosure, which is one of the justifications for classifying pseudonymized data as personal data. Moreover, there is no clear indication that plans to use NHS patient data for research in the future have been abandoned.\n<\/p><p>Dame Fiona Caldicott reviewed the arrangements in response to widespread concerns about consent,<sup id=\"rdp-ebb-cite_ref-MeekCaldicott15_22-2\" class=\"reference\"><a href=\"#cite_note-MeekCaldicott15-22\" rel=\"external_link\">[22]<\/a><\/sup> and her report led to the cancellation of the care.data project.<sup id=\"rdp-ebb-cite_ref-NHENHS16_23-1\" class=\"reference\"><a href=\"#cite_note-NHENHS16-23\" rel=\"external_link\">[23]<\/a><\/sup> The particular issues identified included a lack of information about care.data, which made exercising an opt-out an opaque process; inadequate mechanisms for opting out; and a failure to protect the rights of those who opted out, including their access to the NHS.\n<\/p><p>The risk of re-identification in the future is impossible to quantify precisely because it cannot be predicted what information will become public.<sup id=\"rdp-ebb-cite_ref-ICOAnony_24-0\" class=\"reference\"><a href=\"#cite_note-ICOAnony-24\" rel=\"external_link\">[24]<\/a><\/sup> However, as with biobanks, the risks to individuals are lower compared 
with studies of medical interventions.<sup id=\"rdp-ebb-cite_ref-LaurieAReview14_25-0\" class=\"reference\"><a href=\"#cite_note-LaurieAReview14-25\" rel=\"external_link\">[25]<\/a><\/sup> Authorization by research ethics committees is therefore acceptable practice, provided that opt-outs are respected except in exceptional circumstances.\n<\/p><p>Although the GDPR comes into full effect only in mid-2018, researchers need to prepare now for the changes it will bring to long-term epidemiological studies. In particular, the categorization of pseudonymized data as personal data will require action in some jurisdictions, such as the United Kingdom and Greece.<sup id=\"rdp-ebb-cite_ref-EFGCPData15_26-0\" class=\"reference\"><a href=\"#cite_note-EFGCPData15-26\" rel=\"external_link\">[26]<\/a><\/sup> The necessary accommodations will require an investment of resources, but this should help ensure that data subjects continue to trust the integrity of their health care data and the medical research community.<sup id=\"rdp-ebb-cite_ref-CarterTheSocial15_27-0\" class=\"reference\"><a href=\"#cite_note-CarterTheSocial15-27\" rel=\"external_link\">[27]<\/a><\/sup> The GDPR may still apply should the United Kingdom cease to be a member state of the E.U. 
either because the United Kingdom remains a member of the EEA or because it retains these instruments as law, at least in the short term.<sup id=\"rdp-ebb-cite_ref-MasonTheresa16_28-0\" class=\"reference\"><a href=\"#cite_note-MasonTheresa16-28\" rel=\"external_link\">[28]<\/a><\/sup>\n<\/p><p>Although audit and research are treated differently in law, the boundaries between the two activities are blurred.<sup id=\"rdp-ebb-cite_ref-WadeEthics05_29-0\" class=\"reference\"><a href=\"#cite_note-WadeEthics05-29\" rel=\"external_link\">[29]<\/a><\/sup> Audit is directly relevant to monitoring and improving the quality of health care; it is therefore included as a primary use of data, as Recitals 52-54 and Article 9.2 (h) and (i) of the GDPR make clear. Audit and health care management are primary uses of health care data, whereas research is a secondary use, that is, a use different from the originally declared purpose (although the GDPR designates research a compatible purpose, it does so only for nonsensitive data). If an audit compares health care systems to discover which is most effective, it can also be categorized as research, because the practices are not compared against a gold standard and a hypothesis is being generated, or even tested, by finding associations. 
The recent furor surrounding the Royal Free Trust's project with Google DeepMind illustrates the debate over how to distinguish audit from research.<sup id=\"rdp-ebb-cite_ref-HodsonGoogle16_30-0\" class=\"reference\"><a href=\"#cite_note-HodsonGoogle16-30\" rel=\"external_link\">[30]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-ShahImproving06_31-0\" class=\"reference\"><a href=\"#cite_note-ShahImproving06-31\" rel=\"external_link\">[31]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-DonnellyICO16_32-0\" class=\"reference\"><a href=\"#cite_note-DonnellyICO16-32\" rel=\"external_link\">[32]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Data_sharing\">Data sharing<\/span><\/h2>\n<p>Dame Fiona Caldicott affirmed in her 2013 report on information governance that \"The duty to share can be as important as the duty to protect patient confidentiality.\"<sup id=\"rdp-ebb-cite_ref-CaldicottInfo13_33-0\" class=\"reference\"><a href=\"#cite_note-CaldicottInfo13-33\" rel=\"external_link\">[33]<\/a><\/sup> Under the principles of the Digital Single Market and Article 1(2) of the Data Protection Directive, data sharing within the E.U. should not be obstructed by differences in data protection law. Data portability and data sharing are problematic for health care data,<sup id=\"rdp-ebb-cite_ref-KishUnpatients15_34-0\" class=\"reference\"><a href=\"#cite_note-KishUnpatients15-34\" rel=\"external_link\">[34]<\/a><\/sup> issues that the European Patients Smart Open Services (epSOS) project attempted to address.<sup id=\"rdp-ebb-cite_ref-ECCross14_35-0\" class=\"reference\"><a href=\"#cite_note-ECCross14-35\" rel=\"external_link\">[35]<\/a><\/sup> The GDPR addresses data portability in Article 20, which gives data subjects the right to receive their data in an appropriate format without hindrance and to have the data transferred between data controllers where technically feasible. 
The Bundestag is currently considering an eHealth bill with the same aim of improving data portability.<sup id=\"rdp-ebb-cite_ref-36\" class=\"reference\"><a href=\"#cite_note-36\" rel=\"external_link\">[36]<\/a><\/sup> This should make it easier for patients to move between health care providers without unnecessary duplication of tests.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusions\">Conclusions<\/span><\/h2>\n<p>The Digital Single Market aims for improved data sharing across the E.U., which will facilitate cross-border health care and research. Harmonization will be improved under the GDPR, with a concomitant raising of standards for some countries, although there is still room for national differences according to the reasonable expectations of different publics. This advance makes cross-border projects easier to justify ethically and more feasible.<sup id=\"rdp-ebb-cite_ref-DoveResearch16_37-0\" class=\"reference\"><a href=\"#cite_note-DoveResearch16-37\" rel=\"external_link\">[37]<\/a><\/sup> The requirements for anonymization have not been changed, except to clarify that pseudonymized data must still be considered personal data. The GDPR will facilitate medical research, except where the research is not considered to be in the public interest. In that case, the more demanding requirements for anonymization will entail either true anonymization or consent. It is likely that more projects will require either consent or authorization, since many projects currently use pseudonymization. 
The status of pseudonymized data held by third parties remains unresolved.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>This work was funded by the AEGLE project, Horizon 2020 ICT\/2014\/1 grant.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Authors.27_contributions\">Authors' contributions<\/span><\/h2>\n<p>Both authors contributed to the analysis of legal issues and the writing of the manuscript.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conflicts_of_interest\">Conflicts of interest<\/span><\/h2>\n<p>None declared.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Abbreviations\">Abbreviations<\/span><\/h2>\n<p><b>DPD<\/b>: Data Protection Directive\n<\/p><p><b>EEA<\/b>: European Economic Area\n<\/p><p><b>epSOS<\/b>: European Patients Smart Open Services\n<\/p><p><b>E.U.<\/b>: European Union\n<\/p><p><b>GDPR<\/b>: General Data Protection Regulation\n<\/p><p><b>NHS<\/b>: National Health Service\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-EUR-LexGDPR-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-EUR-LexGDPR_1-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/eur-lex.europa.eu\/eli\/reg\/2016\/679\/oj\" target=\"_blank\">\"EUR-Lex - 32016R0679 - EN\"<\/a>. <i>EUR-Lex<\/i>. European Union. 27 April 2016<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/eur-lex.europa.eu\/eli\/reg\/2016\/679\/oj\" target=\"_blank\">http:\/\/eur-lex.europa.eu\/eli\/reg\/2016\/679\/oj<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=EUR-Lex+-+32016R0679+-+EN&rft.atitle=EUR-Lex&rft.date=27+April+2016&rft.pub=European+Union&rft_id=http%3A%2F%2Feur-lex.europa.eu%2Feli%2Freg%2F2016%2F679%2Foj&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LRDPKantorComp10-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LRDPKantorComp10_2-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">LRDP Kantor (20 January 2010). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/ec.europa.eu\/justice\/policies\/privacy\/docs\/studies\/new_privacy_challenges\/final_report_en.pdf\" target=\"_blank\">\"Comparative study on different approaches to new privacy challenges in particular in the light of technology developments\"<\/a> (PDF). European Commission<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/ec.europa.eu\/justice\/policies\/privacy\/docs\/studies\/new_privacy_challenges\/final_report_en.pdf\" target=\"_blank\">http:\/\/ec.europa.eu\/justice\/policies\/privacy\/docs\/studies\/new_privacy_challenges\/final_report_en.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Comparative+study+on+different+approaches+to+new+privacy+challenges+in+particular+in+the+light+of+technology+developments&rft.atitle=&rft.aulast=LRDP+Kantor&rft.au=LRDP+Kantor&rft.date=20+January+2010&rft.pub=European+Commission&rft_id=http%3A%2F%2Fec.europa.eu%2Fjustice%2Fpolicies%2Fprivacy%2Fdocs%2Fstudies%2Fnew_privacy_challenges%2Ffinal_report_en.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-VeerusLegi14-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-VeerusLegi14_3-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Veerus, P.; Lexchin, J.; Hemminki, E. (2014). \"Legislative regulation and ethical governance of medical research in different European Union countries\". <i>Journal of Medical Ethics<\/i> <b>40<\/b> (6): 409-413. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1136%2Fmedethics-2012-101282\" target=\"_blank\">10.1136\/medethics-2012-101282<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Legislative+regulation+and+ethical+governance+of+medical+research+in+different+European+Union+countries&rft.jtitle=Journal+of+Medical+Ethics&rft.aulast=Veerus%2C+P.%3B+Lexchin%2C+J.%3B+Hemminki%2C+E.&rft.au=Veerus%2C+P.%3B+Lexchin%2C+J.%3B+Hemminki%2C+E.&rft.date=2014&rft.volume=40&rft.issue=6&rft.pages=409-413&rft_id=info:doi\/10.1136%2Fmedethics-2012-101282&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EUReform16-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-EUReform16_4-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">DG Justice (18 January 2016). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/ec.europa.eu\/justice\/data-protection\/reform\/index_en.htm\" target=\"_blank\">\"Reform of EU data protection rules\"<\/a>. European Commission<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/ec.europa.eu\/justice\/data-protection\/reform\/index_en.htm\" target=\"_blank\">http:\/\/ec.europa.eu\/justice\/data-protection\/reform\/index_en.htm<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Reform+of+EU+data+protection+rules&rft.atitle=&rft.aulast=DG+Justice&rft.au=DG+Justice&rft.date=18+January+2016&rft.pub=European+Commission&rft_id=http%3A%2F%2Fec.europa.eu%2Fjustice%2Fdata-protection%2Freform%2Findex_en.htm&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AnsipState15-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AnsipState15_5-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Ansip, A. (06 May 2015). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/europa.eu\/rapid\/press-release_SPEECH-15-4926_en.htm\" target=\"_blank\">\"Statement by Vice-President Andrus Ansip at the press conference on the adoption of the Digital Single Market Strategy\"<\/a>. European Commission<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/europa.eu\/rapid\/press-release_SPEECH-15-4926_en.htm\" target=\"_blank\">http:\/\/europa.eu\/rapid\/press-release_SPEECH-15-4926_en.htm<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Statement+by+Vice-President+Andrus+Ansip+at+the+press+conference+on+the+adoption+of+the+Digital+Single+Market+Strategy&rft.atitle=&rft.aulast=Ansip%2C+A.&rft.au=Ansip%2C+A.&rft.date=06+May+2015&rft.pub=European+Commission&rft_id=http%3A%2F%2Feuropa.eu%2Frapid%2Fpress-release_SPEECH-15-4926_en.htm&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MarrTheFive15-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MarrTheFive15_6-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Marr, B. (09 April 2015). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.datasciencecentral.com\/profiles\/blogs\/the-5-v-s-of-big-data-by-bernard-marr\" target=\"_blank\">\"The 5 V's of Big Data by Bernard Marr\"<\/a>. <i>Data Science Central<\/i><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.datasciencecentral.com\/profiles\/blogs\/the-5-v-s-of-big-data-by-bernard-marr\" target=\"_blank\">http:\/\/www.datasciencecentral.com\/profiles\/blogs\/the-5-v-s-of-big-data-by-bernard-marr<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+5+V%27s+of+Big+Data+by+Bernard+Marr&rft.atitle=Data+Science+Central&rft.aulast=Marr%2C+B.&rft.au=Marr%2C+B.&rft.date=09+April+2015&rft_id=http%3A%2F%2Fwww.datasciencecentral.com%2Fprofiles%2Fblogs%2Fthe-5-v-s-of-big-data-by-bernard-marr&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ThompsonAnal16-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ThompsonAnal16_7-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Thompson, B. (July 2016). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/wellcome.ac.uk\/sites\/default\/files\/new-data-protection-regulation-key-clauses-wellcome-jul16.pdf\" target=\"_blank\">\"Analysis: Research and the General Data Protection Regulation - 2012\/0011(COD)\"<\/a> (PDF). Wellcome Trust<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/wellcome.ac.uk\/sites\/default\/files\/new-data-protection-regulation-key-clauses-wellcome-jul16.pdf\" target=\"_blank\">https:\/\/wellcome.ac.uk\/sites\/default\/files\/new-data-protection-regulation-key-clauses-wellcome-jul16.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Analysis%3A+Research+and+the+General+Data+Protection+Regulation+-+2012%2F0011%28COD%29&rft.atitle=&rft.aulast=Thompson%2C+B.&rft.au=Thompson%2C+B.&rft.date=July+2016&rft.pub=Wellcome+Trust&rft_id=https%3A%2F%2Fwellcome.ac.uk%2Fsites%2Fdefault%2Ffiles%2Fnew-data-protection-regulation-key-clauses-wellcome-jul16.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-StevensTheProp15-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-StevensTheProp15_8-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Stevens, L. (2015). \"The Proposed Data Protection Regulation and Its Potential Impact on Social Sciences Research in the UK\". <i>European Data Protection Law Review<\/i> <b>1<\/b> (2): 97\u2013112. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.21552%2FEDPL%2F2015%2F2%2F4\" target=\"_blank\">10.21552\/EDPL\/2015\/2\/4<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Proposed+Data+Protection+Regulation+and+Its+Potential+Impact+on+Social+Sciences+Research+in+the+UK&rft.jtitle=European+Data+Protection+Law+Review&rft.aulast=Stevens%2C+L.&rft.au=Stevens%2C+L.&rft.date=2015&rft.volume=1&rft.issue=2&rft.pages=97%E2%80%93112&rft_id=info:doi\/10.21552%2FEDPL%2F2015%2F2%2F4&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SimonActive11-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SimonActive11_9-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Simon, C.M.; L'heureux, J.; Murray, J.C. (2011). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3658114\" target=\"_blank\">\"Active choice but not too active: Public perspectives on biobank consent models\"<\/a>. <i>Genetics in Medicine<\/i> <b>13<\/b> (9): 821\u201331. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1097%2FGIM.0b013e31821d2f88\" target=\"_blank\">10.1097\/GIM.0b013e31821d2f88<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3658114\/\" target=\"_blank\">PMC3658114<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21555942\" target=\"_blank\">21555942<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3658114\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3658114<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Active+choice+but+not+too+active%3A+Public+perspectives+on+biobank+consent+models&rft.jtitle=Genetics+in+Medicine&rft.aulast=Simon%2C+C.M.%3B+L%27heureux%2C+J.%3B+Murray%2C+J.C.&rft.au=Simon%2C+C.M.%3B+L%27heureux%2C+J.%3B+Murray%2C+J.C.&rft.date=2011&rft.volume=13&rft.issue=9&rft.pages=821%E2%80%9331&rft_id=info:doi\/10.1097%2FGIM.0b013e31821d2f88&rft_id=info:pmc\/PMC3658114&rft_id=info:pmid\/21555942&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3658114&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HofmannBroad09-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HofmannBroad09_10-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hofmann, B. (2009). \"Broadening consent\u2014and diluting ethics?\". 
<i>Journal of Medical Ethics<\/i> <b>35<\/b> (2): 125\u2013129. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1136%2Fjme.2008.024851\" target=\"_blank\">10.1136\/jme.2008.024851<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Broadening+consent%E2%80%94and+diluting+ethics%3F&rft.jtitle=Journal+of+Medical+Ethics&rft.aulast=Hofmann%2C+B.&rft.au=Hofmann%2C+B.&rft.date=2009&rft.volume=35&rft.issue=2&rft.pages=125%E2%80%93129&rft_id=info:doi\/10.1136%2Fjme.2008.024851&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SheehanCanBroad11-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SheehanCanBroad11_11-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sheehan, M. (2011). \"Can broad consent be informed consent?\". <i>Public Health Ethics<\/i> <b>4<\/b> (3): 226\u2013235. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fphe%2Fphr020\" target=\"_blank\">10.1093\/phe\/phr020<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Can+broad+consent+be+informed+consent%3F&rft.jtitle=Public+Health+Ethics&rft.aulast=Sheehan%2C+M.&rft.au=Sheehan%2C+M.&rft.date=2011&rft.volume=4&rft.issue=3&rft.pages=226%E2%80%93235&rft_id=info:doi\/10.1093%2Fphe%2Fphr020&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LaurieGovern13-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LaurieGovern13_12-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Laurie, G. (2013). \"Governing the spaces in-between: Law and legitimacy in new health technologies\". In Flear, M.L.; Farrell, A.; Hervey, T.K.; Murphy, T. (eds.). <i>European Law and New Health Technologies<\/i>. Oxford University Press. pp. 193.
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780199659210.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Governing+the+spaces+in-between%3A+Law+and+legitimacy+in+new+health+technologies&rft.atitle=European+Law+and+New+Health+Technologies&rft.aulast=Laurie%2C+G.&rft.au=Laurie%2C+G.&rft.date=2013&rft.pages=pp.%26nbsp%3B193&rft.pub=Oxford+University+Press&rft.isbn=9780199659210&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SteinsbekkBroad13-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SteinsbekkBroad13_13-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Steinsbekk, K.S.; K\u00e5re Myskja, B.; Solberg, B. (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3746258\" target=\"_blank\">\"Broad consent versus dynamic consent in biobank research: is passive participation an ethical problem?\"<\/a>. <i>European Journal of Human Genetics<\/i> <b>21<\/b> (9): 897-902. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fejhg.2012.282\" target=\"_blank\">10.1038\/ejhg.2012.282<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3746258\/\" target=\"_blank\">PMC3746258<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23299918\" target=\"_blank\">23299918<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3746258\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3746258<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Broad+consent+versus+dynamic+consent+in+biobank+research%3A+is+passive+participation+an+ethical+problem%3F&rft.jtitle=European+Journal+of+Human+Genetics&rft.aulast=Steinsbekk%2C+K.S.%3B+K%C3%A5re+Myskja%2C+B.%3B+Solberg%2C+B.&rft.au=Steinsbekk%2C+K.S.%3B+K%C3%A5re+Myskja%2C+B.%3B+Solberg%2C+B.&rft.date=2013&rft.volume=21&rft.issue=9&rft.pages=897-902&rft_id=info:doi\/10.1038%2Fejhg.2012.282&rft_id=info:pmc\/PMC3746258&rft_id=info:pmid\/23299918&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3746258&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ECHRCaseOf76-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ECHRCaseOf76_14-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/hudoc.echr.coe.int\/eng?i=001-57499\" target=\"_blank\">\"Case of Handyside v. The United Kingdom\"<\/a>. European Court of Human Rights. 07 December 1976<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/hudoc.echr.coe.int\/eng?i=001-57499\" target=\"_blank\">http:\/\/hudoc.echr.coe.int\/eng?i=001-57499<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 15 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Case+of+Handyside+v.+The+United+Kingdom&rft.atitle=&rft.date=07+December+1976&rft.pub=European+Court+of+Human+Rights&rft_id=http%3A%2F%2Fhudoc.echr.coe.int%2Feng%3Fi%3D001-57499&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HaddowTackling07-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HaddowTackling07_15-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Haddow, G.; Laurie, G.; Cunningham-Burley, S.; Hunter, K.G. (2007). \"Tackling community concerns about commercialisation and genetic research: A modest interdisciplinary proposal\". <i>Social Science & Medicine<\/i> <b>64<\/b> (2): 272\u201382. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.socscimed.2006.08.028\" target=\"_blank\">10.1016\/j.socscimed.2006.08.028<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17050056\" target=\"_blank\">17050056<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Tackling+community+concerns+about+commercialisation+and+genetic+research%3A+A+modest+interdisciplinary+proposal&rft.jtitle=Social+Science+%26+Medicine&rft.aulast=Haddow%2C+G.%3B+Laurie%2C+G.%3B+Cunningham-Burley%2C+S.%3B+Hunter%2C+K.G.&rft.au=Haddow%2C+G.%3B+Laurie%2C+G.%3B+Cunningham-Burley%2C+S.%3B+Hunter%2C+K.G.&rft.date=2007&rft.volume=64&rft.issue=2&rft.pages=272%E2%80%9382&rft_id=info:doi\/10.1016%2Fj.socscimed.2006.08.028&rft_id=info:pmid\/17050056&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AitkenSHIP11-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AitkenSHIP11_16-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Aitken, M. (03 August 2011). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.academia.edu\/2702142\/SHIP_Public_Engagement_Summary_of_Focus_Groups\" target=\"_blank\">\"SHIP Public Engagement: Summary of Focus Group Findings\"<\/a>. Wellcome Trust<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.academia.edu\/2702142\/SHIP_Public_Engagement_Summary_of_Focus_Groups\" target=\"_blank\">http:\/\/www.academia.edu\/2702142\/SHIP_Public_Engagement_Summary_of_Focus_Groups<\/a><\/span><span class=\"reference-accessdate\">.
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=SHIP+Public+Engagement%3A+Summary+of+Focus+Group+Findings&rft.atitle=&rft.aulast=Aiken%2C+M.&rft.au=Aiken%2C+M.&rft.date=03+August+2011&rft.pub=Wellcome+Trust&rft_id=http%3A%2F%2Fwww.academia.edu%2F2702142%2FSHIP_Public_Engagement_Summary_of_Focus_Groups&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-IpsosMORICommercial16-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-IpsosMORICommercial16_17-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Ipsos MORI (09 March 2016). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.ipsos-mori.com\/researchpublications\/publications\/1803\/Commercial-access-to-health-data.aspx\" target=\"_blank\">\"The One-Way Mirror: Public attitudes to commercial access to health data\"<\/a>. pp. 154<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.ipsos-mori.com\/researchpublications\/publications\/1803\/Commercial-access-to-health-data.aspx\" target=\"_blank\">https:\/\/www.ipsos-mori.com\/researchpublications\/publications\/1803\/Commercial-access-to-health-data.aspx<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+One-Way+Mirror%3A+Public+attitudes+to+commercial+access+to+health+data&rft.atitle=&rft.aulast=Ipsos+MORI&rft.au=Ipsos+MORI&rft.date=09+March+2016&rft.pages=pp.+154&rft_id=https%3A%2F%2Fwww.ipsos-mori.com%2Fresearchpublications%2Fpublications%2F1803%2FCommercial-access-to-health-data.aspx&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DPWPOpinion07-18\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-DPWPOpinion07_18-0\" rel=\"external_link\">18.0<\/a><\/sup> <sup><a href=\"#cite_ref-DPWPOpinion07_18-1\" rel=\"external_link\">18.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Data Protection Working Party (April 2007). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/ec.europa.eu\/justice\/data-protection\/article-29\/documentation\/opinion-recommendation\/files\/2007\/wp136_en.pdf\" target=\"_blank\">\"Opinion 4\/2007 on the concept of personal data\"<\/a> (PDF). European Commission<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/ec.europa.eu\/justice\/data-protection\/article-29\/documentation\/opinion-recommendation\/files\/2007\/wp136_en.pdf\" target=\"_blank\">http:\/\/ec.europa.eu\/justice\/data-protection\/article-29\/documentation\/opinion-recommendation\/files\/2007\/wp136_en.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Opinion+4%2F2007+on+the+concept+of+personal+data&rft.atitle=&rft.aulast=Data+Protection+Working+Party&rft.au=Data+Protection+Working+Party&rft.date=April+2007&rft.pub=European+Commission&rft_id=http%3A%2F%2Fec.europa.eu%2Fjustice%2Fdata-protection%2Farticle-29%2Fdocumentation%2Fopinion-recommendation%2Ffiles%2F2007%2Fwp136_en.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GrubbBreach00-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GrubbBreach00_19-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Grubb, A. (2000). \"Breach of confidence: Anonymised information. R. v. Department of Health ex parte Source Informatics Ltd.\". <i>Medical Law Review<\/i> <b>8<\/b> (1): 115\u201320. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/11787501\" target=\"_blank\">11787501<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Breach+of+confidence%3A+Anonymised+information.+R.+v.+Department+of+Health+ex+parte+Source+Informatics+Ltd.&rft.jtitle=Medical+Law+Review&rft.aulast=Grubb%2C+A.&rft.au=Grubb%2C+A.&rft.date=2000&rft.volume=8&rft.issue=1&rft.pages=115%E2%80%9320&rft_id=info:pmid\/11787501&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HLJudgments08-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HLJudgments08_20-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">House of Lords (09 July 2008). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.publications.parliament.uk\/pa\/ld200708\/ldjudgmt\/jd080709\/comm-1.htm\" target=\"_blank\">\"Judgments - Common Services Agency (Appellants) v Scottish Information Commissioner (Respondent) (Scotland)\"<\/a>. <i>www.parliament.uk<\/i><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.publications.parliament.uk\/pa\/ld200708\/ldjudgmt\/jd080709\/comm-1.htm\" target=\"_blank\">https:\/\/www.publications.parliament.uk\/pa\/ld200708\/ldjudgmt\/jd080709\/comm-1.htm<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Judgments+-+Common+Services+Agency+%28Appellants%29+v+Scottish+Information+Commissioner+%28Respondent%29+%28Scotland%29&rft.atitle=www.parliament.uk&rft.aulast=House+of+Lords&rft.au=House+of+Lords&rft.date=09+July+2008&rft_id=https%3A%2F%2Fwww.publications.parliament.uk%2Fpa%2Fld200708%2Fldjudgmt%2Fjd080709%2Fcomm-1.htm&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SweeneySimple00-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SweeneySimple00_21-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Sweeney, L. (2000). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dataprivacylab.org\/projects\/identifiability\/paper1.pdf\" target=\"_blank\">\"Simple Demographics Often Identify People Uniquely\"<\/a> (PDF). pp. 34<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/dataprivacylab.org\/projects\/identifiability\/paper1.pdf\" target=\"_blank\">http:\/\/dataprivacylab.org\/projects\/identifiability\/paper1.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Simple+Demographics+Often+Identify+People+Uniquely&rft.atitle=&rft.aulast=Sweeney%2C+L.&rft.au=Sweeney%2C+L.&rft.date=2000&rft.pages=pp.+34&rft_id=http%3A%2F%2Fdataprivacylab.org%2Fprojects%2Fidentifiability%2Fpaper1.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MeekCaldicott15-22\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MeekCaldicott15_22-0\" rel=\"external_link\">22.0<\/a><\/sup> <sup><a href=\"#cite_ref-MeekCaldicott15_22-1\" rel=\"external_link\">22.1<\/a><\/sup> <sup><a href=\"#cite_ref-MeekCaldicott15_22-2\" rel=\"external_link\">22.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Meek, T. (28 October 2015). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.digitalhealth.net\/2015\/10\/caldicott-care-data-hangs-on-engagement\/\" target=\"_blank\">\"Caldicott: care.data hangs on engagement\"<\/a>. <i>Digital Health<\/i>. Digital Health Intelligence Limited<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.digitalhealth.net\/2015\/10\/caldicott-care-data-hangs-on-engagement\/\" target=\"_blank\">http:\/\/www.digitalhealth.net\/2015\/10\/caldicott-care-data-hangs-on-engagement\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Caldicott%3A+care.data+hangs+on+engagement&rft.atitle=Digital+Health&rft.aulast=Meek%2C+T.&rft.au=Meek%2C+T.&rft.date=28+October+2015&rft.pub=Digital+Health+Intelligence+Limited&rft_id=http%3A%2F%2Fwww.digitalhealth.net%2F2015%2F10%2Fcaldicott-care-data-hangs-on-engagement%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NHENHS16-23\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-NHENHS16_23-0\" rel=\"external_link\">23.0<\/a><\/sup> <sup><a href=\"#cite_ref-NHENHS16_23-1\" rel=\"external_link\">23.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.nationalhealthexecutive.com\/Health-Care-News\/nhs-england-to-close-caredata-programme-following-caldicott-review\" target=\"_blank\">\"NHS England to close care.data programme following Caldicott Review\"<\/a>. <i>National Health Executive<\/i>. Cognitive Publishing Ltd. 07 July 2016<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.nationalhealthexecutive.com\/Health-Care-News\/nhs-england-to-close-caredata-programme-following-caldicott-review\" target=\"_blank\">http:\/\/www.nationalhealthexecutive.com\/Health-Care-News\/nhs-england-to-close-caredata-programme-following-caldicott-review<\/a><\/span><span class=\"reference-accessdate\">.
Retrieved 04 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=NHS+England+to+close+care.data+programme+following+Caldicott+Review&rft.atitle=Nathional+Health+Executive&rft.date=07+July+2016&rft.pub=Cognitive+Publishing+Ltd&rft_id=http%3A%2F%2Fwww.nationalhealthexecutive.com%2FHealth-Care-News%2Fnhs-england-to-close-caredata-programme-following-caldicott-review&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ICOAnony-24\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ICOAnony_24-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/ico.org.uk\/for-organisations\/guide-to-data-protection\/anonymisation\/\" target=\"_blank\">\"What is anonymisation?\"<\/a>. <i>Guide to Data Protection<\/i>. Information Commissioner\u2019s Office<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/ico.org.uk\/for-organisations\/guide-to-data-protection\/anonymisation\/\" target=\"_blank\">https:\/\/ico.org.uk\/for-organisations\/guide-to-data-protection\/anonymisation\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 05 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=What+is+anonymisation%3F&rft.atitle=Guide+to+Data+Protection&rft.pub=Information+Commissioner%E2%80%99s+Office&rft_id=https%3A%2F%2Fico.org.uk%2Ffor-organisations%2Fguide-to-data-protection%2Fanonymisation%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LaurieAReview14-25\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LaurieAReview14_25-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Laurie, G.; Stevens, L.; Jones, K.H.; Dobbs, C. (30 June 2014). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/nuffieldbioethics.org\/wp-content\/uploads\/A-Review-of-Evidence-Relating-to-Harms-Resulting-from-Uses-of-Health-and-Biomedical-Data-FINAL.pdf\" target=\"_blank\">\"A Review of Evidence Relating to Harm Resulting from Uses of Health and Biomedical Data\"<\/a> (PDF). Nuffield Council on Bioethics<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/nuffieldbioethics.org\/wp-content\/uploads\/A-Review-of-Evidence-Relating-to-Harms-Resulting-from-Uses-of-Health-and-Biomedical-Data-FINAL.pdf\" target=\"_blank\">http:\/\/nuffieldbioethics.org\/wp-content\/uploads\/A-Review-of-Evidence-Relating-to-Harms-Resulting-from-Uses-of-Health-and-Biomedical-Data-FINAL.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 05 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=A+Review+of+Evidence+Relating+to+Harm+Resulting+from+Uses+of+Health+and+Biomedical+Data&rft.atitle=&rft.aulast=Laurie%2C+G.%3B+Stevens%2C+L.%3B+Jones%2C+K.H.%3B+Dobbs%2C+C.&rft.au=Laurie%2C+G.%3B+Stevens%2C+L.%3B+Jones%2C+K.H.%3B+Dobbs%2C+C.&rft.date=30+June+2014&rft.pub=Nuffield+Council+on+Bioethics&rft_id=http%3A%2F%2Fnuffieldbioethics.org%2Fwp-content%2Fuploads%2FA-Review-of-Evidence-Relating-to-Harms-Resulting-from-Uses-of-Health-and-Biomedical-Data-FINAL.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EFGCPData15-26\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-EFGCPData15_26-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.efgcp.eu\/downloads\/DP%20and%20Research%20in%20EU_HD_Final_06%2010%2015.pdf\" target=\"_blank\">\"Data protection and research in the European Union\"<\/a> (PDF). European Forum for Good Clinical Practice. 06 October 2015<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.efgcp.eu\/downloads\/DP%20and%20Research%20in%20EU_HD_Final_06%2010%2015.pdf\" target=\"_blank\">http:\/\/www.efgcp.eu\/downloads\/DP%20and%20Research%20in%20EU_HD_Final_06%2010%2015.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 05 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Data+protection+and+research+in+the+European+Union&rft.atitle=&rft.date=06+October+2015&rft.pub=European+Forum+for+Good+Clinical+Practice&rft_id=http%3A%2F%2Fwww.efgcp.eu%2Fdownloads%2FDP%2520and%2520Research%2520in%2520EU_HD_Final_06%252010%252015.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CarterTheSocial15-27\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CarterTheSocial15_27-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Carter, P.; Laurie, G.T.; Dixon-Woods, M. (2015). \"The social licence for research: why care.data ran into trouble\". <i>Journal of Medical Ethics<\/i> <b>41<\/b> (5): 404-409. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1136%2Fmedethics-2014-102374\" target=\"_blank\">10.1136\/medethics-2014-102374<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+social+licence+for+research%3A+why+care.data+ran+into+trouble&rft.jtitle=Journal+of+Medical+Ethics&rft.aulast=Carter%2C+P.%3B+Laurie%2C+G.T.%3B+Dixon-Woods%2C+M.&rft.au=Carter%2C+P.%3B+Laurie%2C+G.T.%3B+Dixon-Woods%2C+M.&rft.date=2015&rft.volume=41&rft.issue=5&rft.pages=404-409&rft_id=info:doi\/10.1136%2Fmedethics-2014-102374&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MasonTheresa16-28\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MasonTheresa16_28-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Mason, R. (02 October 2016). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.theguardian.com\/politics\/2016\/oct\/02\/theresa-may-great-repeal-bill-eu-british-law\" target=\"_blank\">\"Theresa May's 'great repeal bill': What's going to happen and when?\"<\/a>. <i>The Guardian<\/i>. Guardian News & Media Limited<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.theguardian.com\/politics\/2016\/oct\/02\/theresa-may-great-repeal-bill-eu-british-law\" target=\"_blank\">https:\/\/www.theguardian.com\/politics\/2016\/oct\/02\/theresa-may-great-repeal-bill-eu-british-law<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 05 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Theresa+May%27s+%27great+repeal+bill%27%3A+What%27s+going+to+happen+and+when%3F&rft.atitle=The+Guardian&rft.aulast=Mason%2C+R.&rft.au=Mason%2C+R.&rft.date=02+October+2016&rft.pub=Guardian+News+%26+Media+Limited&rft_id=https%3A%2F%2Fwww.theguardian.com%2Fpolitics%2F2016%2Foct%2F02%2Ftheresa-may-great-repeal-bill-eu-british-law&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WadeEthics05-29\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WadeEthics05_29-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wade, D.T. (2005). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC549663\" target=\"_blank\">\"Ethics, audit, and research: All shades of grey\"<\/a>. <i>BMJ<\/i> <b>330<\/b> (7489): 468\u201371. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1136%2Fbmj.330.7489.468\" target=\"_blank\">10.1136\/bmj.330.7489.468<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC549663\/\" target=\"_blank\">PMC549663<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/15731146\" target=\"_blank\">15731146<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC549663\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC549663<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ethics%2C+audit%2C+and+research%3A+All+shades+of+grey&rft.jtitle=BMJ&rft.aulast=Wade%2C+D.T.&rft.au=Wade%2C+D.T.&rft.date=2005&rft.volume=330&rft.issue=7489&rft.pages=468%E2%80%9371&rft_id=info:doi\/10.1136%2Fbmj.330.7489.468&rft_id=info:pmc\/PMC549663&rft_id=info:pmid\/15731146&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC549663&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HodsonGoogle16-30\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HodsonGoogle16_30-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hodson, H. (2016). \"Google knows your ills\". <i>New Scientist<\/i> <b>230<\/b> (3072): 22\u201323. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1016%2FS0262-4079%2816%2930809-0\" target=\"_blank\">10.1016\/S0262-4079(16)30809-0<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Google+knows+your+ills&rft.jtitle=New+Scientist&rft.aulast=Hodson%2C+H.&rft.au=Hodson%2C+H.&rft.date=2016&rft.volume=230&rft.issue=3072&rft.pages=22%E2%80%9323&rft_id=info:doi\/10.1016%2FS0262-4079%2816%2930809-0&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ShahImproving06-31\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ShahImproving06_31-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Shah, N.R.; Seger, A.C.; Seger, D.L. et al. (2006). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1380196\" target=\"_blank\">\"Improving acceptance of computerized prescribing alerts in ambulatory care\"<\/a>. <i>JAMIA<\/i> <b>13<\/b> (1): 5\u201311. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1197%2Fjamia.M1868\" target=\"_blank\">10.1197\/jamia.M1868<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC1380196\/\" target=\"_blank\">PMC1380196<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/16221941\" target=\"_blank\">16221941<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1380196\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1380196<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Improving+acceptance+of+computerized+prescribing+alerts+in+ambulatory+care&rft.jtitle=JAMIA&rft.aulast=Shah%2C+N.R.%3B+Seger%2C+A.C.%3B+Seger%2C+D.L.+et+al.&rft.au=Shah%2C+N.R.%3B+Seger%2C+A.C.%3B+Seger%2C+D.L.+et+al.&rft.date=2006&rft.volume=13&rft.issue=1&rft.pages=5%E2%80%9311&rft_id=info:doi\/10.1197%2Fjamia.M1868&rft_id=info:pmc\/PMC1380196&rft_id=info:pmid\/16221941&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC1380196&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DonnellyICO16-32\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DonnellyICO16_32-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Donnelly, C. (12 May 2016). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.computerweekly.com\/news\/450296175\/ICO-probes-Google-DeepMind-patient-data-sharing-deal-with-NHS-Hospital-Trust\" target=\"_blank\">\"ICO probes Google DeepMind patient data-sharing deal with NHS Hospital Trust\"<\/a>. <i>Computer Weekly<\/i>. TechTarget, Inc<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.computerweekly.com\/news\/450296175\/ICO-probes-Google-DeepMind-patient-data-sharing-deal-with-NHS-Hospital-Trust\" target=\"_blank\">http:\/\/www.computerweekly.com\/news\/450296175\/ICO-probes-Google-DeepMind-patient-data-sharing-deal-with-NHS-Hospital-Trust<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 05 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=ICO+probes+Google+DeepMind+patient+data-sharing+deal+with+NHS+Hospital+Trust&rft.atitle=Computer+Weekly&rft.aulast=Donnelly%2C+C.&rft.au=Donnelly%2C+C.&rft.date=12+May+2016&rft.pub=TechTarget%2C+Inc&rft_id=http%3A%2F%2Fwww.computerweekly.com%2Fnews%2F450296175%2FICO-probes-Google-DeepMind-patient-data-sharing-deal-with-NHS-Hospital-Trust&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CaldicottInfo13-33\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CaldicottInfo13_33-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Caldicott, F. (March 2013). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.gov.uk\/government\/uploads\/system\/uploads\/attachment_data\/file\/192572\/2900774_InfoGovernance_accv2.pdf\" target=\"_blank\">\"Information to Share or Not to Share: The Information Governance Review\"<\/a> (PDF). National Information Governance Board<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.gov.uk\/government\/uploads\/system\/uploads\/attachment_data\/file\/192572\/2900774_InfoGovernance_accv2.pdf\" target=\"_blank\">https:\/\/www.gov.uk\/government\/uploads\/system\/uploads\/attachment_data\/file\/192572\/2900774_InfoGovernance_accv2.pdf<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 05 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Information+to+Share+or+Note+to+Share%3A+The+Information+Governance+Review&rft.atitle=&rft.aulast=Caldicott%2C+F.&rft.au=Caldicott%2C+F.&rft.date=March+2013&rft.pub=National+Information+Governance+Board&rft_id=https%3A%2F%2Fwww.gov.uk%2Fgovernment%2Fuploads%2Fsystem%2Fuploads%2Fattachment_data%2Ffile%2F192572%2F2900774_InfoGovernance_accv2.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KishUnpatients15-34\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KishUnpatients15_34-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Kish, L.J.; Topol, E.J. (2015). \"Unpatients: Why patients should own their medical data\". <i>Nature Biotechnology<\/i> <b>33<\/b> (9): 921\u20134. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnbt.3340\" target=\"_blank\">10.1038\/nbt.3340<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26348958\" target=\"_blank\">26348958<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Unpatients%3A+Why+patients+should+own+their+medical+data&rft.jtitle=Nature+Biotechnology&rft.aulast=Kish%2C+L.J.%3B+Topol%2C+E.J.&rft.au=Kish%2C+L.J.%3B+Topol%2C+E.J.&rft.date=2015&rft.volume=33&rft.issue=9&rft.pages=921%E2%80%934&rft_id=info:doi\/10.1038%2Fnbt.3340&rft_id=info:pmid\/26348958&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ECCross14-35\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ECCross14_35-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/ec.europa.eu\/digital-single-market\/en\/news\/cross-border-health-project-epsos-what-has-it-achieved\" target=\"_blank\">\"Cross-border health project epSOS: What has it achieved?\"<\/a>. <i>Digital Single Market<\/i>. European Commission. 07 July 2014<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/ec.europa.eu\/digital-single-market\/en\/news\/cross-border-health-project-epsos-what-has-it-achieved\" target=\"_blank\">https:\/\/ec.europa.eu\/digital-single-market\/en\/news\/cross-border-health-project-epsos-what-has-it-achieved<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 05 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Cross-border+health+project+epSOS%3A+What+has+it+achieved%3F&rft.atitle=Digital+Single+Market&rft.date=07+July+2014&rft.pub=European+Commission&rft_id=https%3A%2F%2Fec.europa.eu%2Fdigital-single-market%2Fen%2Fnews%2Fcross-border-health-project-epsos-what-has-it-achieved&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-36\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-36\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.bundesgesundheitsministerium.de\/en\/health\/e-health-act.html\" target=\"_blank\">\"Act on secure digital communication and applications in the health care system (E-Health Act)\"<\/a>. Federal Ministry of Health. 29 September 2015<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.bundesgesundheitsministerium.de\/en\/health\/e-health-act.html\" target=\"_blank\">http:\/\/www.bundesgesundheitsministerium.de\/en\/health\/e-health-act.html<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 05 February 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Act+on+secure+digital+communication+and+applications+in+the+health+care+system+%28E-Health+Act%29&rft.atitle=&rft.date=29+September+2015&rft.pub=Federal+Ministry+of+Health&rft_id=http%3A%2F%2Fwww.bundesgesundheitsministerium.de%2Fen%2Fhealth%2Fe-health-act.html&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DoveResearch16-37\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DoveResearch16_37-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Dove, E.S.; Townend, D.; Meslin, E.M. et al. (2016). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4838154\" target=\"_blank\">\"Research Ethics: Ethics review for international data-intensive research\"<\/a>. <i>Science<\/i> <b>351<\/b> (6280): 1399\u2013400. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1126%2Fscience.aad5269\" target=\"_blank\">10.1126\/science.aad5269<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4838154\/\" target=\"_blank\">PMC4838154<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/27013718\" target=\"_blank\">27013718<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4838154\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4838154<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Research+Ethics%3A+Ethics+review+for+international+data-intensive+research&rft.jtitle=Science&rft.aulast=Dove%2C+E.S.%3B+Townend%2C+D.%3B+Meslin%2C+E.M.+et+al.&rft.au=Dove%2C+E.S.%3B+Townend%2C+D.%3B+Meslin%2C+E.M.+et+al.&rft.date=2016&rft.volume=351&rft.issue=6280&rft.pages=1399%E2%80%93400&rft_id=info:doi\/10.1126%2Fscience.aad5269&rft_id=info:pmc\/PMC4838154&rft_id=info:pmid\/27013718&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4838154&rfr_id=info:sid\/en.wikipedia.org:Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. In several cases the PubMed ID was missing and was added to make the reference more useful. \n<\/p><p>Per the distribution agreement, the following copyright information is also being added: \n<\/p><p>\u00a9John Mark Michael Rumbold, Barbara Pierscionek. 
Originally published in the Journal of Medical Internet Research (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.jmir.org\" target=\"_blank\">http:\/\/www.jmir.org<\/a>), 24.02.2017.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research\">https:\/\/www.limswiki.org\/index.php\/Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","35171859a8e80fe1a0d916059f4fdd3e_images":[],"35171859a8e80fe1a0d916059f4fdd3e_timestamp":1544814665,"bbfbe3553b26be64d63e45d26612ea45_type":"article","bbfbe3553b26be64d63e45d26612ea45_title":"Deployment of analytics into the healthcare safety net: Lessons learned (Hartzband and Jacobs 2016)","bbfbe3553b26be64d63e45d26612ea45_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned","bbfbe3553b26be64d63e45d26612ea45_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Deployment of analytics into the healthcare safety net: Lessons learned\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nDeployment of analytics into the healthcare safety net: Lessons learnedJournal\n \nOnline Journal of Public Health InformaticsAuthor(s)\n \nHartzband, David; Jacobs, FeygeleAuthor affiliation(s)\n \nRCHN Community Health FoundationPrimary contact\n \nEmail: dhartzband at rchnfoundation dot orgYear published\n \n2016Volume and issue\n \n8(3)Page(s)\n \ne203DOI\n \n10.5210\/ojphi.v8i3.7000ISSN\n \n1947-2579Distribution license\n \nCreative Commons Attribution-NonCommercial 3.0 UnportedWebsite\n \nhttp:\/\/ojphi.org\/ojs\/index.php\/ojphi\/article\/view\/7000Download\n \nhttp:\/\/ojphi.org\/ojs\/index.php\/ojphi\/article\/download\/7000\/5812 (PDF)\n\nContents\n\n1 Abstract \n2 Introduction \n3 Background and literature \n4 Methods \n5 Study limitations \n6 Results \n7 Conclusions and recommendations \n8 Acknowledgements \n9 Conflicts of interest \n10 Footnotes \n11 References \n12 Notes \n\n\n\nAbstract \nBackground: As payment reforms shift healthcare reimbursement toward value-based payment programs, providers need the capability to work with data of greater complexity, scope and scale. 
This will in many instances necessitate a change in understanding of the value of data and the types of data needed for analysis to support operations and clinical practice. It will also require the deployment of different infrastructure and analytic tools. Community health centers (CHCs), which serve more than 25 million people and together form the nation\u2019s largest single source of primary care for medically underserved communities and populations, are expanding and will need to optimize their capacity to leverage data as new payer and organizational models emerge.\nMethods: To better understand existing capacity and help organizations plan for the strategic and expanded uses of data, a project was initiated that deployed contemporary Hadoop-based analytic technology into several multi-site CHCs and a primary care association (PCA) with an affiliated data warehouse supporting health centers across the state. An initial data quality exercise was carried out after deployment, in which a number of analytic queries were executed both through the existing electronic health record (EHR) applications and, in parallel, through the analytic stack. Each organization carried out the EHR analysis using the definitions typically applied for routine reporting. The analysis on the analytic stack used the common definitions established for the Uniform Data System (UDS) by the Health Resources and Services Administration.[a] In addition, interviews with health center leadership and staff were completed to understand the context for the findings. \nResults: The analysis uncovered many challenges and inconsistencies with respect to the definition of core terms (patient, encounter, etc.), data formatting, and missing, incorrect and unavailable data. 
At a population level, apparent under-reporting of a number of diagnoses, specifically obesity and heart disease, was also evident in the results of the data quality exercise, for both the EHR-derived and stack analytic results. \nConclusion: Data awareness \u2014 that is, an appreciation of the importance of data integrity, data hygiene[b] and the potential uses of data \u2014 needs to be prioritized and developed by health centers and other healthcare organizations if analytics are to be used in an effective manner to support strategic objectives. While this analysis was conducted exclusively with community health center organizations, its conclusions and recommendations may be more broadly applicable.\nKeywords: Community health centers, analytics, decision-making, data\n\nIntroduction \nCommunity health centers are the backbone of the health care safety net, providing comprehensive primary care for the nation\u2019s medically underserved communities and populations. In 2015, 1,429 community health centers operated in nearly 10,000 urban and rural sites across the country, serving over 25 million people. Buoyed by HRSA\u2019s long-standing focus on quality improvement and substantial investments in health center HIT systems, health center organizations have implemented electronic health record applications in record numbers. Ninety-two percent of all federally qualified community health centers, and 85 percent of health center \u201clook-alikes\u201d \u2014 those entities that meet all requirements of the health center program but are supported by state and local funds rather than federal grants \u2014 report that an EHR was in use for all sites and all providers in 2015; only 2.4 percent have no EHR installed at any site, and virtually all expect to adopt an EHR. 
In addition, 95.5 percent report using clinical decision support applications, and 64.1 percent exchange clinical information electronically with other key providers, health care settings or subspecialty clinicians.[c] Furthermore, 88.9 percent participate in the Centers for Medicare and Medicaid Services (CMS) EHR Incentive Program, commonly known as \"Meaningful Use.\" These statistics reflect a commitment to adopting new technologies that support high-quality clinical care and streamline operations. Yet as the movement to value-based payment accelerates and strategic planning becomes more complex, community health center organizations, along with all other providers, must be prepared for new and increasingly sophisticated analytics to support clinical care and operations.\nAs analytics are applied to ever-larger amounts of data and become both more important and more necessary, questions about their use become inevitable. How is data quality influenced by acquisition through health information technology (HIT), such as electronic health records (EHRs), or through other means? On an operational level, how can analytic results best be understood and used to address and improve healthcare practice? Patient outcomes? Cost reduction? 
What are the implications of problematic data quality for operational capacity?[d]\nTo address these questions and help community health center organizations plan for future use and integration of contemporary analytics, several health center organizations were recruited to engage in a project to evaluate: \n\n Health center data accuracy: Do health center data systems ensure correct values and consistent formats for data?\n Health center data reliability: Do health center data systems collect and report results that are consistent and correspond to results from CDC data sources?\n Health center data completeness: Do health center data meet the criteria for all mandatory data items?\nAt each participating organization, which included several community health centers and one state primary care association, a Hadoop-based analytic stack was deployed alongside the organization\u2019s other data systems. Population-level statistics for specific diagnoses and comorbidities were calculated both through the organization\u2019s normal means and through the analytic stack, and the two sets of results were compared for comparability and utility.\n\nBackground and literature \nDocumentation, reporting accuracy and data quality have been the focus of numerous studies. Yang and Colditz[4] recently undertook a review of NHANES survey data in an effort to benchmark the prevalence of obesity nationally. Al Kazzi et al.[5] examined the prevalence of obesity and tobacco and alcohol use, comparing the data in a direct survey (the Behavioral Risk Factor Surveillance System, BRFSS) with that in the Nationwide Inpatient Sample administrative database, finding substantial differences between the two. O\u2019Malley et al.[6] examined the ICD diagnostic coding process and potential sources of error in code accuracy. 
They found the principal sources of error to be related to both communication and documentation, citing lack of baseline information, communication errors, physician familiarity and experience with the presenting condition, and insufficient attention to detail, as well as the training and experience of coders and discrepancies between electronic and paper record systems. Their prescription for improvement was the specification of clear coding processes and a focus on heightening awareness of data quality among all staff engaged in documentation.\nDeVoe et al.[7] compared the entries in EHRs with the same data in the Medicaid claims data set for a group of 50 community health centers in Oregon. They found gaps in data congruence across the study group, with some services documented in the Medicaid data set but not the EHRs, and others documented in the EHRs but not in the Medicaid data set. For the latter group, nearly 50 percent of services documented in the EHR were not found in the Medicaid claims for HbA1c, cholesterol screening, retinopathy screening and influenza vaccination. 
They also evaluated demographic characteristics and found that Spanish-speaking patients, as well as those who had gaps in insurance coverage, were more likely to have services documented in the EHR but not in the Medicaid claims data, a finding especially relevant to community health centers, which disproportionately serve poor, uninsured individuals and those best served in languages other than English.[e]\nOutside of health care, other industries \u2014 including discrete manufacturing and financial services \u2014 have struggled with the overall issue of data quality.[8] Over time, both of these industries have, for the most part, achieved very high levels of data quality and high levels of user confidence in their data, and their experience might provide some insight into data quality improvement in healthcare.[f] Two projects are especially instructive in this area. The C4 Project at General Motors, which began in 1986, was an attempt to develop an entirely paperless design and manufacturing specification system for automotive manufacturing.[9] The data quality effort associated with this project was immense. A staff of close to 50 people was assigned to the various parts of data acquisition, normalization, maintenance and life cycle management. The project emphasized the design of processes to ensure data quality and integrity. In particular, data governance was monitored at least as closely as data entry, storage and usage, which went a long way toward ensuring a high level of data quality.\nThe same was true of a project at Goldman Sachs to develop an integrated trading system. 
The primary effort was the development of a set of data governance and data life cycle processes that focused on awareness of data quality.[g]\nThe studies in other industries point to processes, and a potential for change, that are relevant to the health care industry, while the studies of health services data frame some of the known challenges and complexities that remain in the health care industry. For many community health centers, which serve especially vulnerable populations and tend to be less well-resourced than hospital providers, understanding the complexities of data collection and use, and developing appropriate strategies to improve data collection and to use information more effectively, remain challenging.\n\nMethods \nTwo urban and one rural multi-site community health centers, operating in three different states, were recruited along with one state primary care association to participate in the study in 2014\u20132015. The three health centers varied in size and in the aggregate served approximately 124,000 patients. The participation of the primary care association, which administers a data warehouse for member health center organizations, resulted in a total data set representing 50 CHCs, operating more than 400 practice sites and serving approximately 1.3 million patients. The overall distribution of urban and rural sites was approximately equal. Each organization made data available for a period of either two years (2012-2014) or three years (2011-2014). 
It should be noted that the deployment was undertaken separately at each site; that is, each location was treated as a unique project site or case.\nAt each project site, a dual-path deployment and analytic process was used, wherein researchers worked with the organization\u2019s local IT group to install and integrate a new software application providing analytic capacity alongside the center\u2019s main systems.[h] The purpose of this was to help each health center assess the reliability of its existing systems in deriving results consistent with the new application. This software, an open-source (OSS) Hadoop-based analytic stack, consisted of the Cloudera Express Hadoop distribution, which includes the Hadoop Distributed File System (HDFS), YARN (MapReduce2), HBase (non-relational data management) and Impala (SQL-based query). Hadoop was selected as the analytic system for several reasons: it is a well-defined and well-understood technology; it is in current use in many sectors; it is relatively easy to install and test; it provides the opportunity to manage and analyze data from very heterogeneous sources; and it is easy to use because analytic queries may be written in SQL. Most significant is Hadoop\u2019s established use for managing Big Data and for predictive analytics. These are important considerations for health centers as they expand in capacity and require increasingly complex tools for clinical and operational management. The local deployments provided an opportunity to test these capabilities in the health center environment.\nThe OSS installation took between one week and three months, with the duration depending largely on the familiarity of the organization\u2019s IT staff with deploying open-source software. 
After deployment of this software, a read-only connection was made either to the database underlying the organization\u2019s electronic health record (EHR) system or, in the case of the primary care association and one of the urban community health centers, to a data extract or warehouse maintained by the organization. \nThe dual-path process allowed results and data quality to be compared and separately evaluated for two different data collection and analysis approaches. The first method accessed and analyzed data through the normal processes in use at the health center. These processes included direct access to the EHR or EHR-based data warehouse and analysis through either the EHR\u2019s query facility or the business intelligence (BI) tool in use at the health center. The second method consisted of extracting data from the EHR (or warehouse), normalizing the data according to standard (UDS) definitions and conducting analysis using the Hadoop-based stack. This method allowed for greater transparency and permitted data quality issues, such as differences in definitions or ambiguity due to EHR complexity, to be identified and addressed.\nAfter data were imported into the analytic stack and deployment was completed, an initial \u201clevel-up\u201d exercise was performed. This exercise served both to test the analytic system and to facilitate the normalization of all core terms and data definitions between the CHC\u2019s operational systems and the analytic stack. The \u201clevel-up\u201d exercise consisted of a number of defined queries performed through the organization\u2019s regular systems (EHR, SQL, BI tools) and compared with the same queries performed on the analytic stack with the data in the HDFS\/HBase information store. 
The following queries were performed[i]:\n\n number of patients served, per year\n number of patients served presenting with specific diagnoses including hypertension, diabetes, obesity, heart disease, and behavioral health conditions\n rank order of prevalent[j] comorbidities\n cost[k] per patient, per year\n cost, per comorbidity, per year\nThis exercise, undertaken by the organizations in concert with the researchers, took between two weeks and six months to perform, with the duration largely dependent upon the organization\u2019s prior work on data normalization.\nIn parallel with this technical deployment, training was provided to each organization\u2019s IT personnel and other staff, including in all cases the CEO or executive director. The training focused on the uses of stack-driven analytics, explored their advantages and disadvantages, and addressed how they differed from existing business intelligence and other reporting.\nLastly, informal interviews were conducted with the chief medical officer or CEO at each organization to review results, discuss important findings, and consider potential challenges and approaches to continued analysis.\n\nStudy limitations \nThe analysis was confined to community health center organizations and included organizations operating in just three states. It may not be representative of a broader group of health centers nationally or of CHCs in different states. The data focused on specific years and included centers that had undergone an EHR migration prior to the analysis period, an experience that may not be shared by health centers generally. Despite these limitations, the data quality issues identified among the participants provide evidence of common data concerns and challenges.\n\nResults \nThe exercise revealed common data quality issues across all of the organizations. 
These include missing or unusable data, as well as differences in the definitions of core terms such as \"patient\" or \"encounter,\" both within and across organizations, even though HRSA requires standard definitions for these terms. Finally, under-reporting of certain diagnoses relative to the general population raises questions about the reliability of the data. While interesting in and of themselves, these initial results are important for what they may indicate about the bigger picture of data acquisition and use. \nTable 1 shows the values reported by the participating organizations for key diagnoses, presented as ranges. As noted above, these results represent data from approximately 50 CHCs comprising over 400 clinical sites and a total of 1.3 million patients, over a period of two to three years. Reported population percentages for the U.S. population as a whole (CDC FastStats) are presented for comparison.\n\nTable 1. Population percent range values for selected diagnoses; * indicates outlying data.\n\nDiagnosis | Range from EHR values | Range from analytic values | U.S. population percentage (CDC)[l]\nHypertension | 17%-23% | 4%*-22% | 33.5%\nDiabetes | 6%-8% | 2%*-8% | 9.6%\nObesity | 3%*-12% | 3%*-12% | 37.9%\nHeart disease | 1%-4% | 1%-3% | 11.5%\n\nSeveral issues are evident in these results. The first is that the hypertension, diabetes and obesity results include a number of outliers (marked *). In each case, the outlier data are attributable to a single health center organization (out of more than 40). If these data are discarded, the ranges derived from EHR data are similar to those derived from the data imported into the analytic stack. Table 2 shows the effect of removing the outlier organization from the analytic stack results.
With the outlier removed, the EHR and stack-driven results are closer, but the data ranges still show some variation, reflecting inconsistencies in the data.\n\nTable 2. Adjusted population percent range values for selected diagnoses.\n\nDiagnosis | Range from EHR values | Range from analytic values | U.S. population percentage (CDC)\nHypertension | 17%-23% | 17%-22% | 33.5%\nDiabetes | 6%-8% | 5%-8% | 9.6%\nObesity | 9%-10% | 7%-12% | 37.9%\nHeart disease | 1%-4% | 1%-3% | 11.5%\n\nIn addition, the percentages of the population with obesity and heart disease diagnoses are notably low in comparison with the CDC\u2019s reported figures for the U.S. population as a whole. This holds in both the full data (Table 1) and the adjusted data (Table 2). We might expect the population percentages for these diagnoses in the community health center patient population to be at least the same as, if not higher than, those in the general population, given the documented level of disparities and the characteristics of the population served. Possible reasons for these discrepancies are discussed in the conclusions section of this paper. \nMore generally, the analysis revealed several types of potential data quality issues. Although the nature and extent of the problems varied across sites, the problems \u2014 including definition conflicts, conversion issues and structural challenges \u2014 were not unique to any site and to some extent were evident at all sites.
These include:\n\n errors resulting from deviation from standard definitions (for example, for patient, encounter, provider) even when guidelines for such definitions exist and are required for standard reporting (UDS, in the case of CHCs);\n errors of omission, that is, data simply not recorded;\n errors resulting from incorrect entry, including:\n\u25aa out-of-range values not caught by the EHR system, e.g., BMIs of >1000, BP values of 320\/250, HbA1c values of >50;\n\u25aa incorrect text entered for names, addresses and previous providers;\n\u25aa values not entered into searchable fields;\n\u25aa data recorded as text in clinical notes but not in searchable fields;\n\u25aa data imported from external sources (labs, registries, etc.) as text but not into searchable fields;\n errors resulting from the structure and complexity of EHR systems, including:\n\u25aa sensitivity to the form in which data were entered: in several systems, the ICD-9 codes 250., 250.0 and 250.00 returned different query results, as did 250.5 and 250.50, among others;\n\u25aa complexity of navigation and misalignment with provider workflows, which appeared to be responsible for several types of errors;\n\u25aa concentration on treating a single condition during an encounter, which led to few encounters with multiple diagnoses recorded; and\n data corruption and\/or loss of data resulting from migration to a new EHR platform.\nConclusions and recommendations \nThe time required to complete the \u201clevel-up\u201d exercise was substantially shorter in organizations that had done extensive data normalization work before the study began.
The organizations (the PCA and one large, urban CHC) that took the least time (less than one month) to deploy the analytic stack and perform the data quality exercise had previously undertaken substantial work to standardize definitions (semantic normalization) and to match and transform data formats (syntactic normalization). This work was unrelated to the study and had in all cases been done in conjunction with the creation and population of a data warehouse. In addition, these organizations had already begun exploring analytics, which enabled them to align quickly with the deployment requirements and to think in terms of strategic analysis. Conversely, the organization that needed the longest time to complete the level-up exercise made the most widespread use of idiosyncratic, non-standard (i.e., non-UDS) definitions for core terms such as patient, encounter, and provider. It also had definition mismatches between clinical departments, or between clinical and administrative departments, within the organization. \nThe potential under-reporting of key diagnoses is of a different nature. It is evidenced by figures for hypertension, diabetes, obesity and heart disease that fall well below nationally reported values. The patient populations of community health centers are not generally thought to be healthier than the general U.S. population. Health centers' patients are disproportionately poor, uninsured, and publicly insured, and they are disproportionately members of minority groups.[m] In addition, health centers are more likely than other primary care providers to treat patients with chronic illnesses.[n] Yet in all cases, the reported percentages for key diagnoses were below the values reported for the population as a whole, and the gap is especially conspicuous for obesity and heart disease.\nAl Kazzi et al.[5] recently compared hospital discharge data reported in the U.S.
Nationwide Inpatient Sample (NIS, AHRQ) with interview data reported in the Behavioral Risk Factor Surveillance System (BRFSS, CDC) for 2011. For obesity, the prevalence was 9.6 percent in the NIS and 27.4 percent in the BRFSS. The BRFSS figure, which is based on direct participant surveys, is thus almost three times the figure from hospital discharge records[o] (27.4 \/ 9.6 \u2248 2.9), and more closely aligned with other recent results.[p] This suggests that the CHC-reported data are consistent with other provider-reported data, as represented by the NIS results, but understated relative to other sources.\nTo better understand the obesity anomaly in the health center data sets, these results were reviewed with the chief medical officers (CMOs) and other clinical staff at participating CHCs. Those interviewed estimated the obesity rate among the patients they served at 40 percent. A recent paper in JAMA Internal Medicine[4] estimated that in the United States, 40 percent of adult men and 30 percent of adult women are overweight, while 35 percent of men and 37 percent of women are obese. The estimate provided by the participating CMOs is thus consistent with these national data and substantially higher than the figures derived from the analysis. \nThe CMOs interviewed cited two possible explanations for this. First, they noted that providers did not often diagnose obesity. When they did, they did not use the full range of ICD-9 codes, which include three specific codes (278, unspecified obesity; 278.01, morbid obesity, BMI >30; and 278.02, overweight, BMI >25). Further, while the UDS guidelines specify the use of the 22 V-codes for obesity, with a highly specific breakdown of BMI measurements, these codes also appear to be underutilized.
Second, it was conjectured that the data might reflect sensitivity to differing cultural norms in the communities served about what counts as obesity or being overweight.[q] More investigation is needed to understand the anomaly. Even so, the range of three to 12 percent reported by the health center organizations in this study seems implausibly low, and could reflect reporting and recording bias as well as data quality issues.\nObesity may be somewhat subjective (although BMI is a commonly used standard), but heart disease is a specific, diagnosable condition. The apparent under-reporting of heart disease in the study group \u2014 approximately seven to eight percent, as compared to 11 percent nationally per the CDC \u2014 is therefore harder to explain. Most CMOs thought that 20 to 30 percent of their patients experienced some form of heart disease. Possible causes of under-reporting are still under investigation, although it should be noted that our analysis was not age-adjusted. Nor was it adjusted for the fact that, particularly in 2014, many patients not previously known to the health centers were seen for the first time as coverage expanded. The addition of these new patients may affect the distribution of diagnoses in ways that we do not yet understand.\nComparing the body of data quality work in the aerospace and financial services industries with that in healthcare can be instructive. The GM and Goldman Sachs projects referenced shared several features besides an emphasis on data governance.
These included: 1) high-level executive sponsorship, with an EVP and\/or CEO who actually participated in introducing and reviewing the projects; 2) a long period of preparatory work during which core terms were defined, data were normalized, and workflows and work processes were redesigned or newly created to provide an environment that promoted data quality; 3) broad participation from across the organization, not just IT; and 4) an emphasis on standards where necessary or productive, but not as the primary or sole focus of effort. The most important characteristics of these industries\u2019 efforts were term definition, data normalization and process redesign. Broad participation across all phases of the effort, namely planning, initiation, deployment and ongoing improvement, was equally important. These industries spent decades re-engineering their workflows and processes for operational and informational efficiency and effectiveness.[13][14]\nIn contrast, our experience suggests that: 1) governance and the information life cycle are not at the core of how the healthcare industry approaches such projects; 2) while executive sponsorship is the norm, executive participation is rare; 3) many projects are designed, led and carried out by the IT group; and 4) standards are seen as a major part of the solution, including by federal agencies and regulators (e.g., ONC, CMS, HRSA). These issues have yet to be tackled in healthcare organizations.\nThis brings us back to the larger questions raised earlier. Our current findings provide some indication of the influences on data quality that might explain, at least in part, the variations and unexpected results.
These influences include: quality degradation from system migration; inadequate or inappropriate data entry causing missing or incorrect data; inaccessible data in text entries from provider notes, text lists and external text imports; inadequate definition and format normalization resulting in unusable data; systemic errors due to current practice norms; idiosyncrasies in how different EHRs process diagnosis codes; and complexity of navigation in EHRs. Many of these issues can be at least partly addressed by greater attention to detail at the data entry stage, since improving data quality directly in the EHR is more effective than trying to correct it after the data are entered. Achieving the Quadruple Aim \u2014 which encompasses improving the work life of health care providers, clinicians and staff as well as enhancing patient experience, improving population health, and reducing costs \u2014 clearly requires early preparation and consistent attention to data quality.\nSeveral recommendations follow from these results:\n\n Definitions of core terms should be reviewed and consensus reached on their application and use. Moreover, data definitions and workflows should be aligned with standard practices.\n Workflows and other processes should be reviewed and redesigned as necessary to emphasize and promote data quality.\n Organizations should familiarize themselves with how their EHR processes data as it is entered (for example, how diagnosis codes are treated), and ensure that the EHR treats entered data consistently.\n Text data should be entered consistently and in a form that can be retrieved for analysis as well as for use in diagnosis and patient care.\n Before migrating from one EHR platform to another, data should be cleaned and checked. Extensive data checking should also be done after EHR system migration.
Care must be taken to back up the data from the retired system, potentially in a data extract. The backup keeps the data available if any conversion loss occurs, and it can be used to vet the integrity of the migration process.\nThese recommendations are aimed at helping health centers answer two strategic questions. First, how good are your data? Clinical data from the EHR, data imported from labs and other providers, and financial data need to be carefully reviewed and vetted for accuracy, reliability and completeness. Second, how good are your systems? This question covers infrastructure (servers, storage, network), software systems and applications, and processes and workflows. It also covers staffing, staff training and engagement. Providing health centers with the expertise and funds to undertake this work should be a priority.\nTo date, the participating organizations have not moved substantially beyond the initial data quality exercise. The question remains of how to integrate analytic results into the strategic and operational practice of a community health center, or of a healthcare organization in general. Community health centers generally attain the highest standards of care and achieve good outcomes with respect to both quality and cost. Even so, the experience from this study indicates that considerable work may be needed to help all centers strengthen their awareness of data and information quality and move toward better integration of analytic results in practice. This awareness includes understanding how to perform complex analytic queries and which applications are best suited to particular types of analysis, but these are just two components of a forward-thinking data strategy.
All staff at healthcare organizations, but particularly those engaged in using data for decision-making, must develop an awareness and appreciation of what data are available, how they can best be analyzed and how the results relate to the clinical, operational and strategic needs of the organization.\n\nAcknowledgements \nThe authors would like to acknowledge the assistance of Srini Rao, Ph.D., CEO of Datycs, for deployment and analytic support.\n\nConflicts of interest \nThe authors have no conflicts of interest to report.\n\nFootnotes \n\n\n\u2191 As defined in Health Resources and Services Administration's Bureau of Primary Health Care, UDS Reporting Instructions for Health Centers, 2014 Edition (PDF) \n\n\u2191 \"Data hygiene is the collective processes conducted to ensure the cleanliness of data. Data is considered clean if it is relatively error-free.\" \n\n\u2191 See HRSA's 2015 Health Center Data, Table 5 - Staffing and Utilization \n\n\u2191 cf. Nambiar, et al. 2013[1]; Raghupathi and Raghupathi, 2013[2] and Ward, et al. 2014[3] \n\n\u2191 National Association of Community Health Centers' A Sketch of Community Health Centers, August 2016 (PDF) \n\n\u2191 Study author Dr. Hartzband has had extensive experience in these industries, including as external architect for the General Motors C4 project \u2014 an effort to develop a paperless design process for car manufacturing \u2014 and as a principal consultant to Ernst & Young for the Goldman Sachs integrated trading system effort. \n\n\u2191 Via personal correspondence between Dr. Hartzband and the Goldman Sachs team. \n\n\u2191 Designed in accordance with various reviews of healthcare data quality assessment, especially:\nKahn, et al. 2012[10]; Weiskopf and Weng, 2013[11] and Cai and Zhu, 2015[12] \n\n\u2191 Uniform Data System definitions are used for all terms, including visits, patients and conditions (hyperlink field removed). 
\n\n\u2191 Data quality and access issues prevented the accurate calculation of comorbidities. \n\n\u2191 Actual cost (expenditure), not billed cost (revenue). Actual cost could not be calculated at any of the health centers, so these queries were not run. \n\n\u2191 Figures from updated National Center for Health Statistics, CDC FastStats (2013-2014) for adults over 40 years of age. CDC definitions for diagnosis are identical to those used by HRSA for UDS except for heart disease, where the UDS definitions encompass more codes and therefore more conditions. \n\n\u2191 See HRSA's 2015 Health Center Data, Table 4 - Selected Patient Characteristics \n\n\u2191 See NACHC's 2014 Chart Book, Figure 1.9 (PDF) \n\n\u2191 NIS: Discharge-level data from approximately 8 million hospital stays (2011); BRFSS: 506,467 adult participants (2011) \n\n\u2191 NIS: Discharge-level data from approximately 8 million hospital stays (2011); BRFSS: 506,467 adult participants (2011) \n\n\u2191 Several chief medical officers suggested that cultural norms for what is considered obesity vary greatly among communities, and that providers might be unwilling to make a diagnosis not in line with such norms. \n\n\nReferences \n\n\n\u2191 Nambiar, R.; Bhardwaj, R.; Sethi, A. et al. (2013). \"A look at challenges and opportunities of Big Data analytics in healthcare\". 2013 IEEE International Conference on Big Data 2013. doi:10.1109\/BigData.2013.6691753.   \n\n\u2191 Raghupathi, W.; Raghupathi, V. (2013). \"An overview of health analytics\". Journal of Health & Medical Informatics 4: 132. doi:10.4172\/2157-7420.1000132.   \n\n\u2191 Ward, M.J.; Marsolo, K.A.; Froehle, C.M. (2014). \"Applications of business analytics in healthcare\". Business Horizons 57 (5): 571\u2013582. doi:10.1016\/j.bushor.2014.06.003. PMC PMC4242091. PMID 25429161. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4242091 .   
\n\n\u2191 Yang, L.; Colditz, G.A. (2015). \"Prevalence of overweight and obesity in the United States, 2007-2012\". JAMA Internal Medicine 175 (8): 1412\u20133. doi:10.1001\/jamainternmed.2015.2405. PMC PMC4625533. PMID 26098405. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4625533 .   \n\n\u2191 Al Kazzi, E.S.; Lau, B.; Li, T. et al. (2015). \"Differences in the prevalence of obesity, smoking and alcohol in the United States Nationwide Inpatient Sample and the Behavioral Risk Factor Surveillance System\". PLoS One 10 (11): e0140165. doi:10.1371\/journal.pone.0140165. PMC PMC4633065. PMID 26536469. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4633065 .   \n\n\u2191 O'Malley, K.J.; Cook, K.F.; Price, M.D. et al. (2005). \"Measuring diagnoses: ICD code accuracy\". Health Services Research 40 (5 Pt. 2): 1620\u201339. doi:10.1111\/j.1475-6773.2005.00444.x. PMC PMC1361216. PMID 16178999. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1361216 .   \n\n\u2191 Devoe, J.E.; Gold, R.; McIntire, P. et al. (2011). \"Electronic health records vs Medicaid claims: Completeness of diabetes preventive care data in community health centers\". Annals of Family Medicine 9 (4): 351\u20138. doi:10.1370\/afm.1279. PMC PMC3133583. PMID 21747107. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3133583 .   \n\n\u2191 O'Connor, L. (May 2007). \"Data Quality Management and Financial Services\". Proceedings of the MIT 2007 Information Quality Industry Symposium. http:\/\/mitiq.mit.edu\/IQIS\/Documents\/CDOIQS_200777\/Papers\/01_59_4E.pdf .   \n\n\u2191 Bliss, F.W. \"The C4 Program at General Motors\". In Machover, C. The CAD\/CAM Handbook. McGraw-Hill, Inc. pp. 309\u2013320. ISBN 0070393753.   \n\n\u2191 Kahn, M.G.; Raebel, M.A.; Glanz, J.M. et al. (2012). 
\"A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research\". Medical Care 50 (Suppl.): S21\u20139. doi:10.1097\/MLR.0b013e318257dd67. PMC PMC3833692. PMID 22692254. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3833692 .   \n\n\u2191 Weiskopf, N.G.; Weng, C. (2013). \"Methods and dimensions of electronic health record data quality assessment: Enabling reuse for clinical research\". JAMIA 20 (1): 144\u201351. doi:10.1136\/amiajnl-2011-000681. PMC PMC3555312. PMID 22733976. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3555312 .   \n\n\u2191 Cai, L.; Zhu, Y. (2015). \"The challenges of data quality and data quality assessment in the big data era\". Data Science Journal 14: 2. doi:10.5334\/dsj-2015-002.   \n\n\u2191 Hammer, M. (July-August 1990). \"Reengineering Work: Don\u2019t Automate, Obliterate\". Harvard Business Review. pp. 104\u201312. https:\/\/hbr.org\/1990\/07\/reengineering-work-dont-automate-obliterate .   \n\n\u2191 Hammer, M.; Champy, J.A. (2001). Reengineering the Corporation: A Manifesto for Business Revolution. Harper Business Books. pp. 272. ISBN 0066621127.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. To more easily differentiate footnotes from references, the original footnotes (which were numbered) were updated to use lowercase letters. The citation information for the first reference was incorrect and has been updated. 
The URL to the NACHC's 2014 Chart Book was broken, and a current URL was substituted.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\">https:\/\/www.limswiki.org\/index.php\/Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned<\/a>\nCategories: LIMSwiki journal articles (added in 2017); LIMSwiki journal articles (all); LIMSwiki journal articles on public health informatics\n
\nThis page was last modified on 4 January 2017, at 23:52.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","bbfbe3553b26be64d63e45d26612ea45_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Deployment_of_analytics_into_the_healthcare_safety_net_Lessons_learned skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Deployment of analytics into the healthcare safety net: Lessons learned<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><b>Background<\/b>: As payment reforms shift healthcare reimbursement toward value-based payment programs, providers need the capability to work with data of greater complexity, scope and scale. This will in many instances necessitate a change in understanding of the value of data and the types of data needed for analysis to support operations and clinical practice. It will also require the deployment of different infrastructure and analytic tools. 
<a href=\"https:\/\/www.limswiki.org\/index.php\/Federally_qualified_health_center\" title=\"Federally qualified health center\" target=\"_blank\" class=\"wiki-link\" data-key=\"72b1961c7abe7b1850ad53a7ca0694d6\">Community health centers<\/a> (CHCs), which serve more than 25 million people and together form the nation\u2019s largest single source of primary care for medically underserved communities and populations, are expanding and will need to optimize their capacity to leverage data as new payer and organizational models emerge.\n<\/p><p><b>Methods<\/b>: To better understand existing capacity and help organizations plan for the strategic and expanded uses of data, a project was initiated that deployed contemporary, Hadoop-based analytic technology into several multi-site CHCs and a primary care association (PCA) with an affiliated data warehouse supporting health centers across the state. An initial data quality exercise was carried out after deployment, in which a number of analytic queries were executed using both the existing <a href=\"https:\/\/www.limswiki.org\/index.php\/Electronic_health_record\" title=\"Electronic health record\" target=\"_blank\" class=\"wiki-link\" data-key=\"f2e31a73217185bb01389404c1fd5255\">electronic health record<\/a> (EHR) applications and, in parallel, the analytic stack. Each organization carried out the EHR analysis using the definitions typically applied for routine reporting. The analysis on the analytic stack applied the common definitions established for the Uniform Data System (UDS) by the Health Resources and Services Administration.<sup id=\"rdp-ebb-cite_ref-1\" class=\"reference\"><a href=\"#cite_note-1\" rel=\"external_link\">[a]<\/a><\/sup> In addition, interviews with health center leadership and staff were conducted to understand the context for the findings. 
\n<\/p><p><b>Results<\/b>: The analysis uncovered many challenges and inconsistencies with respect to the definition of core terms (patient, encounter, etc.), data formatting, and missing, incorrect and unavailable data. At a population level, apparent under-reporting of a number of diagnoses, specifically obesity and heart disease, was also evident in both the EHR-derived and the stack-derived results of the data quality exercise. \n<\/p><p><b>Conclusion<\/b>: Data awareness \u2014 that is, an appreciation of the importance of data integrity, data hygiene<sup id=\"rdp-ebb-cite_ref-2\" class=\"reference\"><a href=\"#cite_note-2\" rel=\"external_link\">[b]<\/a><\/sup> and the potential uses of data \u2014 needs to be prioritized and developed by health centers and other healthcare organizations if analytics are to be used effectively to support strategic objectives. While this analysis was conducted exclusively with community health center organizations, its conclusions and recommendations may be more broadly applicable.\n<\/p><p><i>Keywords<\/i>: Community health centers, analytics, decision-making, data\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>Community health centers are the backbone of the health care safety net, providing comprehensive primary care for the nation\u2019s medically underserved communities and populations. In 2015, 1,429 community health centers operated in nearly 10,000 urban and rural sites across the country, serving over 25 million people. Buoyed by HRSA\u2019s long-standing focus on quality improvement and substantial investments in health center HIT systems, health center organizations have implemented electronic health record applications in record numbers. 
Ninety-two percent of all <a href=\"https:\/\/www.limswiki.org\/index.php\/Federally_qualified_health_center\" title=\"Federally qualified health center\" target=\"_blank\" class=\"wiki-link\" data-key=\"72b1961c7abe7b1850ad53a7ca0694d6\">federally qualified community health centers<\/a>, and 85 percent of health center \u201clook-alikes\u201d \u2014 those entities that meet all requirements of the health center program but are supported by state and local funds rather than federal grants \u2014 report that an EHR was in use for all sites and all providers in 2015. Only 2.4 percent have no EHR installed at any site, and virtually all of these expect to adopt one. In addition, 95.5 percent report using clinical decision support applications, and 64.1 percent exchange clinical <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" target=\"_blank\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> electronically with other key providers, health care settings or subspecialty clinicians.<sup id=\"rdp-ebb-cite_ref-3\" class=\"reference\"><a href=\"#cite_note-3\" rel=\"external_link\">[c]<\/a><\/sup> Further, 88.9 percent participate in the <a href=\"https:\/\/www.limswiki.org\/index.php\/Centers_for_Medicare_and_Medicaid_Services\" title=\"Centers for Medicare and Medicaid Services\" target=\"_blank\" class=\"wiki-link\" data-key=\"654b4449e4816e190325b420c264df1a\">Centers for Medicare and Medicaid Services<\/a> (CMS) EHR Incentive Program, commonly known as \"Meaningful Use.\" These statistics reflect a commitment to adopting new technologies that support high-quality clinical care and streamline operations. 
Yet as the movement to value-based payment accelerates and strategic planning becomes more complex, community health center organizations, along with all other providers, must be prepared for new and increasingly sophisticated analytics to support clinical care and operations.\n<\/p><p>As analytics are applied to ever-larger amounts of data and become both more important and more necessary, questions about their use become inevitable. How is data quality influenced by whether data are captured through <a href=\"https:\/\/www.limswiki.org\/index.php\/Health_information_technology\" title=\"Health information technology\" target=\"_blank\" class=\"wiki-link\" data-key=\"9c8ef822470559f757db89f3fa234cc0\">health information technology<\/a> (HIT) such as electronic health records (EHR) or acquired through other means? On an operational level, how can analytic results best be understood and used to improve healthcare practice and patient outcomes, and to reduce costs? What are the implications of problematic data quality for operational capacity?<sup id=\"rdp-ebb-cite_ref-7\" class=\"reference\"><a href=\"#cite_note-7\" rel=\"external_link\">[d]<\/a><\/sup>\n<\/p><p>To address these questions and help community health center organizations plan for the future use and integration of contemporary analytics, several health center organizations were recruited to take part in a project to evaluate: \n<\/p>\n<ul><li> Health center data accuracy: Do health center data systems ensure correct values and consistent formats for data?<\/li>\n<li> Health center data reliability: Do health center data systems collect and report results that are internally consistent and that correspond to results from CDC data sources?<\/li>\n<li> Health center data completeness: Do health center data meet the criteria for all mandatory data items?<\/li><\/ul>\n<p>At each participating organization (several community health centers and one state primary care association), a Hadoop-based analytic stack was deployed alongside 
the organization\u2019s other data systems. Population-level statistics for specific diagnoses and comorbidities were calculated both through the organization\u2019s usual means and through the analytic stack, and the two sets of results were compared to assess their consistency and utility.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Background_and_literature\">Background and literature<\/span><\/h2>\n<p>Documentation, reporting accuracy and data quality have been the focus of numerous studies. Yang and Colditz<sup id=\"rdp-ebb-cite_ref-YangPrev15_8-0\" class=\"reference\"><a href=\"#cite_note-YangPrev15-8\" rel=\"external_link\">[4]<\/a><\/sup> recently reviewed NHANES survey data to benchmark the national prevalence of obesity. Al Kazzi <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-AlKazziDiff15_9-0\" class=\"reference\"><a href=\"#cite_note-AlKazziDiff15-9\" rel=\"external_link\">[5]<\/a><\/sup> examined the prevalence of obesity and of tobacco and alcohol use, comparing data from a direct survey (the Behavioral Risk Factor Surveillance System, BRFSS) with data from the Nationwide Inpatient Sample administrative database, and found substantial differences between the two. O\u2019Malley <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-O.27MalleyMeasuring05_10-0\" class=\"reference\"><a href=\"#cite_note-O.27MalleyMeasuring05-10\" rel=\"external_link\">[6]<\/a><\/sup> examined the <a href=\"https:\/\/www.limswiki.org\/index.php\/International_Statistical_Classification_of_Diseases_and_Related_Health_Problems\" title=\"International Statistical Classification of Diseases and Related Health Problems\" target=\"_blank\" class=\"wiki-link\" data-key=\"1de9af67005dfe2895e5d8cf6de57d4a\">ICD diagnostic coding process<\/a> and potential sources of error in code accuracy. 
They found the principal sources of error to be related to both communication and documentation. They cited lack of baseline information, communication errors, physician familiarity and experience with the presenting condition, insufficient attention to detail, coder training and experience, and discrepancies between electronic and paper record systems. Their prescription for improvement was to specify clear coding processes and to heighten awareness of data quality among all staff engaged in documentation.\n<\/p><p>Devoe <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-DevoeElectronic11_11-0\" class=\"reference\"><a href=\"#cite_note-DevoeElectronic11-11\" rel=\"external_link\">[7]<\/a><\/sup> compared EHR entries with the same data in the Medicaid claims data set for a group of 50 community health centers in Oregon. They found gaps in data congruence across the study group: some services were documented in the Medicaid data set but not in the EHRs, and others were documented in the EHRs but not in the Medicaid data set. In the latter category, nearly 50 percent of the services documented in the EHR for HbA1c testing, cholesterol screening, retinopathy screening and influenza vaccination were not found in the Medicaid claims. 
They also evaluated demographic characteristics and found that Spanish-speaking patients, as well as those who had gaps in insurance coverage, were more likely to have services documented in the EHR but not in the Medicaid claims data. This finding is especially relevant to community health centers, which disproportionately serve poor and uninsured individuals and those best served in languages other than English.<sup id=\"rdp-ebb-cite_ref-12\" class=\"reference\"><a href=\"#cite_note-12\" rel=\"external_link\">[e]<\/a><\/sup>\n<\/p><p>Outside of health care, other industries \u2014 including discrete manufacturing and financial services \u2014 have also struggled with data quality.<sup id=\"rdp-ebb-cite_ref-O.27ConnorData07_13-0\" class=\"reference\"><a href=\"#cite_note-O.27ConnorData07-13\" rel=\"external_link\">[8]<\/a><\/sup> Over time, both of these industries have, for the most part, achieved very high levels of data quality and of user confidence in their data. Their experience might provide some insight into data quality improvement in healthcare.<sup id=\"rdp-ebb-cite_ref-14\" class=\"reference\"><a href=\"#cite_note-14\" rel=\"external_link\">[f]<\/a><\/sup> Two projects are especially instructive in this area. The C4 Project at General Motors, which began in 1986, was an attempt to develop an entirely paperless design and manufacturing specification system for automotive manufacturing.<sup id=\"rdp-ebb-cite_ref-BlissTheC496_15-0\" class=\"reference\"><a href=\"#cite_note-BlissTheC496-15\" rel=\"external_link\">[9]<\/a><\/sup> The associated data quality effort was immense: a staff of close to 50 people was assigned to data acquisition, normalization, maintenance and life cycle management. The project emphasized the design of processes to ensure data quality and integrity. In particular, data governance was monitored at least as closely as data entry, storage and usage. 
This went a long way toward ensuring a high level of data quality.\n<\/p><p>The same was true of a project at Goldman Sachs to develop an integrated trading system. The primary effort was the development of data governance and data life cycle processes focused on awareness of data quality.<sup id=\"rdp-ebb-cite_ref-16\" class=\"reference\"><a href=\"#cite_note-16\" rel=\"external_link\">[g]<\/a><\/sup>\n<\/p><p>The studies in other industries point to processes and potential changes relevant to the health care industry. The studies of health services data, meanwhile, frame some of the known challenges and complexities that remain within it. Many community health centers serve especially vulnerable populations and tend to be less well-resourced than <a href=\"https:\/\/www.limswiki.org\/index.php\/Hospital\" title=\"Hospital\" target=\"_blank\" class=\"wiki-link\" data-key=\"b8f070c66d8123fe91063594befebdff\">hospital<\/a> providers. For them, understanding the complexities of data collection and use, and developing strategies to improve data collection and to use information more effectively, remain challenging.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Methods\">Methods<\/span><\/h2>\n<p>Three multi-site community health centers (two urban, one rural), operating in three different states, were recruited along with one state primary care association to participate in the study in 2014\u20132015. The three health centers varied in size and in aggregate served approximately 124,000 patients. The primary care association administers a data warehouse for member health center organizations. Its participation brought the total data set to 50 CHCs operating more than 400 practice sites and serving approximately 1.3 million patients. The overall distribution of urban and rural sites was approximately equal. 
Each organization provided data for a period of either two years (2012-2014) or three years (2011-2014). Deployment was undertaken separately at each site; that is, each location was treated as a distinct project site or case.\n<\/p><p>At each project site, a dual-path deployment and analytic process was used. Researchers worked with the organization\u2019s local IT group to install and integrate a new software application for analytic capacity alongside the center\u2019s main systems.<sup id=\"rdp-ebb-cite_ref-20\" class=\"reference\"><a href=\"#cite_note-20\" rel=\"external_link\">[h]<\/a><\/sup> The purpose was to help each health center assess whether its existing systems derived results consistent with those of the new application. This software, an open-source software (OSS) Hadoop-based analytic stack, consisted of the Cloudera Express Hadoop distribution. That distribution includes the Hadoop Distributed File System (HDFS), YARN (MapReduce2), HBase (non-relational data management) and Impala (SQL-based query). Hadoop was selected as the analytic system for several reasons: it is a well-defined and well-understood technology; it is in current use in many sectors; it is relatively easy to install and test; it can manage and analyze data from very heterogeneous sources; and it is easy to use because analytic queries can be written in SQL. Most significant is Hadoop\u2019s use for managing big data and for predictive analytics. These are important considerations for health centers as they expand in capacity and require increasingly complex tools for clinical and operational management. 
The local deployments provided an opportunity to test these tools in the health center environment.\n<\/p><p>The OSS installation took between one week and three months, with the duration depending largely on the familiarity of the organization\u2019s IT staff with deploying <a href=\"https:\/\/www.limswiki.org\/index.php\/Category:Open-source_software\" title=\"Category:Open-source software\" target=\"_blank\" class=\"wiki-link\" data-key=\"1aa6da5776c348224e3820317e698a07\">open-source software<\/a>. After deployment, a read-only connection was made to one of two sources. At most organizations, this was the database underlying the EHR system. At the primary care association and one of the urban community health centers, it was the data extract or warehouse that each maintained. \n<\/p><p>The dual-path process allowed results and data quality to be compared and evaluated separately for two different data collection and analysis approaches. The first method accessed and analyzed data through the normal processes in use at the health center. These processes consisted of direct access to the EHR or EHR-based data warehouse, with analysis through either the EHR\u2019s query facility or the health center\u2019s business intelligence (BI) tool. The second method consisted of extracting data from the EHR (or warehouse), normalizing the data according to standard Uniform Data System (UDS) definitions, and conducting the analysis on the Hadoop-based stack. This method allowed for greater transparency. It also made it possible to identify and address data quality issues such as differences in definitions or ambiguity due to EHR complexity.\n<\/p><p>After data were imported into the analytic stack and deployment was completed, an initial \u201clevel-up\u201d exercise was performed. This exercise served both to test the analytic system and to facilitate the normalization of all core terms and data definitions between the CHC\u2019s operational systems and the analytic stack. 
The \u201clevel-up\u201d exercise consisted of a number of defined queries performed through the organization\u2019s regular systems (EHR, SQL, BI tools) and compared with the same queries performed on the analytic stack with the data in the HDFS\/HBase information store. The following queries were performed<sup id=\"rdp-ebb-cite_ref-21\" class=\"reference\"><a href=\"#cite_note-21\" rel=\"external_link\">[i]<\/a><\/sup>:\n<\/p>\n<ul><li> number of patients served, per year<\/li>\n<li> number of patients served presenting with specific diagnoses including hypertension, diabetes, obesity, heart disease, and behavioral health conditions<\/li>\n<li> rank order of prevalent<sup id=\"rdp-ebb-cite_ref-22\" class=\"reference\"><a href=\"#cite_note-22\" rel=\"external_link\">[j]<\/a><\/sup> comorbidities<\/li>\n<li> cost<sup id=\"rdp-ebb-cite_ref-23\" class=\"reference\"><a href=\"#cite_note-23\" rel=\"external_link\">[k]<\/a><\/sup> per patient, per year<\/li>\n<li> cost, per comorbidity, per year<\/li><\/ul>\n<p>This exercise, undertaken by the organizations in concert with the researchers, took between two weeks and six months to perform, and was largely dependent upon the organization\u2019s prior work on data normalization.\n<\/p><p>In parallel with this technical deployment, training was provided to each organization\u2019s IT personnel and other staff, including in all cases the CEO or executive director. 
The training focused on the uses of stack-driven analytics, explored their advantages and disadvantages, and addressed how they differed from existing business intelligence and other reporting.\n<\/p><p>Lastly, informal interviews were conducted with the chief medical officer or CEO at each organization to review results, discuss important findings, and consider potential challenges and approaches to continued analysis.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Study_limitations\">Study limitations<\/span><\/h2>\n<p>The analysis was confined to community health center organizations operating in just three states. It may therefore not be representative of health centers nationally or of CHCs in other states. The data covered specific years and included centers that had undergone an EHR migration prior to the analysis period, an experience that may not be shared by health centers generally. Despite these limitations, the data quality issues identified among the participants provide evidence of common data concerns and challenges.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Results\">Results<\/span><\/h2>\n<p>The exercise revealed data quality issues common to all of the organizations. These include missing or unusable data. They also include differences in the definitions of core terms such as \"patient\" or \"encounter,\" both within and across organizations, even though HRSA requires standard definitions for these terms. Finally, apparent under-reporting of certain diagnoses relative to the general population raises questions about the reliability of the data. While interesting in themselves, these initial results are important for what they may indicate about data acquisition and use more broadly. \n<\/p><p>Table 1 presents the values reported by the participating organizations for key diagnoses, expressed as ranges. 
As noted above, these results represent data from approximately 50 CHCs comprising over 400 clinical sites and a total of 1.3 million patients, for a period of two to three years. Reported population percentages (CDC FastStats) for the U.S. population as a whole are presented for comparison.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"8\"><b>Table 1.<\/b> Population percent range values for selected diagnoses; * indicates outlying data.\n<\/td><\/tr>\n<tr>\n<th style=\"padding-left:10px; padding-right:10px;\">Diagnosis\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">Range from EHR Values\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">Range from Analytic Values\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">U.S. Population Percentage (CDC)<sup id=\"rdp-ebb-cite_ref-24\" class=\"reference\"><a href=\"#cite_note-24\" rel=\"external_link\">[l]<\/a><\/sup>\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Hypertension\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">17%-23%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">4%*-22%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">33.5%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Diabetes\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">6%-8%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">2%*-8%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">9.6%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\">Obesity\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3%*-12%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3%*-12%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">37.9%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Heart disease\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1%-4%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1%-3%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">11.5%\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Several issues are evident in these results. First, the hypertension, diabetes and obesity results include outliers (marked *). In each case, the outlier data are attributable to a single health center organization (out of more than 40). If these data are discarded, the ranges derived from EHR data are similar to those derived from the data imported into the analytic stack. Table 2 shows the effect of removing the outlier organization. 
With the outlier removed, the EHR and stack-driven results are closer, but there is still some variation illustrated by the data ranges, reflecting inconsistencies in the data.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"8\"><b>Table 2.<\/b> Adjusted population percent range values for selected diagnoses.\n<\/td><\/tr>\n<tr>\n<th style=\"padding-left:10px; padding-right:10px;\">Diagnosis\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">Range from EHR Values\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">Range from Analytic Values\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">U.S. Population Percentage (CDC)\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Hypertension\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">17%-23%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">17%-22%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">33.5%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Diabetes\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">6%-8%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">5%-8%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">9.6%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Obesity\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">9%-10%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">7%-12%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\">37.9%\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Heart disease\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1%-4%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1%-3%\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">11.5%\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>In addition, the percentages of the population with obesity and heart disease diagnoses are notably low in both the full (Table 1) and adjusted (Table 2) data, compared with the CDC\u2019s reported figures for the U.S. population as a whole. Given the documented disparities and the characteristics of the population served, we might expect these percentages to be at least as high in the community health center patient population as in the general population, if not higher. Possible reasons for these discrepancies are discussed in the conclusions section of this paper. \n<\/p><p>More generally, the analysis revealed several types of potential data quality issues. Although their nature and extent varied across sites, the problems \u2014 including definition conflicts, conversion issues and structural challenges \u2014 were not unique to any one site and were evident, to some extent, at all sites. 
These include:\n<\/p>\n<ul><li> errors resulting from deviation from standard definitions (for example, of patient, encounter, provider), even when guidelines for such definitions exist and are required for standard reporting (UDS, in the case of CHCs);<\/li>\n<li> errors caused by omission: that is, data simply not recorded;<\/li>\n<li> errors resulting from incorrect entry, including:<\/li><\/ul>\n<dl><dd>\u25aa out-of-range values not caught by the EHR system, e.g., BMIs of >1000, BP values of 320\/250, HbA1c values of >50;<\/dd>\n<dd>\u25aa incorrect text entered for names, addresses, and previous providers;<\/dd>\n<dd>\u25aa values not entered into searchable fields;<\/dd>\n<dd>\u25aa data recorded as text in clinical notes rather than in searchable fields;<\/dd>\n<dd>\u25aa data imported from external sources (labs, registries, etc.) as text rather than into searchable fields;<\/dd><\/dl>\n<ul><li> errors resulting from the structure and complexity of EHR systems, including the following problems:<\/li><\/ul>\n<dl><dd>\u25aa several systems were found to be sensitive to the form in which data were entered: for example, the ICD-9 codes 250., 250.0, and 250.00 returned different query results, as did 250.5 and 250.50;<\/dd>\n<dd>\u25aa complexity of navigation and misalignment with provider workflows also appeared to be responsible for several types of errors;<\/dd>\n<dd>\u25aa concentration on treating a single condition during an encounter led to few encounters with multiple diagnoses recorded; and<\/dd><\/dl>\n<ul><li> data corruption and\/or loss of data resulting from migration to a new EHR platform.<\/li><\/ul>\n<h2><span class=\"mw-headline\" id=\"Conclusions_and_recommendations\">Conclusions and recommendations<\/span><\/h2>\n<p>The time required to complete the \u201clevel-up\u201d exercise was substantially shorter in organizations that had done extensive data normalization work prior to beginning the 
study. The organizations that took the least time (less than one month) to deploy the analytic stack and perform the data quality exercise were the primary care association and one large, urban CHC. Both had previously undertaken substantial work to standardize definitions (semantic normalization) and to match and transform data formats (syntactic normalization). This work was independent of the study and was in all cases done in conjunction with creating and populating a data warehouse. In addition, these organizations had already begun exploring analytics. This enabled them to align quickly with the deployment requirements and to think in terms of strategic analysis. Conversely, the organization that took longest to complete the level-up exercise made the most widespread use of idiosyncratic, non-standard (i.e., non-UDS) definitions for core terms such as patient, encounter, and provider. It also had definition mismatches between clinical departments, or between clinical and administrative departments. \n<\/p><p>The potential under-reporting of key diagnoses, with figures for hypertension, diabetes, obesity and heart disease falling well below national figures, is a problem of a different nature. The patient populations of community health centers are not generally thought to be healthier than the general U.S. population. 
Health centers' patients are disproportionately poor, uninsured, or publicly insured, and disproportionately members of minority groups.<sup id=\"rdp-ebb-cite_ref-25\" class=\"reference\"><a href=\"#cite_note-25\" rel=\"external_link\">[m]<\/a><\/sup> In addition, health centers are more likely than other primary care physicians to treat patients with chronic illnesses.<sup id=\"rdp-ebb-cite_ref-26\" class=\"reference\"><a href=\"#cite_note-26\" rel=\"external_link\">[n]<\/a><\/sup> Yet in all cases, the reported percentages for key diagnoses were below the values reported for the population as a whole, and the shortfall is especially conspicuous for obesity and heart disease.\n<\/p><p>Al Kazzi <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-AlKazziDiff15_9-1\" class=\"reference\"><a href=\"#cite_note-AlKazziDiff15-9\" rel=\"external_link\">[5]<\/a><\/sup> recently compared 2011 hospital discharge data from the Nationwide Inpatient Sample (NIS, AHRQ) with interview data from the Behavioral Risk Factor Surveillance System (BRFSS, CDC). Obesity prevalence was 9.6 percent in the NIS and 27.4 percent in the BRFSS. 
The BRFSS figures, which are based on direct participant surveys, are thus almost three times those derived from hospital discharge records<sup id=\"rdp-ebb-cite_ref-27\" class=\"reference\"><a href=\"#cite_note-27\" rel=\"external_link\">[o]<\/a><\/sup>, and more closely aligned with other recent results.<sup id=\"rdp-ebb-cite_ref-28\" class=\"reference\"><a href=\"#cite_note-28\" rel=\"external_link\">[p]<\/a><\/sup> This suggests that the CHC-reported data are consistent with other provider-reported data, such as the NIS results, but understated relative to other sources.\n<\/p><p>To better understand the obesity anomaly in the health center data sets, these results were reviewed with the chief medical officers (CMOs) and other clinical staff at participating CHCs. Those interviewed estimated the obesity rate among their patients at 40 percent. A recent paper in <i>JAMA Internal Medicine<\/i><sup id=\"rdp-ebb-cite_ref-YangPrev15_8-1\" class=\"reference\"><a href=\"#cite_note-YangPrev15-8\" rel=\"external_link\">[4]<\/a><\/sup> estimated that in the United States, 40 percent of adult men and 30 percent of adult women are overweight, while 35 percent of men and 37 percent of women are obese. The CMOs\u2019 estimate is thus consistent with these data and substantially higher than the figures derived from the analysis. \n<\/p><p>The CMOs interviewed cited two possible explanations. First, providers did not often diagnose obesity, and when they did, they did not use the full range of ICD-9 codes, which include three specific codes (278, unspecified obesity; 278.01, morbid obesity, BMI >30; and 278.02, overweight, BMI >25). Further, although the UDS guidelines specify the use of 22 V-codes for obesity, with a highly specific breakdown of BMI measurements, these also appear to be underutilized. 
Second, it was conjectured that the data might reflect sensitivity to differing cultural norms for defining obesity and overweight in the communities served.<sup id=\"rdp-ebb-cite_ref-29\" class=\"reference\"><a href=\"#cite_note-29\" rel=\"external_link\">[q]<\/a><\/sup> More investigation is needed to understand this anomaly. Even so, the range of three to 12 percent reported by the health center organizations in this study seems implausibly low. It could reflect reporting and recording bias as well as data quality issues.\n<\/p><p>Obesity diagnosis might be subjective (although BMI is a commonly used standard), but heart disease is a specific, diagnosable condition. The apparent under-reporting of heart disease in the study group \u2014 approximately seven to eight percent, as compared to 11 percent nationally per the CDC \u2014 is therefore harder to explain. Most CMOs thought that 20 to 30 percent of their patients experienced some form of heart disease. Possible causes of under-reporting are still under investigation, although it should be noted that our analysis was not age-adjusted. Nor was it adjusted for the fact that, particularly in 2014, many patients new to the health centers were seen for the first time as coverage expanded. The addition of new patients may affect the distribution of diagnoses in ways that we do not yet understand.\n<\/p><p>Comparing data quality work in the manufacturing and financial services industries with that in healthcare can be instructive. The GM and Goldman Sachs projects referenced above shared several characteristics besides their emphasis on data governance. 
These included: 1) high-level executive sponsorship, with an EVP and\/or CEO who actually participated in introducing and reviewing the projects; 2) a long period of preparatory work during which core terms were defined, data were normalized, and workflows and work processes were redesigned or newly created to provide an environment that promoted data quality; 3) broad participation from across the organization, not just IT; and 4) emphasis on standards where necessary or productive, but not as the primary or sole focus of effort. The most important characteristics of these industries\u2019 efforts were term definition, data normalization and process redesign. Equally important was broad participation across every phase of the effort: planning, initiation, deployment and ongoing improvement. These industries spent decades re-engineering their workflows and processes for operational and informational efficiency and effectiveness.<sup id=\"rdp-ebb-cite_ref-HammerReeng90_30-0\" class=\"reference\"><a href=\"#cite_note-HammerReeng90-30\" rel=\"external_link\">[13]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HammerReeng01_31-0\" class=\"reference\"><a href=\"#cite_note-HammerReeng01-31\" rel=\"external_link\">[14]<\/a><\/sup>\n<\/p><p>In contrast, our experience suggests that: 1) governance and information life cycle are not at the core of how the healthcare industry approaches such projects; 2) while executive sponsorship is the norm, executive participation is rare; 3) many projects are designed, led and carried out by the IT group; and 4) standards are seen as a major part of the solution, including by federal agencies and regulators (e.g., ONC, CMS, HRSA). These issues remain to be tackled in healthcare organizations.\n<\/p><p>This returns us to the larger questions raised at the outset. Our current findings offer some indication of the influences on data quality that might explain, at least in part, the variations and unexpected results. 
These influences include: quality degradation from system migration; inadequate or inappropriate data entry causing missing or incorrect data; inaccessible data locked in free-text entries from provider notes, text lists and external text imports; inadequate definition and format normalization resulting in unusable data; systemic errors due to current practice norms; idiosyncrasies in how different EHRs process diagnosis codes; and complexity of navigation in EHRs. Many of these issues can be at least partly addressed by greater attention to detail at the data entry stage. Improving data quality directly in the EHR is more effective than trying to address it after the data are entered. Achieving the Quadruple Aim (improving the work life of health care providers, clinicians and staff, as well as enhancing patient experience, improving population health and reducing costs) clearly requires early preparation and consistent attention to data quality.\n<\/p><p>Several recommendations are suggested by these results:\n<\/p>\n<ul><li> Definitions of core terms should be reviewed and consensus reached on their application and use. Moreover, data definitions and workflows should be aligned with standard practices.<\/li>\n<li> Workflows and other processes should be reviewed and redesigned as necessary to emphasize and promote data quality.<\/li>\n<li> Organizations should familiarize themselves with how their EHR processes data as it is entered (for example, how diagnosis codes are treated) and ensure that entered data is treated consistently by the EHR.<\/li>\n<li> Text data should be entered in a consistent manner that is retrievable for analysis as well as for use in diagnosis and patient care.<\/li>\n<li> Before migrating from one EHR platform to another, data should be cleaned and checked. Extensive data checking should also be done after EHR system migration. 
Care must be taken to back up the data from the retired system, potentially as a data extract, so that it is available if any conversion loss occurs and can be used to vet the integrity of the migration process.<\/li><\/ul>\n<p>These recommendations are aimed at helping the health centers answer two strategic questions. First, how good are your data? Clinical data from the EHR, data imported from labs and other providers, and financial data need to be carefully reviewed and vetted for accuracy, reliability and completeness. Second, how good are your systems? This includes infrastructure (servers, storage, network), software systems and applications, and processes and workflows. It also includes staffing, staff training and staff engagement. Supporting health centers with the expertise and funds to undertake this work should be a priority.\n<\/p><p>To date, the participating organizations have not moved substantially beyond the initial data quality exercise. The issue remains of how to integrate analytic results into the strategic and operational practice of a community health center, or of a healthcare organization in general. The experience from this study indicates that while community health centers generally attain the highest standards of care and achieve good outcomes with respect to both quality and cost, considerable work may be needed to help all centers strengthen their awareness of data and information quality and move toward better integration of analytic results into practice. This awareness includes understanding how to perform complex analytic queries and which applications might be best suited to particular types of analysis, but these are just two components of a forward-thinking data strategy. 
All staff at healthcare organizations, but particularly those engaged in using data for decision-making, must develop an awareness and appreciation of what data are available, how they can best be analyzed and how these results relate to the clinical, operational and strategic needs of the organization.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>The authors would like to acknowledge the assistance of Srini Rao, Ph.D., CEO of Datycs, for deployment and analytic support.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conflicts_of_interest\">Conflicts of interest<\/span><\/h2>\n<p>The authors have no conflicts of interest to report.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Footnotes\">Footnotes<\/span><\/h2>\n<div class=\"reflist\" style=\"list-style-type: lower-alpha;\">\n<ol class=\"references\">\n<li id=\"cite_note-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-1\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">As defined in Health Resources and Services Administration's <i><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/bphc.hrsa.gov\/datareporting\/reporting\/2014udsmanual.pdf\" target=\"_blank\">Bureau of Primary Health Care, UDS Reporting Instructions for Health Centers<\/a><\/i>, 2014 Edition (PDF)<\/span>\n<\/li>\n<li id=\"cite_note-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-2\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">\"<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/whatis.techtarget.com\/definition\/data-hygiene\" target=\"_blank\">Data hygiene<\/a> is the collective processes conducted to ensure the cleanliness of data. 
Data is considered clean if it is relatively error-free.\"<\/span>\n<\/li>\n<li id=\"cite_note-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-3\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"url\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/bphc.hrsa.gov\/uds\/datacenter.aspx?q=tall&year=2015&state=\" target=\"_blank\">See HRSA's 2015 Health Center Data, Table 5 - Staffing and Utilization<\/a><\/span><\/span>\n<\/li>\n<li id=\"cite_note-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-7\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><i>cf.<\/i> Nambiar et al. 2013<sup id=\"rdp-ebb-cite_ref-NambiarALook13_4-0\" class=\"reference\"><a href=\"#cite_note-NambiarALook13-4\" rel=\"external_link\">[1]<\/a><\/sup>; Raghupathi et al. 2013<sup id=\"rdp-ebb-cite_ref-RaghupathiAnOver13_5-0\" class=\"reference\"><a href=\"#cite_note-RaghupathiAnOver13-5\" rel=\"external_link\">[2]<\/a><\/sup> and Ward et al. 2014<sup id=\"rdp-ebb-cite_ref-WardApp14_6-0\" class=\"reference\"><a href=\"#cite_note-WardApp14-6\" rel=\"external_link\">[3]<\/a><\/sup><\/span>\n<\/li>\n<li id=\"cite_note-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-12\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">National Association of Community Health Centers' <i><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/nachc.org\/wp-content\/uploads\/2016\/08\/Chartbook16.pdf\" target=\"_blank\">A Sketch of Community Health Centers<\/a><\/i>, August 2016 (PDF)<\/span>\n<\/li>\n<li id=\"cite_note-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-14\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">Study author Dr. 
Hartzband has had extensive experience in these industries, including serving as external architect for the General Motors C4 project (an effort to develop a paperless design process for car manufacturing) and as a principal consultant to Ernst & Young for the Goldman Sachs integrated trading system effort.<\/span>\n<\/li>\n<li id=\"cite_note-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-16\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">Via personal correspondence between Dr. Hartzband and the Goldman Sachs team.<\/span>\n<\/li>\n<li id=\"cite_note-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-20\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">Designed in accordance with various reviews of healthcare data quality assessment, especially:\nKahn et al. 2012<sup id=\"rdp-ebb-cite_ref-KahnAPrag12_17-0\" class=\"reference\"><a href=\"#cite_note-KahnAPrag12-17\" rel=\"external_link\">[10]<\/a><\/sup>; Weiskopf and Weng, 2013<sup id=\"rdp-ebb-cite_ref-WeiskopfMethods13_18-0\" class=\"reference\"><a href=\"#cite_note-WeiskopfMethods13-18\" rel=\"external_link\">[11]<\/a><\/sup> and Cai and Zhu, 2015<sup id=\"rdp-ebb-cite_ref-CaiTheChall15_19-0\" class=\"reference\"><a href=\"#cite_note-CaiTheChall15-19\" rel=\"external_link\">[12]<\/a><\/sup><\/span>\n<\/li>\n<li id=\"cite_note-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-21\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.bphcdata.net\/docs\/uds_rep_instr.pdf\" target=\"_blank\">Uniform Data System<\/a> definitions are used for all terms, including visits, patients and conditions (hyperlink field removed).<\/span>\n<\/li>\n<li id=\"cite_note-22\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-22\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">Data quality and access issues prevented the accurate calculation
of comorbidities.<\/span>\n<\/li>\n<li id=\"cite_note-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-23\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">Actual cost (expenditure), not billed cost (revenue). It is important to note that actual cost was not able to be calculated at any of the health centers, and so these queries were not run.<\/span>\n<\/li>\n<li id=\"cite_note-24\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-24\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">Figures from updated National Center for Health Statistics, CDC Fast Stats (2013-2014) for adults over 40 years of age. CDC definitions for diagnosis are identical to those used by HRSA for UDS except for heart disease, where the UDS definitions encompass more codes and therefore more conditions.<\/span>\n<\/li>\n<li id=\"cite_note-25\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-25\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"url\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/bphc.hrsa.gov\/uds\/datacenter.aspx?q=tall&year=2015&state=\" target=\"_blank\">See HRSA's 2015 Health Center Data, Table 4 - Selected Patient Characteristics<\/a><\/span><\/span>\n<\/li>\n<li id=\"cite_note-26\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-26\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">See NACHC's <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/old.nachc.com\/client\/\/Chartbook_December_2014.pdf\" target=\"_blank\">2014 Chart Book<\/a>, Figure 1.9 (PDF)<\/span>\n<\/li>\n<li id=\"cite_note-27\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-27\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">NIS: Discharge level data from approximately 8 million hospital stays (2011); BRFSS: 506,467 adult participants (2011)<\/span>\n<\/li>\n<li id=\"cite_note-28\"><span 
class=\"mw-cite-backlink\"><a href=\"#cite_ref-28\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">NIS: Discharge level data from approximately 8 million hospital stays (2011); BRFSS: 506,467 adult participants (2011)<\/span>\n<\/li>\n<li id=\"cite_note-29\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-29\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">Several chief medical officers suggested that cultural norms for what is considered obesity vary greatly among communities, and that providers might be unwilling to make a diagnosis not in line with such norms.<\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-NambiarALook13-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NambiarALook13_4-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Nambiar, R.; Bhardwaj, R.; Sethi, A. et al. (2013). \"A look at challenges and opportunities of Big Data analytics in healthcare\". <i>2013 IEEE International Conference on Big Data<\/i> <b>2013<\/b>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FBigData.2013.6691753\" target=\"_blank\">10.1109\/BigData.2013.6691753<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+look+at+challenges+and+opportunities+of+Big+Data+analytics+in+healthcare&rft.jtitle=2013+IEEE+International+Conference+on+Big+Data&rft.aulast=Nambiar%2C+R.%3B+Bhardwaj%2C+R.%3B+Sethi%2C+A.+et+al.&rft.au=Nambiar%2C+R.%3B+Bhardwaj%2C+R.%3B+Sethi%2C+A.+et+al.&rft.date=2013&rft.volume=2013&rft_id=info:doi\/10.1109%2FBigData.2013.6691753&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RaghupathiAnOver13-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RaghupathiAnOver13_5-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Raghupathi, W.; Raghupathi, V. (2013). \"An overview of health analytics\". <i>Journal of Health & Medical Informatics<\/i> <b>4<\/b>: 132. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.4172%2F2157-7420.1000132\" target=\"_blank\">10.4172\/2157-7420.1000132<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+overview+of+health+analytics&rft.jtitle=Journal+of+Health+%26+Medical+Informatics&rft.aulast=Raghupathi%2C+W.%3B+Raghupathi%2C+V.&rft.au=Raghupathi%2C+W.%3B+Raghupathi%2C+V.&rft.date=2013&rft.volume=4&rft.pages=132&rft_id=info:doi\/10.4172%2F2157-7420.1000132&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WardApp14-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WardApp14_6-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Ward, M.J.; Karsolo, K.A.; Froehle, C.M. (2014). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4242091\" target=\"_blank\">\"Applications of business analytics in healthcare\"<\/a>. <i>Business Horizons<\/i> <b>57<\/b> (5): 571\u2013582. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.bushor.2014.06.003\" target=\"_blank\">10.1016\/j.bushor.2014.06.003<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4242091\/\" target=\"_blank\">PMC4242091<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25429161\" target=\"_blank\">25429161<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4242091\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4242091<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Applications+of+business+analytics+in+healthcare&rft.jtitle=Business+Horizons&rft.aulast=Ward%2C+M.J.%3B+Karsolo%2C+K.A.%3B+Froehle%2C+C.M.&rft.au=Ward%2C+M.J.%3B+Karsolo%2C+K.A.%3B+Froehle%2C+C.M.&rft.date=2014&rft.volume=57&rft.issue=5&rft.pages=571%E2%80%93582&rft_id=info:doi\/10.1016%2Fj.bushor.2014.06.003&rft_id=info:pmc\/PMC4242091&rft_id=info:pmid\/25429161&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4242091&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-YangPrev15-8\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-YangPrev15_8-0\" rel=\"external_link\">4.0<\/a><\/sup> <sup><a href=\"#cite_ref-YangPrev15_8-1\" rel=\"external_link\">4.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Yang, L.; Colditz, G.A. (2015). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4625533\" target=\"_blank\">\"Prevalence of overweight and obesity in the United States, 2007-2012\"<\/a>. <i>JAMA Internal Medicine<\/i> <b>175<\/b> (8): 1412\u20133. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1001%2Fjamainternmed.2015.2405\" target=\"_blank\">10.1001\/jamainternmed.2015.2405<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4625533\/\" target=\"_blank\">PMC4625533<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26098405\" target=\"_blank\">26098405<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4625533\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4625533<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Prevalence+of+overweight+and+obesity+in+the+United+States%2C+2007-2012&rft.jtitle=JAMA+Internal+Medicine&rft.aulast=Yang%2C+L.%3B+Colditz%2C+G.A.&rft.au=Yang%2C+L.%3B+Colditz%2C+G.A.&rft.date=2015&rft.volume=175&rft.issue=8&rft.pages=1412%E2%80%933&rft_id=info:doi\/10.1001%2Fjamainternmed.2015.2405&rft_id=info:pmc\/PMC4625533&rft_id=info:pmid\/26098405&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4625533&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AlKazziDiff15-9\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-AlKazziDiff15_9-0\" rel=\"external_link\">5.0<\/a><\/sup> <sup><a href=\"#cite_ref-AlKazziDiff15_9-1\" rel=\"external_link\">5.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Al Kazzi, E.S.; Lau, B.; Li, T. et al. (2015). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4633065\" target=\"_blank\">\"Differences in the prevalence of obesity, smoking and alcohol in the United States Nationwide Inpatient Sample and the Behavioral Risk Factor Surveillance System\"<\/a>. <i>PLoS One<\/i> <b>10<\/b> (11): e0140165. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pone.0140165\" target=\"_blank\">10.1371\/journal.pone.0140165<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4633065\/\" target=\"_blank\">PMC4633065<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26536469\" target=\"_blank\">26536469<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4633065\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4633065<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Differences+in+the+prevalence+of+obesity%2C+smoking+and+alcohol+in+the+United+States+Nationwide+Inpatient+Sample+and+the+Behavioral+Risk+Factor+Surveillance+System&rft.jtitle=PLoS+One&rft.aulast=Al+Kazzi%2C+E.S.%3B+Lau%2C+B.%3B+Li%2C+T.+et+al.&rft.au=Al+Kazzi%2C+E.S.%3B+Lau%2C+B.%3B+Li%2C+T.+et+al.&rft.date=2015&rft.volume=10&rft.issue=11&rft.pages=e0140165&rft_id=info:doi\/10.1371%2Fjournal.pone.0140165&rft_id=info:pmc\/PMC4633065&rft_id=info:pmid\/26536469&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4633065&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span 
style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-O.27MalleyMeasuring05-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-O.27MalleyMeasuring05_10-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">O'Malley, K.J.; Cook, K.F.; Price, M.D. et al. (2005). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1361216\" target=\"_blank\">\"Measuring diagnoses: ICD code accuracy\"<\/a>. <i>Health Services Research<\/i> <b>40<\/b> (5 Pt. 2): 1620\u201339. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1111%2Fj.1475-6773.2005.00444.x\" target=\"_blank\">10.1111\/j.1475-6773.2005.00444.x<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC1361216\/\" target=\"_blank\">PMC1361216<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/16178999\" target=\"_blank\">16178999<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1361216\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1361216<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Measuring+diagnoses%3A+ICD+code+accuracy&rft.jtitle=Health+Services+Research&rft.aulast=O%27Malley%2C+K.J.%3B+Cook%2C+K.F.%3B+Price%2C+M.D.+et+al.&rft.au=O%27Malley%2C+K.J.%3B+Cook%2C+K.F.%3B+Price%2C+M.D.+et+al.&rft.date=2005&rft.volume=40&rft.issue=5+Pt.+2&rft.pages=1620%E2%80%9339&rft_id=info:doi\/10.1111%2Fj.1475-6773.2005.00444.x&rft_id=info:pmc\/PMC1361216&rft_id=info:pmid\/16178999&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC1361216&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DevoeElectronic11-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DevoeElectronic11_11-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Devoe, J.E.; Gold, R.; McIntire, P. et al. (2011). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3133583\" target=\"_blank\">\"Electronic health records vs Medicaid claims: Completeness of diabetes preventive care data in community health centers\"<\/a>. <i>Annals of Family Medicine<\/i> <b>9<\/b> (4): 351\u20148. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1370%2Fafm.1279\" target=\"_blank\">10.1370\/afm.1279<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3133583\/\" target=\"_blank\">PMC3133583<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21747107\" target=\"_blank\">21747107<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3133583\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3133583<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Electronic+health+records+vs+Medicaid+claims%3A+Completeness+of+diabetes+preventive+care+data+in+community+health+centers&rft.jtitle=Annals+of+Family+Medicine&rft.aulast=Devoe%2C+J.E.%3B+Gold%2C+R.%3B+McIntire%2C+P.+et+al.&rft.au=Devoe%2C+J.E.%3B+Gold%2C+R.%3B+McIntire%2C+P.+et+al.&rft.date=2011&rft.volume=9&rft.issue=4&rft.pages=351%E2%80%948&rft_id=info:doi\/10.1370%2Fafm.1279&rft_id=info:pmc\/PMC3133583&rft_id=info:pmid\/21747107&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3133583&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-O.27ConnorData07-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-O.27ConnorData07_13-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">O'Connor, L. (May 2007). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/mitiq.mit.edu\/IQIS\/Documents\/CDOIQS_200777\/Papers\/01_59_4E.pdf\" target=\"_blank\">\"Data Quality Management and Financial Services\"<\/a>. <i>Proceedings of the MIT 2007 Information Quality Industry Symposium<\/i><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/mitiq.mit.edu\/IQIS\/Documents\/CDOIQS_200777\/Papers\/01_59_4E.pdf\" target=\"_blank\">http:\/\/mitiq.mit.edu\/IQIS\/Documents\/CDOIQS_200777\/Papers\/01_59_4E.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Data+Quality+Management+and+Financial+Services&rft.atitle=Proceedings+of+the+MIT+2007+Information+Quality+Industry+Symposium&rft.aulast=O%27Connor%2C+L.&rft.au=O%27Connor%2C+L.&rft.date=May+2007&rft_id=http%3A%2F%2Fmitiq.mit.edu%2FIQIS%2FDocuments%2FCDOIQS_200777%2FPapers%2F01_59_4E.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BlissTheC496-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BlissTheC496_15-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Bliss, F.W.. \"The C4 Program at General Motors\". In Machover, C.. <i>The CAD\/CAM Handbook<\/i>. McGraw-Hill, Inc. pp. 309\u2013320. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 0070393753.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+C4+Program+at+General+Motors&rft.atitle=The+CAD%2FCAM+Handbook&rft.aulast=Bliss%2C+F.W.&rft.au=Bliss%2C+F.W.&rft.pages=pp.%26nbsp%3B309%E2%80%93320&rft.pub=McGraw-Hill%2C+Inc&rft.isbn=0070393753&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KahnAPrag12-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KahnAPrag12_17-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Kahn, M.G.; Raebel, M.A.; Glanz, J.M. et al. (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3833692\" target=\"_blank\">\"A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research\"<\/a>. <i>Medical Care<\/i> <b>50<\/b> (Suppl.): S21\u20139. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1097%2FMLR.0b013e318257dd67\" target=\"_blank\">10.1097\/MLR.0b013e318257dd67<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3833692\/\" target=\"_blank\">PMC3833692<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22692254\" target=\"_blank\">22692254<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3833692\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3833692<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+pragmatic+framework+for+single-site+and+multisite+data+quality+assessment+in+electronic+health+record-based+clinical+research&rft.jtitle=Medical+Care&rft.aulast=Kahn%2C+M.G.%3B+Raebel%2C+M.A.%3B+Glanz%2C+J.M.+et+al.&rft.au=Kahn%2C+M.G.%3B+Raebel%2C+M.A.%3B+Glanz%2C+J.M.+et+al.&rft.date=2012&rft.volume=50&rft.issue=Suppl.&rft.pages=S21%E2%80%939&rft_id=info:doi\/10.1097%2FMLR.0b013e318257dd67&rft_id=info:pmc\/PMC3833692&rft_id=info:pmid\/22692254&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3833692&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WeiskopfMethods13-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WeiskopfMethods13_18-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Weiskopf, N.G.; Weng, C. et al. (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3555312\" target=\"_blank\">\"Methods and dimensions of electronic health record data quality assessment: Enabling reuse for clinical research\"<\/a>. 
<i>JAMIA<\/i> <b>20<\/b> (1): 144-51. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1136%2Famiajnl-2011-000681\" target=\"_blank\">10.1136\/amiajnl-2011-000681<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3555312\/\" target=\"_blank\">PMC3555312<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22733976\" target=\"_blank\">22733976<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3555312\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3555312<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Methods+and+dimensions+of+electronic+health+record+data+quality+assessment%3A+Enabling+reuse+for+clinical+research&rft.jtitle=JAMIA&rft.aulast=Weiskopf%2C+N.G.%3B+Weng%2C+C.+et+al.&rft.au=Weiskopf%2C+N.G.%3B+Weng%2C+C.+et+al.&rft.date=2013&rft.volume=20&rft.issue=1&rft.pages=144-51&rft_id=info:doi\/10.1136%2Famiajnl-2011-000681&rft_id=info:pmc\/PMC3555312&rft_id=info:pmid\/22733976&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3555312&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> 
<\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CaiTheChall15-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CaiTheChall15_19-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Cai, L.; Zhu, Y. (2015). \"The challenges of data quality and data quality assessment in the big data era\". <i>Data Science Journal<\/i> <b>14<\/b>: 2. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.5334%2Fdsj-2015-002\" target=\"_blank\">10.5334\/dsj-2015-002<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+challenges+of+data+quality+and+data+quality+assessment+in+the+big+data+era&rft.jtitle=Data+Science+Journal&rft.aulast=Cai%2C+L.%3B+Zhu%2C+Y.&rft.au=Cai%2C+L.%3B+Zhu%2C+Y.&rft.date=2015&rft.volume=14&rft.pages=2&rft_id=info:doi\/10.5334%2Fdsj-2015-002&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HammerReeng90-30\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HammerReeng90_30-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Hammer, M. (July-August 1990). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/hbr.org\/1990\/07\/reengineering-work-dont-automate-obliterate\" target=\"_blank\">\"Reengineering Work: Don\u2019t Automate, Obliterate\"<\/a>. <i>Harvard Business Review<\/i>. pp. 104\u201312<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/hbr.org\/1990\/07\/reengineering-work-dont-automate-obliterate\" target=\"_blank\">https:\/\/hbr.org\/1990\/07\/reengineering-work-dont-automate-obliterate<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Reengineering+Work%3A+Don%E2%80%99t+Automate%2C+Obliterate&rft.atitle=Harvard+Business+Review&rft.aulast=Hammer%2C+M.&rft.au=Hammer%2C+M.&rft.date=July-August+1990&rft.pages=pp.+104%E2%80%9312&rft_id=https%3A%2F%2Fhbr.org%2F1990%2F07%2Freengineering-work-dont-automate-obliterate&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HammerReeng01-31\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HammerReeng01_31-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Hammer, M.; Champy, J.A. (2001). <i>Reengineering the Corporation: A Manifesto for Business Revolution<\/i>. Harper Business Books. pp. 272. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 0066621127.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Reengineering+the+Corporation%3A+A+Manifesto+for+Business+Revolution&rft.aulast=Hammer%2C+M.%3B+Champy%2C+J.A.&rft.au=Hammer%2C+M.%3B+Champy%2C+J.A.&rft.date=2001&rft.pages=pp.%26nbsp%3B272&rft.pub=Harper+Business+Books&rft.isbn=0066621127&rfr_id=info:sid\/en.wikipedia.org:Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. To more easily differentiate footnotes from references, the original footnotes (which were numbered) were updated to use lowercase letters. The citation information for the first reference was incorrect and has been updated. 
The URL to the NACHC's 2014 Chart Book was broken, and a current URL was substituted.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned\">https:\/\/www.limswiki.org\/index.php\/Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","bbfbe3553b26be64d63e45d26612ea45_images":[],"bbfbe3553b26be64d63e45d26612ea45_timestamp":1544814664,"bfe42513d857c82a22a78dbd758fc186_type":"article","bfe42513d857c82a22a78dbd758fc186_title":"Informatics metrics and measures 
for a smart public health systems approach: Information science perspective (Carney and Shea 2017)","bfe42513d857c82a22a78dbd758fc186_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective","bfe42513d857c82a22a78dbd758fc186_plaintext":"Journal:Informatics metrics and measures for a smart public health systems approach: Information science perspective\nFrom LIMSWiki\n\nFull article title\nInformatics metrics and measures for a smart public health systems approach: Information science perspective\nJournal\nComputational and Mathematical Methods in Medicine\nAuthor(s)\nCarney, Timothy J.; Shea, Christopher M.\nAuthor affiliation(s)\nGillings School of Global Public Health at University of North Carolina - Chapel Hill\nEditors\nRodr\u00edguez-Gonz\u00e1lez, Alejandro\nYear published\n2017\nVolume and issue\n2017\nPage(s)\n1452415\nDOI\n10.1155\/2017\/1452415\nISSN\n1748-6718\nDistribution license\nCreative Commons Attribution 4.0 International\nWebsite\nhttps:\/\/www.hindawi.com\/journals\/cmmm\/2017\/1452415\/\nDownload\nhttp:\/\/downloads.hindawi.com\/journals\/cmmm\/2017\/1452415.pdf (PDF)\n\nContents\n\n1 Abstract \n2 Introduction \n3 Factors shaping smart agents and organizations \n\n3.1 Organizational complexity factors shaping public health knowledge environments \n3.2 Problem\/Issue complexity factors shaping knowledge environments \n3.3 Situational awareness factors shaping information environments \n\n\n4 Smart systems vulnerability index \n\n4.1 Knowledge discovery rate (KDR) \n4.2 Organizational (agent- or systems-level) memory \n4.3 Agent-specific and system learning \n4.4 Knowledge absorption rate (KAR) \n4.5 Agent-specific and system-level cognitive demand \n4.6 
Cognitive mapping \n4.7 Aberrant detection analytics \n\n\n5 Conclusion \n6 Disclosure \n7 Competing interests \n8 Authors' contributions \n9 Acknowledgments \n10 References \n11 Notes \n\n\n\nAbstract \nPublic health informatics is an evolving domain in which practices constantly change to meet the demands of a highly complex public health and healthcare delivery system. Given the emergence of various concepts, such as learning health systems, smart health systems, and adaptive complex health systems, health informatics professionals would benefit from a common set of measures and capabilities to inform our modeling, measuring, and managing of health system \u201csmartness.\u201d Here, we introduce the concepts of organizational complexity, problem\/issue complexity, and situational awareness as three codependent drivers of smart public health systems characteristics. We also propose seven smart public health systems measures and capabilities that are important in a public health informatics professional\u2019s toolkit.\n\nIntroduction \nPublic health informatics is an evolving domain in which practices constantly change to meet the demands of a highly complex public health and healthcare delivery system. The typical definition for a variety of domains of informatics (e.g., public health, population health, nursing, clinical, medical, health, consumer, and biomedical) centers on the \"application of information science and information technology to [a specific domain of] practice, research, and training.\"[1][2] This definition of informatics relies on a technical view of the health system. 
A technical view of informatics largely identifies more tangible products such as databases, decision-support tools, information systems, web portals, and mobile devices as the primary means of addressing complex health issues, improving care, and reducing health disparities.\nPublic health informatics systems expressed as a function of intelligence can be understood in terms of two codependent pathways: (1) generating health information technology (HIT) policies that ensure our ability to govern intelligence as a byproduct and (2) allowing innovations in HIT to shape and inform public health systems policy and practice to ensure that we govern intelligently. In the former case, public health informatics professionals endeavor to generate HIT policy to guide national, state, and local information architecture, information infrastructure, and information integration efforts that ultimately guide how public health meets the needs of stakeholders\/agents such as patients\/families\/health consumers, communities, providers\/healthcare organizations, researchers, policymakers, and disease-centric communities of practice through the meaningful supply of intelligence. Such intelligence can inform stakeholder understanding about the burden of disease, spread of an outbreak, health alerts and food recalls, disease clusters, community needs assessments, and health risk assessments. In the latter case, public health informatics professionals seek innovative ways to leverage HIT to improve the way we govern, for example by streamlining processes in ways that positively impact cost, quality, safety, and overall health outcomes. Figure 1 highlights these relationships in the context of public health practice domains.\n\nFigure 1. 
Public health informatics systems intelligence perspectives\n\nAlthough useful for fostering greater levels of adoption and use of technical measures, this technical view of public health informatics (1) does not highlight the changing knowledge needs of these system agents over time, (2) fails to capture the full array of interaction among agents in a dynamic environment, and (3) cannot keep pace with an increasingly complex environment. In other words, the purely technical approach does not effectively highlight the full spectrum of knowledge, communication, and learning that is needed to keep all types of health system stakeholders \u2014 including individuals, organizations, or collections of individual and organizational networks (e.g., coalitions, collaborations, consortiums, and taskforces) \u2014 informed and able to respond to environmental changes at all stages of the healthcare continuum.\nIn this era of informatics, in which the emphasis is on less tangible cognitive capacities (e.g., learning health systems, intelligent and smart systems, and complex adaptive systems), a new public health informatics analytics approach may be required: one that is less information technology-driven and more knowledge-driven, and that defines new ways of demonstrating the added value of informatics in shaping health systems performance.[3] Specifically, stakeholders (hereafter referred to as individual- or organizational-level agents) need concise, accurate, and objective analytic measurements of abstract concepts, such as empowerment, which previously has been described as a function of knowledge for the purposes of achieving a quantifiable metric for computational analysis of performance.\nSuch a view of public health informatics may focus on abstract constructs like actionable intelligence as the primary informatics-centric outcome.[3] Such a strategy should yield objective operational measures and capabilities designed to ensure that individual agents, 
organizations, and networks have sufficient knowledge to mount an intelligent response to solve complex public health problems. In other words, the strategy should support development and maintenance of smart health systems, that is, a system that \"incorporates functions of sensing, actuation, and control in order to describe and analyze a situation, and make decisions based on the available data in a predictive or adaptive manner, thereby performing smart actions. In most cases the \u2018smartness\u2019 of the system can be attributed to autonomous operation based on closed loop control, energy efficiency, and networking capabilities.\"[4]\nThe purpose of this paper is to propose a set of measures for tracking the development and sustainability of smart public health systems. Specifically, we introduce the concepts of organizational complexity, problem\/issue complexity, and situational awareness as three codependent drivers of smart health systems. We then describe seven smart health systems measures. This discussion is important for public health informatics professionals responsible for specifying metrics, overseeing information systems housing data for the metrics, and evaluating the performance of smart public health systems.\n\nFactors shaping smart agents and organizations \nThe underlying objective of any agent or actor within a given public health system is to maximize the use of data, information, and knowledge as strategic resources. An informatics-biased view of a public health system focuses on the sum of data, information, knowledge systems, people, practices, policies, and cultural factors that operates to support some predefined intelligence strategy, organizational mission, or other event.[5] In these terms, the public health system can be understood as a functional knowledge culture. We have also used related terms such as knowledge environments, information or knowledge ecosystems, and information or knowledge ecologies to represent this idea. 
Here, we use knowledge culture and knowledge environment interchangeably. We argue that any defined system boundary that contains the formal or informal governance of critical strategic and shared knowledge resources can be called a knowledge environment. The primary purpose of any knowledge environment is best understood in terms of the essential need to leverage data, information, and knowledge in managing individual or collective uncertainty.[6][7][8]\nThe way in which we engage in information and knowledge seeking, organize ourselves into collectives of varying unit configurations (e.g., workgroups, project teams, taskforces, departments, divisions, networks of organizational coalitions, and consortiums), and\/or apply the use of tools or technology indicates the basic need to manage any and all forms of uncertainty.[9] We organize ourselves in response to external and internal drivers\/stressors and increasing environmental complexity as a means of reducing or removing any impediments toward fast, reliable, and pertinent data, information, and knowledge resources.[10] This imperative to organize for the sake of becoming smarter is best observed in our introduction of three primary drivers that we argue are interdependent in any knowledge environment. By describing these factors as interdependent, we are stating that as one type of driver category increases or decreases by some set of circumstances or events, corresponding changes can occur in one or both of the other areas. These areas include organizational complexity, problem\/issue complexity, and situational awareness (see Figure 2). Essentially, each of these three primary driver categories shapes our overall data, information, and knowledge strategy within any knowledge environment. 
The primary objective of an informatician in designing and maintaining a smart public health knowledge environment is then to understand the basic predictors of change in any or all of these categories, as well as to account for the corresponding mediation\/moderation factors that can shape continued data, information, and knowledge maximization for agents within any public health knowledge environment.\n\nFigure 2. Knowledge environment factors of influence\n\nOrganizational complexity factors shaping public health knowledge environments \nWe use a variety of organizational structures to facilitate interaction, communication, and knowledge representation in our quest to manage changes in our environment. Generally, the levels of organization lie along a micro-to-macro continuum that runs from organizational agents\/individuals, through components\/sub-units and single entities\/facilities, to multi-unit systems\/collaborations\/coalitions\/networks\/taskforces\/consortiums.[11][12] Typically, the level of complexity inherent in the public health challenge or crisis event determines the corresponding level of organizational complexity required in the response.[7][13] Challenges or crisis events that are short-term or relatively minor may only require minimal, ad hoc, or temporary organizational responses. Within the modern healthcare environment, these responses can take the form of informal partnerships or of formal structures such as short-term project teams or workgroups. More involved and long-term problems may require increasing levels of complexity within the organizational response. 
These long-term or complex organizational responses may be represented in the form of permanent departments or divisions within an organization, or they may even extend beyond organizational boundaries to include coalitions, collaborations, taskforces, and interagency network arrangements.\nOne common public health system problem-solving strategy used throughout the US and worldwide involves formulating networks of individuals and organizations to coordinate global-, national-, state-, regional-, county-, city-, or even community-level responses to health threats to individuals or populations. Such networks (e.g., coalitions, collaborations, consortiums, and taskforces) present opportunities to define common goals, shape strategy, achieve economies-of-scale through the sharing of resources and facilitate the centralized monitoring and measuring of progress toward stated objectives. However, one challenge for the public health informatics professional involves ensuring that the data, information, and knowledge needs of networks of stakeholders \u2014 ranging from patient advocates, health organizations, providers, community groups, public health departments, policy makers, and researchers \u2014 are all met with efficiency and effectiveness. The issues surrounding timely intelligence were on full display during the recent Ebola virus and Zika virus outbreaks.\nCurrently, there are no consistent measures or metrics to evaluate the efficiency and effectiveness of the ability of \u201csmart\u201d health networks\u2014of any size or configuration\u2014to leverage data, information, and knowledge to produce actionable intelligence from their efforts.[3][14] In other words, there is no quantifiable set of standardized measures or standard operational definitions of what a smart or learning health network is now or what it should be in the future.[15] Within any public health knowledge environment, a wide variety of network structures can be assumed. 
The organization is viewed as a dynamic, complex, and adaptive entity whose size, structure, and other organizational determinants must be constantly evaluated to promote its ability to respond to internal and external challenges, threats, and opportunities that will impact individuals and\/or the collective leveraging of actionable intelligence to ensure success in health system management.[16][17][18]\n\nProblem\/Issue complexity factors shaping knowledge environments \nAn analogy for problem\/issue identification and response within any public health knowledge environment is the human immune response in which the human immune system assesses threats on a constant basis and determines if a foreign agent is a \u201cfriend\u201d or \u201cfoe.\u201d Once identified in a healthy immune system, the proper immune response is triggered. For a friend response, facilitation\/proliferation strategies ensue, and, for a foe response, elimination\/mitigation strategies ensue. Two critical components in the overall immune response system are the ability to retain a memory of this encounter and to demonstrate system learning to prepare for future encounters of a relatively similar nature. \nThe same sort of dynamic occurs within a public health system network among its various organizational components, actors\/agents, and events. Once a phenomenon (i.e., circumstance\/event\/activity\/occurrence) is identified as a potential problem, either threatening or nonthreatening, system or collective memory is vetted for familiarity.[18] If sufficient memory of the phenomenon or something similar is found, the ideal response algorithm(s) (set of instructions) is\/are identified, outlining the appropriate response mechanism. If no memory exists, a response must be determined on an ad hoc basis. 
Clinical or public health events\/activities that allow favorable health outcomes (e.g., diffusion of best practices, strategic summits, introduction of new technology, disease screening and awareness campaigns, and new funding announcements) may be considered targets for facilitation\/proliferation, whereas unfavorable events\/activities (e.g., disease outbreaks, health or food recalls, medical errors, deviations from guideline concordant care, risk behaviors linked to disease spread, budget shortfalls, and staff layoffs) may be targets for elimination\/mitigation. \nIn either case, sufficient memory must be generated of the response algorithms (process\/workflows, policies\/procedures) that contributed to the event(s), pathways toward emergence, and\/or remediation strategy to eliminate the threat. Learning in this context presents the ability to circumnavigate potentially harmful events that have the potential for recurrence or the ability to repeat\/reinforce positive events that are beneficial.[19] Hence, the ability to extract actionable intelligence from stored memory is essential to overall public health system performance and an effective knowledge environment.[3] Two factors that shape this dynamic of event, memory evaluation, and learning within a knowledge environment are familiarity and preparedness, borrowed from the field of emergency preparedness.[20]\nWithin any knowledge environment, issues\/problem complexity and relative familiarity (stored memory) largely shape the level of \u201cshock\u201d or environmental stress to the public health system, which creates what Burton termed an organizational design misfit.[16] In the presence of an organizational design misfit, the goal is to seek to restore some measure of equilibrium.[17] The level of shock brought by the introduction of a problem\/issue into any public health knowledge environment and its corresponding impact on the public health system can be thought of in terms of two factors: (1) the 
degree to which the event was expected to occur and (2) the degree to which the environment was prepared for its occurrence. Figure 3 highlights the relationships between these two factors: green represents a highly desirable state of system and organizational readiness (operationally defined here as the ability of agents within the public health knowledge environment to process the event and determine an appropriate response), yellow represents less desirable states of organizational readiness, and red represents the least desirable state of organizational readiness and the highest level of vulnerability to both internal and external threats.\n\nFigure 3. Problem\/issue complexity factors\n\nAlthough most public health systems are prepared to deal with any event, some noticeable changes can occur in the face of uncovered vulnerabilities introduced by shock events. Such adjustments on the organizational side may present as unexpected leadership shifts, sudden changes in organizational command structures, abrupt shifts in policy and procedures, new strata of research funding to investigate and solve problems, or the addition or elimination of staff and key personnel.[16] On the public health knowledge environment side, such adjustments can take the form of wide-scale data integration or health information exchange efforts, the formation of new database solutions, the demand for new technology to monitor and track the problem, surveillance protocols, information systems, knowledge portals, decision-support systems, and changes in information resource-management protocols.[12] The level of complexity in both the problem\/issue and the capability of the public health knowledge environment to process the event and mount an appropriate response heavily shapes the level of organization (or in some cases reorganization) required to mitigate the threat or exploit the opportunity. 
Additionally, these changes\u2014and more importantly the rate of change in the organization in particular and the public health knowledge environment in general\u2014may serve as proxy indicators for overall public health knowledge environment maturity in managing uncertainty. In other words, frequent leadership changes, high staff turnover, frequent redrafting of strategic plans, and reorganizations within a relatively short span of time are strong indicators that a health system or public health agency lacks overall public health knowledge environment maturity.[12] The knowledge environment of such an organization characteristically remains in a loop, moving from crisis-to-solution to a new or reemerging crisis-to-solution. In contrast, a mature public health knowledge environment will seek to identify and understand the patterns of organizational complexity and problem\/issue complexity emergence and response. Properly stored, organized, and readily accessible system memory can greatly aid in achieving a more mature public health knowledge environment.\n\nSituational awareness factors shaping information environments \nPreviously, we stated that organizational complexity is shaped by external or internal factors in a given public health knowledge environment, requiring different levels of formal or informal organizational structures to manage their environmental challenges. We also mentioned that the level of complexity inherent in problems\/issues and the corresponding system memory and preparedness will shape system-level responses to control and mitigate any perceived threats. Here, we formally define the term \"situational awareness\" (SA) as \u201cthe ability to make sense of an ambiguous situation. 
It is the process of creating [situational awareness] and understanding to support decision-making under uncertainty \u2014 an effort to understand connections among people, places, and events in order to anticipate their trajectories and act effectively.\u201d[21] Endsley elaborated on the definition for SA, stating that it comprises three sub-domains that shape an individual's understanding of a given phenomenon. These include (1) situation perception (defining the current public health condition), (2) situation comprehension (defining the relative public health threat or opportunity), and (3) situation projection (forecasting the public health outcomes of hypothesized trajectories).[22]\nWithin situational awareness, the two elements of organizational complexity and problem\/issue complexity are combined and serve as contributing factors that determine the degree to which organizational structure and function are properly suited to facilitate unencumbered information processing. Previous organizational theories have described the organization, functioning within a given environment, as an information-processing entity (IPE).[16] From this perspective, organizations are seen as sophisticated information processing and decision-making machines that act as if they have pre-programmed subroutines in managing the loop stages of information flow and organizational processes (model \u2192 input \u2192 transformation \u2192 output \u2192 feedback).[4] Within the IPE view of an organization, we must understand information processing as a means of shaping organizational and individual decisions, behaviors, and communication patterns.[18] The flow of information and knowledge is codependent on our constant need to learn and share knowledge, largely shaping the structure of our social and organizational network arrangements.[23] Therefore, the need to know (or cognitive demand) of both individuals and organizations becomes a primary driver of IPE activity.[18][23] \nInformation 
processing for the sake of storing mountains of data, information, and knowledge resources as an end itself is meaningless in the context of efficiency, effectiveness, and viability in meeting public health system organizational missions, goals, and objectives. More precisely, the primary function of any level of IPE \u2014 from simple department units to complex multiorganizational networks or health information exchanges \u2014 is to respond to what agents\/actors need to know, when they need to know it, and to support the choices\/decisions that must be made as shareholders navigate through the health system, defined here as actionable knowledge or intelligence.[3] The public health organizational IPE will seek to leverage SA to maximize readiness to meet public health threats from the environment and to maximize public health knowledge environment agent\/actor individual and\/or collective intelligence in the performance of core public health tasks and functions. Therefore, public health-centric SA serves as a comprehensive measure of public health system smartness and is essential for any standardized assessment of public health system performance from a public health informatics perspective.\n\nSmart systems vulnerability index \nIn this section, we propose seven smart health system measures and capabilities appropriate for helping to manage public health organizational complexity, problem\/issue\/complexity, and situational awareness for public health systems networks and public health knowledge environments. Figure 4 lists these seven measures and provides a brief description of a smart public health system and our rationale for its use. Although other measures may be available in the literature, we believe these seven effectively capture the key concepts discussed above. 
Of course, public health informatics professionals may need to use discretion in applying these measures, depending on the context, the purpose of measurement, and any constraints hindering measurement.\n\nFigure 4. Smart health systems measures and capabilities\n\nKnowledge discovery rate (KDR) \nThomas Davenport described knowledge as data and information imbued with meaning and relevance. In this view, knowledge is the product of continued aggregation and refinement that begins with raw data elements. For example, a set of 10 numerical digits constitutes raw and meaningless data. At the stage of information, the digits can be recognized as a telephone number.\nThese same digits can be viewed as knowledge when the number is contextualized as a conduit for satisfying some individual or organizational cognitive demand, such as supporting health decision-making or addressing an issue or problem. For example, this telephone number becomes a source of knowledge when it represents a nurse hotline for patient navigation. The informatics professional managing\/building a mature public health knowledge environment works with stakeholders to develop a comprehensive knowledge inventory (also referred to as an ontology) of all products used to inform public health stakeholder decision-making.[23]\nThe knowledge discovery rate (KDR) represents the rate at which knowledge is generated from new or existing data and information resources. In other words, the KDR reflects how long it may take someone to (1) realize that these 10 digits represent a telephone number, (2) understand that this telephone number connects to a patient navigation service, and (3) realize that the nurse navigation service has additional resources to maximize the care experience. 
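The sketch below shows one way the three-stage telephone-number example could be timed. It assumes, purely for illustration, that the moment each stage is reached can be timestamped. The per-stage lags then show where discovery stalls, and an overall rate can be expressed as stages reached per hour. The timestamps, stage labels, and per-hour unit are hypothetical. This is not a method specified by the authors.

```python
from datetime import datetime

# Hypothetical timestamps (assumed for illustration) at which an agent reaches
# each stage of the telephone-number example.
data_available = datetime(2017, 3, 1, 9, 0)
stages = [
    ('recognized as a telephone number', datetime(2017, 3, 1, 9, 5)),
    ('linked to a patient navigation service', datetime(2017, 3, 1, 9, 35)),
    ('aware of additional navigation resources', datetime(2017, 3, 1, 11, 0)),
]


def stage_lags_minutes(start, stages):
    # Minutes spent reaching each stage, measured from the previous stage.
    lags = []
    previous = start
    for name, reached in stages:
        lags.append((name, (reached - previous).total_seconds() / 60))
        previous = reached
    return lags


def knowledge_discovery_rate(start, stages):
    # Stages of knowledge reached per hour of elapsed time (an assumed unit).
    elapsed_hours = (stages[-1][1] - start).total_seconds() / 3600
    return len(stages) / elapsed_hours


for name, minutes in stage_lags_minutes(data_available, stages):
    print(f'{name}: {minutes:.0f} min')
print(f'KDR: {knowledge_discovery_rate(data_available, stages):.2f} stages per hour')
```

Comparing the same lags across agents, or before and after a change in how a number is displayed, is one way such a sketch could support an assessment of display and interface effects.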
In public health, KDR may be expressed as the time it takes health officials to recognize a pattern in seemingly unrelated health events (e.g., ER traffic, school\/work absence, provider case reporting, news reports, food plant inspection reports, and grocery store sales) that indicates a disease threat, such as a potential food-borne illness. The display or interface is also an essential component. Endsley explores how the interface and display of information can greatly hinder the uptake of knowledge and consequently impair overall situational awareness.[22] For example, will everyone read \u201c1234567890,\u201d \u201c123.456.7890,\u201d and \u201c(123) 456-7890\u201d in the same manner in a communication exchange?\nA key component in shaping the rate of knowledge discovery is a comprehensive assessment of the presentation and display of data, information, and knowledge throughout the key stages of any healthcare delivery (e.g., clinical pathway) or public health process. To that end, KDR involves understanding how knowledge is packaged for consumption, whether as tangible (explicit) paper or electronic knowledge products or as less tangible (tacit) knowledge products, to inform decision-making. This measure examines the production curve of knowledge through generation, presentation, selection, and consumption, as well as the qualitative assessment of knowledge\u2019s relevance to agent-specific choice.\nKDR may be particularly pertinent in settings in which pattern recognition depends on coordinating data, information, and knowledge from a highly heterogeneous network of sources and stakeholders. 
KDR becomes essential in public health situations where intelligence has to be collated across multiple agencies (e.g., school, hospital, retail, and corporate), wide geographic areas (e.g., multiple regional metropolitan area health departments), or multiple categories of stakeholders (e.g., patients, providers, and health administrators). In such cases, KDR is a measure of timeliness and operates as a key indicator of public health outcomes.\n\nOrganizational (agent- or systems-level) memory \nEarlier we described the matrix of system shock as a function of preparedness and expectedness in response to internal and external stimuli (events). Organizational or systems-level memory can be described as the degree to which the history of these encounters and responses, and the relative success or failure of those responses, are catalogued and stored for future use by other agents\/actors. This can be operationally understood as the \u201crepeatability\u201d level, commonly referred to as level two of five in the capability maturity model (CMM).[24] Systems memory simply asks to what degree phenomena are captured and labeled as favorable or unfavorable, and to what degree response algorithms are developed and made available for expedient consumption by the same and\/or other agents within the knowledge environment. A lack of repeatability indicates a high level of unnecessary \u201cad hoc\u201d (CMM level-one) responses[24] and may result in an inordinately high level of shock to the system from events that, if properly catalogued, could have been relegated to the realm of routine with minimal system-shock value.\nHere, the primary measure is the level of completeness, sophistication, and use of the knowledge-bases that represent the sum of public health knowledge stored for current and future public health decision-making. 
This knowledge can be expressed as basic knowledge inventories, resource guides, policy and procedure manuals, and intranet\/Internet lessons or best practices. It can also be expressed as highly sophisticated knowledge ontologies that capture and display public health knowledge, tasks, events, and procedures in complex electronic tools to support network modeling, information flows, and critical communication pathways. Public health knowledge portals can be constructed to make it easier to identify public health stakeholder query demand and to access, retrieve, display, and analyze knowledge use in any public health knowledge environment. This capability is essential to the proper use and maximization of organizational memory. In the absence of standardized knowledge memory management, a public health organization remains in a perpetual ad hoc response mode to each new or recurring public health crisis.\n\nAgent-specific and system learning \nThere is a growing body of literature on the evolution of a \u201clearning health system.\u201d[25] Our study contributes to this concept by providing a conceptual definition of both agent-specific and system-level learning from the perspective of a public health informatics professional managing\/building a public health knowledge environment. Here, learning is understood as the wisdom level of the informatics continuum.[26] We have refrained from using this concept until now, but at this point it is appropriate to recognize that some informatics literature describes the informatics continuum (earlier referred to as the data progression) as data to information to knowledge to wisdom.[26] Objective measures of wisdom are typically hard to find and are not universally accepted. We have therefore chosen to substitute decisions and outcomes for wisdom. As a result, our data progression extends to the following sequence: data to information to knowledge to decisions to outcomes. 
The difference is that choices, when properly linked to specific outcomes and their corresponding consequences, provide opportunities for learning. As a result of this substitution, we can now define wisdom operationally as the degree to which choices informed by relevant knowledge products lead to more desirable decisions, beneficial outcomes, and positive consequences for the overall health and well-being of agents and the system.\nWe extend our definition of wisdom to incorporate intelligence, understood simply as the display of wisdom over time. In our model, learning acts as a measure of differential wisdom and intelligence over time (the difference measured at two distinct points in time). In other words, learning is the individual or organizational wisdom displayed or measured at some endpoint (t2) minus the individual or organizational wisdom displayed at some starting point (t1). The organizational IQ in a learning health system is then understood as the measure of differential wisdom displayed over time toward some set of decisions\/choices, actions\/tasks, or other health phenomena. Learning represents a measurement of agent-specific or system-level discernment (the ability to leverage situational awareness to comprehend threat level and to leverage stored or new knowledge to choose between differing options). In this sense, learning is construed as the means of refining the art of discernment, or of acquiring wisdom.\nIn public health terms, the operational construct of this measure of learning is still evolving, and little literature exists on applying it in public health practice. We suggest that measures of learning, presented here as a means of leveraging knowledge resources wisely, depend largely on the preceding measure of organizational memory. 
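A small sketch can illustrate the differential definition of learning. Here, purely for illustration, wisdom in a period is taken to be the share of knowledge-informed decisions that led to beneficial outcomes. Learning is then W(t2) minus W(t1). This proportion-based operationalization and the record fields are hypothetical. As noted above, the operational construct is still evolving.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    period: str       # hypothetical period label, e.g. 't1' or 't2'
    informed: bool    # was the choice informed by a relevant knowledge product?
    beneficial: bool  # did it lead to a beneficial outcome?


def wisdom(decisions, period):
    # Assumed operationalization: share of knowledge-informed decisions in a
    # period that led to beneficial outcomes (0.0 if there were none).
    informed = [d for d in decisions if d.period == period and d.informed]
    if not informed:
        return 0.0
    return sum(d.beneficial for d in informed) / len(informed)


def learning(decisions, start='t1', end='t2'):
    # Learning as differential wisdom: W(t2) - W(t1).
    return wisdom(decisions, end) - wisdom(decisions, start)


history = [
    Decision('t1', True, True), Decision('t1', True, False),
    Decision('t1', True, False), Decision('t1', False, True),
    Decision('t2', True, True), Decision('t2', True, True),
    Decision('t2', True, False), Decision('t2', True, True),
]
print(round(learning(history), 3))
```

With these hypothetical records, wisdom rises from one in three informed decisions at t1 to three in four at t2, so learning is positive (about 0.42).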
In the absence of a well-designed public health knowledge-base that captures the history of practice, learning from such experience becomes extremely episodic and anecdotal in nature. For example, we can only speculate on how much stored memory has been gathered from the Ebola crisis that could help mitigate another Ebola outbreak or outbreaks of other diseases under similar conditions. Emerging rapid learning health networks address some aspects of this challenge by streamlining the translation of research evidence into practice and by building knowledge stores of what works best in achieving better health outcomes. However, global implementations of these research-to-practice and comparative effectiveness networks are still in the early stages of development.\n\nKnowledge absorption rate (KAR) \nCarley et al. described how the sum of knowledge within a given system boundary can be quantified in terms of knowledge bits.[27][28][29] According to this concept, knowledge represented in its various forms can be deconstructed into quantifiable units.[29][30] The number of knowledge units or bits that make up a discrete package of knowledge is determined by the level of complexity of the decisions or tasks this knowledge is designed to inform.[29][30] As such, a direct relationship exists between the number of knowledge bits and the level of complexity of the related decisions and tasks. 
The greater the complexity\/criticality of a task or decision, the larger the knowledge complement (or number of knowledge bits) associated with the management, storage, display\/representation, diffusion, use, and comprehension of knowledge.[29][30] This perspective assumes that more knowledge bits are needed to saturate, or carry out, a complex task or to make a critical choice than to carry out a simpler task or make a less critical choice.\nIn essence, this concept of knowledge bits suggests that, throughout the knowledge environment, agents\/actors can range from no knowledge saturation (0 bits) to 100 percent saturation (all bits available in the knowledge environment). The consumption rate, or absorption, of these knowledge bits over time in the performance of core tasks can then be evaluated using a variety of statistical and computational modeling methods. To carry this out, a value or weight is assigned to every piece of knowledge represented within a knowledge inventory (also referred to as an ontology). The weight given to a knowledge product represents both the degree of value assigned by consumers of that knowledge product (elasticity-of-demand) and the magnitude of importance of the decision(s) it is intended to inform (criticality). The curve of a knowledge product\u2019s elasticity-of-demand and decision criticality is evaluated in the context of a core set of tasks to be performed at the agent or system level.\nIn public health settings, KAR represents a concrete way of measuring the overall application of knowledge to performance. 
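The following minimal sketch illustrates the weighting and saturation idea. It assumes that each knowledge product in the inventory carries an elasticity-of-demand score and a criticality score on a 0 to 1 scale, and that the two are multiplied to give a weight. An agent's saturation is the weighted share of products absorbed. KAR is the change in saturation per unit time. The product names, scores, multiplicative weighting, and simple difference quotient are all assumptions. They stand in for, and are not, the statistical or computational models the text refers to.

```python
# Hypothetical knowledge inventory: product -> (elasticity-of-demand score, criticality score).
INVENTORY = {
    'screening_guideline': (0.9, 1.0),
    'referral_protocol': (0.6, 0.8),
    'staff_newsletter': (0.2, 0.1),
}


def weight(elasticity, criticality):
    # Assumed weighting: product of consumer value and decision criticality.
    return elasticity * criticality


def saturation(absorbed):
    # Weighted share of the inventory an agent has absorbed (0.0 to 1.0).
    total = sum(weight(*scores) for scores in INVENTORY.values())
    got = sum(weight(*INVENTORY[name]) for name in absorbed if name in INVENTORY)
    return got / total


def knowledge_absorption_rate(absorbed_t1, absorbed_t2, days):
    # Change in saturation per day between two observations.
    return (saturation(absorbed_t2) - saturation(absorbed_t1)) / days


week0 = {'staff_newsletter'}
week4 = {'staff_newsletter', 'screening_guideline'}
print(f'saturation week 0: {saturation(week0):.3f}')
print(f'saturation week 4: {saturation(week4):.3f}')
print(f'KAR per day: {knowledge_absorption_rate(week0, week4, 28):.4f}')
```

With these hypothetical scores, absorbing the screening guideline moves the agent from about 1 percent to about 66 percent saturation, because that product carries most of the inventory's weight.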
In previous studies, we examined the knowledge absorption rate of community health clinical staff with regard to breast, cervical, and colorectal cancer screening policies, guidelines, and protocols, as derived from the use of electronic clinical decision support (CDS).[31] We examined the extent to which CDS use and the corresponding knowledge absorption rates were correlated with organizational performance in cancer screening.[31] We demonstrated that KAR was, in fact, a predictor of organizational performance in meeting process-of-care outcomes in cancer care.[31] Hence, we suggest that KAR can serve as an effective measure of HIT impact on performance by focusing on end-users\u2019 ability to access key knowledge by interacting with HIT tools and to apply this knowledge to healthcare delivery and public health practice.\n\nAgent-specific and system-level cognitive demand \nWithin any given knowledge environment, agents\/actors at many levels perform key tasks, make decisions, and engage in a series of activities that can be described by a set of process algorithms.[28] The constant factor governing this activity is the principle of cognitive demand for information. The principle of supply and demand, borrowed from economics, applies in part to informatics with respect to agents'\/actors' need for information to support decision-making and task performance. Here, we treat this metric as a measure of the relative demand for data, information, and knowledge resources by agents\/actors operating at all levels of the multilevel model, as well as of the corresponding supply of data, information, and knowledge resources available for consumption. 
We refer to the main driver of the interplay between the supply of and demand for data, information, and knowledge resources as the cognitive demand, or simply the \u201cneed to know.\u201d\nThis \u201cneed to know,\u201d or cognitive demand, shapes the information-seeking behaviors of the agents\/actors within the system and may govern the amount of effort they are willing to expend in acquiring data, information, or knowledge resources. The level of importance, or criticality, of information to the agent is measured by the elasticity-of-demand (a borrowed term) for that information. Coupled with the relative supply of information, the measure of elasticity can be used to gauge relative states of \u201cinformedness\u201d among the agents\/actors within the system. According to the formal definition of elasticity, demand is elastic when the change in quantity demanded due to a change in price is large.[32] In contrast, demand is inelastic when the change in quantity demanded due to a change in price is small.[32] Cognitive demand can serve as a core measure for identifying knowledge-related vulnerabilities within a system, or the relative degree to which the cost of the required knowledge is acceptable.\nIn the context of health systems, we understand the price of knowledge to be measurable in terms of access, affordability (time and effort), overall opportunity cost (ease of use, processing, comprehension, and understanding), and relevancy. Figure 5 lists the four distinct states that agents\/actors can assume within any public health knowledge environment, based on the level of criticality (elasticity-of-demand) and the supply of knowledge. When the cognitive demand for knowledge is highly critical and the relative supply is limited, knowledge gaps emerge. 
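The elasticity definition and the four states of Figure 5 can be sketched as follows. The sketch uses the midpoint (arc) formula, which is one standard way to compute elasticity from two price and quantity observations. Using it here is our choice. So are the 0 to 1 criticality and supply scores with a 0.5 cutoff, which are illustrative assumptions. Following the text, high criticality with limited supply is a knowledge gap, and supply exceeding importance is a knowledge surplus. The two matched combinations are parity states.

```python
def arc_elasticity(p1, q1, p2, q2):
    # Midpoint (arc) elasticity: percentage change in quantity demanded divided
    # by percentage change in price, each taken relative to the midpoint.
    pct_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_q / pct_p


def is_elastic(elasticity):
    # Demand is conventionally called elastic when |E| > 1.
    return abs(elasticity) > 1


def knowledge_state(criticality, supply, cutoff=0.5):
    # Classify a position in the criticality/supply grid (assumed 0-1 scores).
    high_need = criticality >= cutoff
    high_supply = supply >= cutoff
    if high_need and not high_supply:
        return 'knowledge gap'
    if high_supply and not high_need:
        return 'knowledge surplus'
    return 'parity (high need, high supply)' if high_need else 'parity (low need, low supply)'


# Hypothetical example: raising the effort price of a guideline lookup from
# 2 to 3 hours cuts lookups from 50 to 20 per month.
e = arc_elasticity(2, 50, 3, 20)
print(round(e, 2), is_elastic(e))
print(knowledge_state(criticality=0.9, supply=0.2))
```

Here price is expressed as time and effort, one of the price dimensions the text lists. Any other measurable dimension could be substituted.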
Such knowledge gaps may result from a variety of scenarios, including situations in which (1) the information or knowledge product does not exist, resulting in a need for innovation; (2) the resource exists, but access is in some way limited or encumbered; (3) the resource exists with abundant access but is not easily processed or consumed because of literacy challenges, content presentation, or other reasons; and (4) the supply is constrained by competing priorities and is intentionally undeveloped or underdeveloped. The two states we refer to as parity conditions represent areas where the level of criticality is adequately met by the level of knowledge supply. In such cases, the main strategy is continuous monitoring to ensure that these balances remain within desired ranges of acceptability, in keeping with the need for balance in the overall public health knowledge environment. A state of knowledge surplus results when the supply of information or knowledge products exceeds the relative level of importance placed on them (also understood as relevancy). This state represents an opportunity to eliminate outmoded or underused knowledge resources, to redesign or upgrade information systems, or to pursue other information technology strategic efforts that ensure the long-term relevance of information resources, information systems, and knowledge products.\n\nFigure 5. Criticality (elasticity-of-demand) and the supply of knowledge\n\nThe public health application of this measure of relative cognitive demand, or the need to know, is illustrated by the recent Zika outbreak. It became evident that the population at greatest risk from the disease was pregnant or soon-to-be pregnant women. The demand for information regarding protective measures, travel restrictions, the rate of transmission, the relative threat to fetal health, and signs and symptoms once infected created large pockets of health consumer uncertainty, stress, and anxiety. 
Given the severe level of risk to pregnant women and their developing babies, the demand for knowledge of what to do for protection was highly critical. The window of transmission of the Zika virus from mother to fetus was highly uncertain, the effectiveness of preventive measures was hard to measure, and the right travel decisions regarding affected areas were unclear, resulting in highly critical, low-supply knowledge profiles for many health consumers and stakeholders at all levels. Rapid research was needed to identify proven measures against the invading mosquito population. Public health departments throughout the affected areas were scrambling to model the spread of the disease, measure the impact of preventive measures, and manage reports of new cases. Meanwhile, the public constantly demanded new answers and daily updates. This was compounded by the timing of the 2016 Summer Olympic Games in Rio, which sparked highly publicized refusals by athletes to travel to the region to participate in the event. Highly critical, low-supply conditions are probably the most difficult to manage. In any public health knowledge environment, stakeholder cognitive demand must be continually assessed, relative to the capabilities of the existing or evolving knowledge-base, as a means of satisfying current demands and projecting anticipated ones.\n\nCognitive mapping \nOnce the knowledge inventory and relative measures of importance have been assessed and the corresponding process and information flows have been identified, the public health informatics professional can engage in creating cognitive maps or models of both existing and emerging knowledge and communication pathways. These pathways can be modeled for specific agents, for the system as a whole, or for any combination of the two. 
Here, the public health informatics professional is not simply asking who uses what information or examining the use of computerized information resources; instead, the goal is to model the cycle of information and knowledge within the public health knowledge environment. This information cycle is best understood as starting with raw materials, in this case raw and at times unformatted data elements, which are assembled into chunks of information (e.g., electronic databases or information systems). These information chunks are either coordinated into meaningful knowledge products or presented to users to coordinate according to their specific needs, whether through structured queries, which can be thought of as off-the-shelf knowledge products, or through ad hoc queries, which yield user-defined knowledge products to support decision-making. We refer to this cycle as the knowledge refinery.\nAnalytic measures of knowledge refinement consist of examining the pace of knowledge development and exploring system responsiveness as expressed by the supply of and demand for data, information, and knowledge resources.[33] The basic elements of analysis are the total knowledge in any given public health knowledge environment (knowledge entropy) relative to the amounts of used (kinetic) knowledge and unused (potential) knowledge.[33] This can also be expressed in terms of the amount of knowledge\/information gained or lost in an effort to maximize performance.[33] Additionally, the public health informatics professional could examine existing and emerging pathways as revealed by patterns of use, an approach closely linked to the concept of plasticity.[34] In the field of neuroscience, the term \"neuroplasticity\" refers to the human brain\u2019s ability to change in response to behavioral, environmental, and neural processes.[35] In the human brain, neural pathways, after repeated stimulation and reinforcement, are in effect carved into the brain tissue.[35] By analogy, we may examine how public health knowledge environments, viewed as the IPEs described earlier, respond to changes in behavior, environmental conditions, or agent-specific or system-level cognitive demands.[34][36]\nChanges in the cognitive pathways of a public health knowledge environment can be modeled using a variety of conceptual and visualization techniques.[37] Such pathways, when observed and modeled, can yield repeated patterns, which may be canonized as permanent or semipermanent cognitive pathways toward system-level knowledge and learning health systems.[37] Within our model of public health knowledge environments, highly intelligent health systems have the ability to manage such cognitive pathways in response to cognitive demands.[37] Where old or unused pathways exist, data and information systems (and the corresponding knowledge products) will likely be considered outdated or not useful. Where current cognitive pathways are robust and frequented, data and information systems are likely to be considered essential to decision-making. Finally, where new and emerging cognitive pathways are observed or predicted, there is an opportunity for innovation and systems development to support emerging communities of practice, workgroups, departments\/divisions, and other formal or informal organizational structures.[38]\nThe public health application lies in understanding the public health system as an evolving complex network of individuals, organizations, groups, and knowledge resources. Here, the public health informatics professional may find that knowledge, skills, and abilities in modeling social and organizational networks are essential for establishing current-state (baseline) network diagrams and future-state diagrams designed to guide the visualization of a public health knowledge environment. 
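Baseline and future-state network diagrams of this kind lend themselves to simple quantitative measures. Below is a minimal, dependency-free sketch of two common ones, network density and closeness centrality. It runs on a small hypothetical undirected network of agents and knowledge resources. The node names are invented, and the code assumes the network is connected.

```python
from collections import deque

# Hypothetical undirected network of agents and knowledge resources.
EDGES = [
    ('epidemiologist', 'lab'),
    ('epidemiologist', 'health_officer'),
    ('health_officer', 'hospital'),
    ('lab', 'surveillance_db'),
    ('epidemiologist', 'surveillance_db'),
]


def build_graph(edges):
    # Adjacency sets; each undirected tie is stored in both directions.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def density(graph):
    # Share of all possible undirected ties that are present: 2m / (n(n - 1)).
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values()) // 2
    return 2 * m / (n * (n - 1))


def closeness(graph, node):
    # (n - 1) / sum of shortest-path distances, found by breadth-first search;
    # assumes every node is reachable from `node`.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nbr in graph[current]:
            if nbr not in dist:
                dist[nbr] = dist[current] + 1
                queue.append(nbr)
    return (len(graph) - 1) / sum(dist.values())


g = build_graph(EDGES)
print(f'density: {density(g):.2f}')
for name in sorted(g):
    print(f'{name}: closeness {closeness(g, name):.2f}')
```

In practice, a dedicated network analysis library would normally be used. NetworkX, for example, provides `density` and `closeness_centrality` functions that compute the same quantities for connected undirected graphs.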
In this context, a large library of network measures can be employed to support the analysis of a public health system and its knowledge environment, such as measures of network density, the closeness\/connectedness of agents to each other or to knowledge resources, patterns of clustering and clique behavior, knowledge-sharing practices, and more. The application of social and organizational network analysis in public health is growing rapidly and is expected to continue to do so.\n\nAberrant detection analytics \nArguably the most important analysis within this discussion involves detecting subtle changes within the public health knowledge environment that may pose a threat to one or more agents or to the system overall. Here, we discuss how intelligent analytics or telemetry, as part of a Public Health Situation Room, can be used to detect such changes. The public health informatics professional relies heavily on probes and sensors within any surveillance and monitoring system to gather intelligence. Similarly, physicians and nurses rely on telemetry to monitor patient vital signs, drivers use dashboards to detect changes in automobile status, and investors use tickers to track global investments. These forms of monitoring and tracking systems have one feature in common: they all make use of a grid system or network of core indicators as validated predictors of overall system health.\nAny sensor network used to monitor and track activity within a given Public Health Situation Room must account for several information-specific concepts. First, information does not always travel along predefined departmental or process pathways. Instead, information exchange may occur along a multiplicity of pathways, some predictable and others highly unpredictable. 
While organizational constructs of departments and divisions may account for some communication and information exchange, they do not account for all activity. Therefore, the public health data\/event-collecting sensors within a given public health knowledge environment must form a fluid, highly adaptive network capable of capturing activity in different settings, wherever the information channels may lead.\nSecond, the Public Health Situation Room sensor grid must be able to identify and track activity by both internal agents and system components, as well as by external agents and system components that may interact with the environment. No public health knowledge environment is completely closed. As a result, sensors must capture intelligence from the portals through which information, in all its forms, travels into and out of the system. Finally, the level of completeness must be defined to determine what represents an adequate level of coverage. A poorly designed or partial Public Health Situation Room sensor grid that allows large amounts of activity to go undetected would not be useful in the long term. A sensor grid should be viewed as a living and growing network of data\/event collection activities that changes with evolving needs and priorities, and the granularity or specificity of detection must also be able to change within the grid as strategic priorities shift. The full list of indicators drawn from public health knowledge environment factors, stakeholder levels, and unique views will largely shape the types of sensors, the density of the network, and the level of sensitivity needed for meaningful aberrant detection algorithms and monitoring system development.\nPublic Health Situation Rooms are used at all levels of the public health system, in national and international health settings, the U.S. Department of Health and Human Services, various public health agencies, and healthcare delivery settings across the globe. 
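One example of an aberrant detection algorithm for a single Situation Room indicator is sketched below. It flags a day when the count exceeds the mean of a trailing baseline window by more than a set number of standard deviations. This moving-baseline rule is a common, simple choice in syndromic surveillance. It is not a method prescribed by the authors. The seven-day window, the threshold of three, and the daily counts are all assumptions.

```python
from statistics import mean, stdev


def flag_aberrations(counts, baseline_days=7, threshold=3.0):
    # Return indices of days whose count exceeds the trailing-baseline mean
    # by more than `threshold` standard deviations (hypothetical defaults).
    flagged = []
    for day in range(baseline_days, len(counts)):
        baseline = counts[day - baseline_days:day]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A flat baseline makes any increase stand out; avoid dividing by zero.
            if counts[day] > mu:
                flagged.append(day)
            continue
        if (counts[day] - mu) / sigma > threshold:
            flagged.append(day)
    return flagged


# Hypothetical daily emergency department visits for gastrointestinal complaints.
visits = [12, 15, 11, 14, 13, 12, 16, 14, 13, 31, 15]
print(flag_aberrations(visits))
```

Lowering the threshold raises sensitivity at the cost of more false alarms. In this sketch the flagged spike also enters the baseline for the following days. Lagging the baseline by a day or two is one common way to limit that effect.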
However, there is no standardized model for this type of monitoring capability. Public Health Situation Room needs and priorities vary widely by organization and may include, but are not limited to, disease management, outbreak investigation, emergency preparedness, disaster response, community health assessments, and even healthcare access, equity, and quality. Electronic performance, strategic, operational, and clinical dashboards are typical of such Public Health Situation Rooms. We argue that the primary challenge for the public health informatics professional in designing and executing Public Health Situation Rooms is to develop the underlying smart infrastructure (knowledge-base) and the array of analytic measures described in this discussion to ensure maximum impact on desired outcomes.\n\nConclusion \nAs we enter a public health informatics era marked by terms such as \"learning health systems,\" \"smart health systems,\" and \"adaptive complex health systems,\" we must identify a common set of analytic measures and capabilities to inform our modeling, measuring, and managing of public health \"smartness.\" Such a set of measures must take into account the full spectrum of sociotechnical factors that make up a public health system and shape its performance, including technical, organizational, and human contributions. It is essential that we understand the basic drivers of smart systems, expressed in this discussion simply as the need to know, or cognitive demand. This basic need to know and our corresponding effort to leverage data, information, and knowledge resources toward some individual or collective set of goals and objectives form the basic parameters of any smart system. In the context of a public health system, public health informatics professionals stand poised to redefine the benefit of smarter healthcare delivery and public health practice. 
A common set of analytic measures and capabilities, capable of driving efficiency and viable models, can demonstrate how incremental changes in smartness generate corresponding changes in public health performance. Here, we introduced the concepts of organizational complexity, problem\/issue complexity, and situational awareness as three codependent drivers of smart public health system characteristics. We also proposed seven smart health systems measures and capabilities that we consider essential in a public health informatics professional's toolkit. Because this area of research and practice is still in its formative stages, the intent of this discussion is to build on the developing body of literature seeking to establish standardized measures for smart, learning, and adaptive public health systems.\n\nDisclosure \nThis work represents the opinion of the author and cannot be construed to represent the opinion of the U.S. Federal Government. Timothy Jay Carney is the founding partner of the Global Health Equity Intelligence Collaborative, LLC, Durham, NC (2014).\n\nCompeting interests \nThe authors declare that they have no conflicts of interest.\n\nAuthors' contributions \nTimothy Jay Carney participated in the conceptual model development, design, and literature review of the manuscript. Christopher Michael Shea contributed to the review and substantial redesign of the manuscript for resubmission.\n\nAcknowledgments \nThe manuscript is supported by The University of North Carolina Gillings School of Public Health, The Lineberger Comprehensive Cancer Center, and The Carolina Community Network Center to Reduce Cancer Health Disparities Diversity Supplement 3U54CA153602. Special thanks are due to Hannah M.L., Elance\/Upwork consultant, for outstanding editing, content review, and consultation on manuscript development.\n\nReferences \n\n1. Kukafka, A.; Yasnoff, W.A. (2007). \"Public health informatics\". 
Journal of Biomedical Informatics 40 (4): 365\u2013369. doi:10.1016\/j.jbi.2007.07.005. PMID 17656158.   \n\n\u2191 Yasnoff, W.A.; O'Carroll, P.W.; Koo, D. et al. (2000). \"Public health informatics: Improving and transforming public health in the information age\". Journal of Public Health Management and Practice 6 (6): 67\u201375. PMID 18019962.   \n\n\u2191 3.0 3.1 3.2 3.3 3.4 Hsu, C.E.; Chambers, W.C.; Herbold, J.R. et al. (2010). \"Towards shared situational awareness and actionable knowledge \u2014 an enhanced, human-centered paradigm for public health information system design\". Journal of Homeland Security and Emergency Management 7 (1). doi:10.2202\/1547-7355.1727.   \n\n\u2191 4.0 4.1 March, J.G.; Simon, H.A. (1993). Organizations (2nd ed.). Wiley-Blackwell. ISBN 9780631186311.   \n\n\u2191 Davenport, T.H.; Prusak, L. (1997). Information Ecology: Mastering the Information and Knowledge Environment (1st ed.). Oxford University Press. ISBN 9780195111682.   \n\n\u2191 Diez Roux, A.V. (2011). \"Complex systems thinking and current impasses in health disparities research\". American Journal of Public Health 101 (9): 1627\u201334. doi:10.2105\/AJPH.2011.300149. PMC PMC3154209. PMID 21778505. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3154209 .   \n\n\u2191 7.0 7.1 Lich, K.H.; Ginexi, E.M.; Osgood, N.D.; Mabry, P.L. (2013). \"A call to address complexity in prevention science research\". Prevention Science 14 (3): 279\u201389. doi:10.1007\/s11121-012-0285-2. PMID 22983746.   \n\n\u2191 Arndt, M.; Bigelow, B. (2000). \"Commentary: the potential of chaos theory and complexity theory for health services management\". Health Care Management Review 25 (1): 35\u20138. PMID 10710726.   \n\n\u2191 Kling, R. (1993). \"Organizational analysis in computer science\". The Information Society 9 (2): 71\u201387. doi:10.1080\/01972243.1993.9960134.   \n\n\u2191 Bandura, A. (1976). Social Learning Theory (1st ed.). Prentice-Hall. 
ISBN 9780138167448.   \n\n\u2191 Kling, R.; Rosenbaum, H.; Sawyer, S. (2005). Understanding and Communicating Social Informatics: A Framework for Studying and Teaching the Human Contexts of Information and Communication Technologies. Information Today, Inc. pp. 241. ISBN 9781573872287.   \n\n\u2191 12.0 12.1 12.2 Lorenzi, N.M.; Riley, R.T. (1995). Organizational Aspects of Health Informatics. Springer New York. ISBN 9781475741841.   \n\n\u2191 Thi\u00e9tart, R.A.; Forgues, B. (1995). \"Chaos Theory and Organization\". Organization Science 6 (1): 19\u201331. doi:10.1287\/orsc.6.1.19.   \n\n\u2191 U.S. Government Accountability Office (17 December 2010). \"Public Health Information Technology: Additional Strategic Planning Needed to Guide HHS's Efforts to Establish Electronic Situational Awareness Capabilities\". pp. 49. http:\/\/www.gao.gov\/products\/GAO-11-99 .   \n\n\u2191 Grossmann, C.; Goolsby, W.A.; Olsen, L.A. (2011). Engineering a Learning Healthcare System: A Look at the Future: Workshop Summary. National Academies Press. pp. 340. ISBN 9780309120654.   \n\n\u2191 16.0 16.1 16.2 16.3 Burton, R.M.; Obel, B. (2004). \"The Dynamics of the Change Process\". Strategic Organizational Diagnosis and Design. 4. Springer U.S. pp. 385\u2013420. ISBN 9781441991140.   \n\n\u2191 17.0 17.1 Nonaka, I. (1994). \"A dynamic theory of organizational knowledge creation\". Organization Science 5 (1): 14\u201337. doi:10.1287\/orsc.5.1.14.   \n\n\u2191 18.0 18.1 18.2 18.3 Popper, M.; Lipshitz, R. (1998). \"Organizational learning mechanisms: A structural and cultural approach to organizational learning\". Journal of Applied Behavioral Science 34 (2): 161\u2013179. doi:10.1177\/0021886398342003.   \n\n\u2191 Crossan, M.M.; Lane, H.W.; White, R.E. (1999). \"An organizational learning framework: From intuition to institution\". The Academy of Management Review 24 (3): 522\u2013537. doi:10.5465\/AMR.1999.2202135.   \n\n\u2191 Warren, L.; Fuller, T. (2011). 
\"Contrasting Approaches to Preparedness: A Reflection on Two Case Studies\". Managing Adaptability, Intervention, and People in Enterprise Information Systems. IGI Global. pp. 18\u201334. doi:10.4018\/978-1-60960-529-2.ch002. ISBN 9781609605292.   \n\n\u2191 Klein, G.; Moon, B.; Hoffman, R.R. (2006). \"Making sense of Sensemaking 1: Alternative perspectives\". IEEE Intelligent Systems 21 (4): 70\u201373. doi:10.1109\/MIS.2006.75.   \n\n\u2191 22.0 22.1 Endsley, M.R.; Jones, D.G. (2011). Designing for Situation Awareness: An Approach to User-Centered Design (2nd ed.). CRC Press. pp. 396. ISBN 9781420063554.   \n\n\u2191 23.0 23.1 23.2 Bowen, S. (2012). Ontology (Knowledge Representation in Information Science). Ocean Media. ISBN 9788132330912.   \n\n\u2191 24.0 24.1 Dymond, K.M. (1995). A Guide to the CMM: Understanding the Capability Maturity Model for Software. Process Inc. U.S. ISBN 9780964600805.   \n\n\u2191 Friedman, C.P.; Wong, A.K.; Blumenthal, D. (2010). \"Achieving a nationwide learning health system\". Science Translational Medicine 2 (57): 57cm29. doi:10.1126\/scitranslmed.3001456. PMID 21068440.   \n\n\u2191 26.0 26.1 Smith, P.F.; Ross, D.A. (2012). \"Information, knowledge, and wisdom in public health surveillance\". Journal of Public Health Management and Practice 18 (3): 193\u201395. doi:10.1097\/PHH.0b013e318250b064. PMID 22473109.   \n\n\u2191 Carley, K. (1998). \"Adaptive organizations and emergent forms\". Proceedings of the 3rd International Conference on Multi Agent Systems 1998. doi:10.1109\/ICMAS.1998.699020.   \n\n\u2191 28.0 28.1 Krackhardt, D.; Carley, K.M. (1998). \"A PCANS model of structure in organizations\". Proceedings of the 1998 International Symposium on Command and Control Research and Technology 1998: 113\u2013119. doi:10.1109\/ICMAS.1998.699020.   \n\n\u2191 29.0 29.1 29.2 29.3 Schreiber, C.; Singh, S.; Carley, K.M. (May 2004). 
\"Construct\u2014A Multi-Agent Network Model for the Co-Evolution of Agents and Socio-Cultural Environments\". Carnegie Mellon University. http:\/\/handle.dtic.mil\/100.2\/ADA460028 .   \n\n\u2191 30.0 30.1 30.2 Hirshman, B.R.; Carley, K.M.; Kowalchuck, M.J. (25 July 2007). \"Specifying Agents in Construct\". Carnegie Mellon University. http:\/\/handle.dtic.mil\/100.2\/ADA500804 .   \n\n\u2191 31.0 31.1 31.2 Carney, T.J.; Morgan, G.P.; Jones, J. et al. (2014). \"Using computational modeling to assess the impact of clinical decision support on cancer screening improvement strategies within the community health centers\". Journal of Biomedical Informatics 51: 200\u20139. doi:10.1016\/j.jbi.2014.05.012. PMC PMC4194243. PMID 24953241. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4194243 .   \n\n\u2191 32.0 32.1 \"Price elasticity of demand\". Economics Online. Economics Online Ltd. http:\/\/www.economicsonline.co.uk\/Competitive_markets\/Price_elasticity_of_demand.html .   \n\n\u2191 33.0 33.1 33.2 Golan, A.; Maasoumi, E. (2008). \"Information theoretic and entropy methods: An overview\". Econometric Reviews 27 (4\u20136): 317\u2013328. doi:10.1080\/07474930801959685.   \n\n\u2191 34.0 34.1 Scarborough, D. (2007). Neural Networks in Organizational Research: Applying Pattern Recognition to the Analysis of Organizational Behavior. American Psychological Association. pp. 187. ISBN 9781591474159.   \n\n\u2191 35.0 35.1 \"Definition of Neuroplasticity\". MedicineNet. MedicineNet, Inc. 14 June 2012. http:\/\/www.medicinenet.com\/script\/main\/art.asp?articlekey=40362 .   \n\n\u2191 DARPA\/TTO (22 March 1989). DARPA Neural Network Study: Final Report. Lincoln Laboratory, Massachusetts Institute of Technology. https:\/\/catalog.hathitrust.org\/Record\/009752973 .   \n\n\u2191 37.0 37.1 37.2 Hammad, T. (1998). \"Computational intelligence: Neural networks methodology for health decision support\". In Tan, J.K.H.; Sheps, S. (eds.). Health Decision Support Systems. 
Aspen Publishers, Inc. ISBN 0834210657.   \n\n\u2191 Carley, K. (2002). \"Smart agents and organizations of the future\". In Lievrouw, L.A.; Livingstone, S. (eds.). Handbook of New Media: Social Shaping and Consequences of ICTs. SAGE Publications. pp. 206\u2013220. ISBN 9780761965107.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. A few grammar and spelling errors were also corrected.\n\n\n\n\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\">https:\/\/www.limswiki.org\/index.php\/Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2017)LIMSwiki journal articles (all)LIMSwiki journal articles on public health informatics\n\nThis page was last modified on 11 January 2017, at 23:42.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","bfe42513d857c82a22a78dbd758fc186_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach_Information_science_perspective skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Informatics metrics and measures for a smart public health systems approach: Information science perspective<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" 
dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><a href=\"https:\/\/www.limswiki.org\/index.php\/Public_health_informatics\" title=\"Public health informatics\" target=\"_blank\" class=\"wiki-link\" data-key=\"f0372a80f101e9f6fd00490dc1ebcedd\">Public health informatics<\/a> is an evolving domain in which practices constantly change to meet the demands of a highly complex public health and healthcare delivery system. Given the emergence of various concepts, such as learning health systems, smart health systems, and adaptive complex health systems, <a href=\"https:\/\/www.limswiki.org\/index.php\/Health_informatics\" title=\"Health informatics\" target=\"_blank\" class=\"wiki-link\" data-key=\"055eb51f53cfdbacc08ed150b266c9f4\">health informatics<\/a> professionals would benefit from a common set of measures and capabilities to inform our modeling, measuring, and managing of health system \u201csmartness.\u201d Here, we introduce the concepts of organizational complexity, problem\/issue complexity, and situational awareness as three codependent drivers of smart public health systems characteristics. We also propose seven smart public health systems measures and capabilities that are important in a public health informatics professional\u2019s toolkit.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>Public health informatics is an evolving domain in which practices constantly change to meet the demands of a highly complex public health and healthcare delivery system. 
The typical definition across a variety of <a href=\"https:\/\/www.limswiki.org\/index.php\/Informatics\" title=\"Informatics\" class=\"mw-disambig wiki-link\" target=\"_blank\" data-key=\"ea0ff624ac3a644c35d2b51d39047bdf\">informatics<\/a> domains (e.g., public health, population health, nursing, clinical, medical, health, consumer, and biomedical) centers on the \"application of information science and information technology to [a specific domain of] practice, research, and training.\"<sup id=\"rdp-ebb-cite_ref-KukafkaPublic07_1-0\" class=\"reference\"><a href=\"#cite_note-KukafkaPublic07-1\" rel=\"external_link\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-YasnoffPublic00_2-0\" class=\"reference\"><a href=\"#cite_note-YasnoffPublic00-2\" rel=\"external_link\">[2]<\/a><\/sup> This definition relies on a technical view of the health system, one that largely identifies more tangible products such as databases, decision-support tools, information systems, web portals, and mobile devices as the primary means of addressing complex health issues, improving care, and reducing health disparities.\n<\/p><p>Public health informatics systems, expressed as a function of intelligence, can be understood in terms of two codependent pathways: (1) generating <a href=\"https:\/\/www.limswiki.org\/index.php\/Health_information_technology\" title=\"Health information technology\" target=\"_blank\" class=\"wiki-link\" data-key=\"9c8ef822470559f757db89f3fa234cc0\">health information technology<\/a> (HIT) policies that ensure our ability to govern intelligence as a byproduct, and (2) allowing innovations in HIT to shape and inform public health systems policy and practice so that we govern intelligently. 
In the former case, public health informatics professionals endeavor to generate HIT policy to guide national, state, and local efforts in information architecture, information infrastructure, and information integration. Through the meaningful supply of intelligence, these efforts ultimately shape how public health meets the needs of stakeholders\/agents such as patients\/families\/health consumers, communities, providers\/healthcare organizations, researchers, policymakers, and disease-centric communities of practice. Such intelligence can inform stakeholder understanding of the burden of disease, the spread of an outbreak, health alerts and food recalls, disease clusters, community needs assessments, and health risk assessments. In the latter case, public health informatics professionals seek innovative ways to leverage HIT to improve the way we govern, for example by streamlining processes in ways that positively affect cost, quality, safety, and overall health outcomes. Figure 1 highlights these relationships in the context of public health practice domains.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Carney_CompMathMethMed2017.png\" class=\"image wiki-link\" target=\"_blank\" data-key=\"c95223f5ceee06199de0604e8fee13e1\"><img alt=\"Fig1 Carney CompMathMethMed2017.png\" src=\"https:\/\/www.limswiki.org\/images\/d\/d5\/Fig1_Carney_CompMathMethMed2017.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> Public health informatics systems intelligence perspectives<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Although useful for fostering greater levels of adoption and use of technical measures, this technical view of public 
health informatics (1) does not highlight the changing knowledge needs of these system agents over time, (2) fails to capture the full array of interactions among agents in a dynamic environment, and (3) cannot keep pace with an increasingly complex environment. In other words, the purely technical approach does not effectively highlight the full spectrum of knowledge, communication, and learning needed to keep all types of health system stakeholders (including individuals, organizations, and networks of individuals and organizations such as coalitions, collaborations, consortiums, and taskforces) informed and able to respond to environmental changes at all stages of the healthcare continuum.\n<\/p><p>In this era of informatics, where the emphasis is on less tangible cognitive capacities (e.g., learning health systems, intelligent and smart systems, and complex adaptive systems), a new public health informatics analytics approach may be required: one that is driven less by information technology and more by knowledge, and that defines new ways of demonstrating the added value of informatics in shaping health systems performance.<sup id=\"rdp-ebb-cite_ref-HsuTowards10_3-0\" class=\"reference\"><a href=\"#cite_note-HsuTowards10-3\" rel=\"external_link\">[3]<\/a><\/sup> Specifically, stakeholders (hereafter referred to as individual- or organizational-level agents) need concise, accurate, and objective analytic measurements of abstract concepts such as empowerment, which has previously been described as a function of knowledge in order to obtain a quantifiable metric for computational analysis of performance.\n<\/p><p>Such a view of public health informatics may focus on abstract constructs like actionable intelligence as the primary informatics-centric outcome.<sup id=\"rdp-ebb-cite_ref-HsuTowards10_3-1\" class=\"reference\"><a href=\"#cite_note-HsuTowards10-3\" rel=\"external_link\">[3]<\/a><\/sup> Such a strategy 
should yield objective operational measures and capabilities designed to ensure that individual agents, organizations, and networks have sufficient knowledge to mount an intelligent response to solve complex public health problems. In other words, the strategy should support development and maintenance of smart health systems, that is, a system that \"incorporates functions of sensing, actuation, and control in order to describe and analyze a situation, and make decisions based on the available data in a predictive or adaptive manner, thereby performing smart actions. In most cases the \u2018smartness\u2019 of the system can be attributed to autonomous operation based on closed loop control, energy efficiency, and networking capabilities.\"<sup id=\"rdp-ebb-cite_ref-MarchOrg93_4-0\" class=\"reference\"><a href=\"#cite_note-MarchOrg93-4\" rel=\"external_link\">[4]<\/a><\/sup>\n<\/p><p>The purpose of this paper is to propose a set of measures for tracking the development and sustainability of smart public health systems. Specifically, we introduce the concepts of organizational complexity, problem\/issue complexity, and situational awareness as three codependent drivers of smart health systems. We then describe seven smart health systems measures. This discussion is important for public health informatics professionals responsible for specifying metrics, overseeing information systems housing data for the metrics, and evaluating the performance of smart public health systems.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Factors_shaping_smart_agents_and_organizations\">Factors shaping smart agents and organizations<\/span><\/h2>\n<p>The underlying objective of any agent or actor within a given public health system is to maximize the use of data, <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" target=\"_blank\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a>, and knowledge as strategic resources. 
An informatics-oriented view of a public health system focuses on the sum of the data, information, knowledge systems, people, practices, policies, and cultural factors that operate to support some predefined intelligence strategy, organizational mission, or other event.<sup id=\"rdp-ebb-cite_ref-DavenportInfo97_5-0\" class=\"reference\"><a href=\"#cite_note-DavenportInfo97-5\" rel=\"external_link\">[5]<\/a><\/sup> In these terms, the public health system can be understood as a functional knowledge culture. We have also used related terms such as knowledge environments, information or knowledge ecosystems, and information or knowledge ecologies to represent this idea. Here, we use knowledge culture and knowledge environment interchangeably. We argue that any defined system boundary that contains the formal or informal governance of critical strategic and shared knowledge resources can be called a knowledge environment. The primary purpose of any knowledge environment is best understood in terms of the essential need to leverage data, information, and knowledge in managing individual or collective uncertainty.<sup id=\"rdp-ebb-cite_ref-DiezRouxComp11_6-0\" class=\"reference\"><a href=\"#cite_note-DiezRouxComp11-6\" rel=\"external_link\">[6]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-LichACall13_7-0\" class=\"reference\"><a href=\"#cite_note-LichACall13-7\" rel=\"external_link\">[7]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-ArndtComm00_8-0\" class=\"reference\"><a href=\"#cite_note-ArndtComm00-8\" rel=\"external_link\">[8]<\/a><\/sup>\n<\/p><p>The ways in which we seek information and knowledge, organize ourselves into collectives of varying unit configurations (e.g., workgroups, project teams, taskforces, departments, divisions, networks of organizational coalitions, and consortiums), and\/or apply tools or technology reflect the basic need to manage uncertainty in all its forms.<sup id=\"rdp-ebb-cite_ref-KlingOrg93_9-0\" class=\"reference\"><a 
href=\"#cite_note-KlingOrg93-9\" rel=\"external_link\">[9]<\/a><\/sup> We organize ourselves in response to external and internal drivers\/stressors and increasing environmental complexity as a means of reducing or removing impediments to fast, reliable access to pertinent data, information, and knowledge resources.<sup id=\"rdp-ebb-cite_ref-BanduraSocial76_10-0\" class=\"reference\"><a href=\"#cite_note-BanduraSocial76-10\" rel=\"external_link\">[10]<\/a><\/sup> This imperative to organize for the sake of becoming smarter is best illustrated by three primary drivers that, we argue, are interdependent in any knowledge environment. By describing these drivers as interdependent, we mean that when one driver category increases or decreases in response to some set of circumstances or events, corresponding changes can occur in one or both of the other categories. The three categories are organizational complexity, problem\/issue complexity, and situational awareness (see Figure 2). Essentially, each of these three primary driver categories shapes our overall data, information, and knowledge strategy within any knowledge environment. 
The primary objective of an informatician in designing and maintaining a smart public health knowledge environment is then to understand the basic predictors of change in any or all of these categories, as well as to account for the corresponding mediation\/moderation factors that can shape continued data, information, and knowledge maximization for agents within any public health knowledge environment.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Carney_CompMathMethMed2017.png\" class=\"image wiki-link\" target=\"_blank\" data-key=\"c14c07d002513ab151c97e57a69bd0c9\"><img alt=\"Fig2 Carney CompMathMethMed2017.png\" src=\"https:\/\/www.limswiki.org\/images\/d\/d7\/Fig2_Carney_CompMathMethMed2017.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> Knowledge environment factors of influence<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Organizational_complexity_factors_shaping_public_health_knowledge_environments\">Organizational complexity factors shaping public health knowledge environments<\/span><\/h3>\n<p>We use a variety of organizational structures to facilitate interaction, communication, and knowledge representation in our quest to manage changes in our environment. 
Generally, levels of organization vary along a micro-to-macro continuum that runs from organizational agents\/individuals, through components\/sub-units and single entities\/facilities, to multi-unit systems\/collaborations\/coalitions\/networks\/taskforces\/consortiums.<sup id=\"rdp-ebb-cite_ref-KlingUnder05_11-0\" class=\"reference\"><a href=\"#cite_note-KlingUnder05-11\" rel=\"external_link\">[11]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-LorenziOrg95_12-0\" class=\"reference\"><a href=\"#cite_note-LorenziOrg95-12\" rel=\"external_link\">[12]<\/a><\/sup> Typically, the level of complexity inherent in the public health challenge or crisis event determines the corresponding level of organizational complexity required in the response.<sup id=\"rdp-ebb-cite_ref-LichACall13_7-1\" class=\"reference\"><a href=\"#cite_note-LichACall13-7\" rel=\"external_link\">[7]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-Thi.C3.A9tartChaos95_13-0\" class=\"reference\"><a href=\"#cite_note-Thi.C3.A9tartChaos95-13\" rel=\"external_link\">[13]<\/a><\/sup> Challenges or crisis events that are short-term or relatively minor may require only minimal, ad hoc, or temporary organizational responses. Within the modern healthcare environment, these responses can take the form of informal partnerships or of formal structures such as short-term project teams or workgroups. More involved and long-term problems may require increasingly complex organizational responses. 
These long-term or complex organizational responses may take the form of permanent departments or divisions within an organization, or they may extend beyond organizational boundaries to include coalitions, collaborations, taskforces, and interagency network arrangements.\n<\/p><p>One common public health system problem-solving strategy used throughout the US and worldwide involves forming networks of individuals and organizations to coordinate global-, national-, state-, regional-, county-, city-, or even community-level responses to health threats to individuals or populations. Such networks (e.g., coalitions, collaborations, consortiums, and taskforces) present opportunities to define common goals, shape strategy, achieve economies of scale through the sharing of resources, and facilitate centralized monitoring and measurement of progress toward stated objectives. However, one challenge for the public health informatics professional is ensuring that the data, information, and knowledge needs of stakeholder networks (including patient advocates, health organizations, providers, community groups, public health departments, policy makers, and researchers) are all met efficiently and effectively. 
The challenges of delivering timely intelligence were on full display during the recent Ebola virus and Zika virus outbreaks.\n<\/p><p>Currently, there are no consistent measures or metrics for evaluating how efficiently and effectively \u201csmart\u201d health networks, of any size or configuration, leverage data, information, and knowledge to produce actionable intelligence.<sup id=\"rdp-ebb-cite_ref-HsuTowards10_3-2\" class=\"reference\"><a href=\"#cite_note-HsuTowards10-3\" rel=\"external_link\">[3]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-GAOPublic10_14-0\" class=\"reference\"><a href=\"#cite_note-GAOPublic10-14\" rel=\"external_link\">[14]<\/a><\/sup> In other words, there is no quantifiable set of standardized measures, nor any standard operational definition, of what a smart or learning health network is now or should be in the future.<sup id=\"rdp-ebb-cite_ref-GrossmannEngin11_15-0\" class=\"reference\"><a href=\"#cite_note-GrossmannEngin11-15\" rel=\"external_link\">[15]<\/a><\/sup> Within any public health knowledge environment, a wide variety of network structures can be adopted. 
The organization is viewed as a dynamic, complex, and adaptive entity whose size, structure, and other organizational determinants must be constantly evaluated to sustain its ability to respond to internal and external challenges, threats, and opportunities that affect how individuals and\/or collectives leverage actionable intelligence for successful health system management.<sup id=\"rdp-ebb-cite_ref-BurtonTheDyn11_16-0\" class=\"reference\"><a href=\"#cite_note-BurtonTheDyn11-16\" rel=\"external_link\">[16]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-NonakaADyn94_17-0\" class=\"reference\"><a href=\"#cite_note-NonakaADyn94-17\" rel=\"external_link\">[17]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-PopperOrg98_18-0\" class=\"reference\"><a href=\"#cite_note-PopperOrg98-18\" rel=\"external_link\">[18]<\/a><\/sup>\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Problem.2FIssue_complexity_factors_shaping_knowledge_environments\">Problem\/Issue complexity factors shaping knowledge environments<\/span><\/h3>\n<p>An analogy for problem\/issue identification and response within any public health knowledge environment is the human immune response, in which the immune system constantly assesses threats and determines whether a foreign agent is a \u201cfriend\u201d or \u201cfoe.\u201d In a healthy immune system, once an agent has been identified, the proper immune response is triggered: for a friend, facilitation\/proliferation strategies ensue, and for a foe, elimination\/mitigation strategies ensue. Two critical components of the overall immune response system are the ability to retain a memory of the encounter and the ability to demonstrate system learning in preparation for future encounters of a similar nature. \n<\/p><p>The same sort of dynamic occurs within a public health system network among its various organizational components, actors\/agents, and events. 
Once a phenomenon (i.e., a circumstance\/event\/activity\/occurrence) is identified as a potential problem, whether threatening or nonthreatening, system or collective memory is checked for familiarity.<sup id=\"rdp-ebb-cite_ref-PopperOrg98_18-1\" class=\"reference\"><a href=\"#cite_note-PopperOrg98-18\" rel=\"external_link\">[18]<\/a><\/sup> If sufficient memory of the phenomenon, or of something similar, is found, the best response algorithm(s) (sets of instructions) outlining the appropriate response mechanism are identified. If no such memory exists, a response must be determined on an ad hoc basis. Clinical or public health events\/activities that support favorable health outcomes (e.g., diffusion of best practices, strategic summits, introduction of new technology, disease screening and awareness campaigns, and new funding announcements) may be considered targets for facilitation\/proliferation, whereas unfavorable events\/activities (e.g., disease outbreaks, health or food recalls, medical errors, deviations from guideline-concordant care, risk behaviors linked to disease spread, budget shortfalls, and staff layoffs) may be targets for elimination\/mitigation. \n<\/p><p>In either case, the system must generate sufficient memory of the response algorithms (processes\/workflows, policies\/procedures) that contributed to the event(s), the pathways by which they emerged, and\/or the remediation strategy used to eliminate the threat. 
Learning in this context provides the ability to avoid potentially harmful events that may recur, or to repeat\/reinforce beneficial events.<sup id=\"rdp-ebb-cite_ref-CrossanAnOrg99_19-0\" class=\"reference\"><a href=\"#cite_note-CrossanAnOrg99-19\" rel=\"external_link\">[19]<\/a><\/sup> Hence, the ability to extract actionable intelligence from stored memory is essential to overall public health system performance and an effective knowledge environment.<sup id=\"rdp-ebb-cite_ref-HsuTowards10_3-3\" class=\"reference\"><a href=\"#cite_note-HsuTowards10-3\" rel=\"external_link\">[3]<\/a><\/sup> Two factors that shape this dynamic of event, memory evaluation, and learning within a knowledge environment are familiarity and preparedness, concepts borrowed from the field of emergency preparedness.<sup id=\"rdp-ebb-cite_ref-WarrenContr11_20-0\" class=\"reference\"><a href=\"#cite_note-WarrenContr11-20\" rel=\"external_link\">[20]<\/a><\/sup>\n<\/p><p>Within any knowledge environment, issue\/problem complexity and relative familiarity (stored memory) largely shape the level of \u201cshock\u201d or environmental stress to the public health system, which creates what Burton termed an organizational design misfit.<sup id=\"rdp-ebb-cite_ref-BurtonTheDyn11_16-1\" class=\"reference\"><a href=\"#cite_note-BurtonTheDyn11-16\" rel=\"external_link\">[16]<\/a><\/sup> In the presence of an organizational design misfit, the goal is to restore some measure of equilibrium.<sup id=\"rdp-ebb-cite_ref-NonakaADyn94_17-1\" class=\"reference\"><a href=\"#cite_note-NonakaADyn94-17\" rel=\"external_link\">[17]<\/a><\/sup> The level of shock brought by the introduction of a problem\/issue into any public health knowledge environment and its corresponding impact on the public health system can be thought of in terms of two factors: (1) the degree to which the event was expected to occur and (2) the degree to which the environment was 
prepared for its occurrence. Figure 3 highlights the relationship between these two factors. Green represents a highly desirable state of system and organizational readiness (operationally defined here as the ability of agents within the public health knowledge environment to process the event and determine an appropriate response), yellow represents less desirable states of organizational readiness, and red represents the least desirable state of organizational readiness and the highest level of vulnerability to both internal and external threats.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Carney_CompMathMethMed2017.png\" class=\"image wiki-link\" target=\"_blank\" data-key=\"15a1b9aeb166e9fed0d265f2ff14d785\"><img alt=\"Fig3 Carney CompMathMethMed2017.png\" src=\"https:\/\/www.limswiki.org\/images\/6\/64\/Fig3_Carney_CompMathMethMed2017.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> Problem\/issue complexity factors<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Although most public health systems are prepared to deal with a broad range of events, shock events can uncover vulnerabilities that prompt noticeable changes. 
Such adjustments on the organizational side may present as unexpected leadership shifts, sudden changes in organizational command structures, abrupt shifts in policies and procedures, new streams of research funding to investigate and solve problems, or the addition or elimination of staff and key personnel.<sup id=\"rdp-ebb-cite_ref-BurtonTheDyn11_16-2\" class=\"reference\"><a href=\"#cite_note-BurtonTheDyn11-16\" rel=\"external_link\">[16]<\/a><\/sup> On the public health knowledge environment side, such adjustments can take the form of wide-scale data integration or health information exchange efforts, the formation of new database solutions, the demand for new technology to monitor and track the problem, surveillance protocols, information systems, knowledge portals, decision-support systems, and changes in information resource-management protocols.<sup id=\"rdp-ebb-cite_ref-LorenziOrg95_12-1\" class=\"reference\"><a href=\"#cite_note-LorenziOrg95-12\" rel=\"external_link\">[12]<\/a><\/sup> The complexity of the problem\/issue, together with the capability of the public health knowledge environment to process the event and mount an appropriate response, heavily shapes the level of organization (or, in some cases, reorganization) required to mitigate the threat or exploit the opportunity. Additionally, these changes, and more importantly the rate of change in the organization in particular and in the public health knowledge environment in general, may serve as proxy indicators of the overall maturity of the public health knowledge environment in managing uncertainty. 
In other words, when a health system or public health agency undergoes frequent leadership changes, high staff turnover, frequent redrafting of strategic plans, and repeated reorganizations within a relatively short span of time, this is a strong indicator that its public health knowledge environment lacks overall maturity.<sup id=\"rdp-ebb-cite_ref-LorenziOrg95_12-2\" class=\"reference\"><a href=\"#cite_note-LorenziOrg95-12\" rel=\"external_link\">[12]<\/a><\/sup> Such a public health knowledge environment characteristically remains in a loop, moving from crisis to solution and then to a new or reemerging crisis. In contrast, a mature public health knowledge environment will seek to identify and understand the patterns by which organizational complexity and problem\/issue complexity emerge and are responded to. Properly stored, organized, and readily accessible system memory can greatly aid in achieving a more mature public health knowledge environment.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Situational_awareness_factors_shaping_information_environments\">Situational awareness factors shaping information environments<\/span><\/h3>\n<p>Previously, we stated that organizational complexity is shaped by external or internal factors in a given public health knowledge environment, requiring different levels of formal or informal organizational structures to manage their environmental challenges. We also mentioned that the level of complexity inherent in problems\/issues and the corresponding system memory and preparedness will shape system-level responses to control and mitigate any perceived threats. Here, we formally define the term \"situational awareness\" (SA) as \u201cthe ability to make sense of an ambiguous situation. 
It is the process of creating [situational awareness] and understanding to support decision-making under uncertainty \u2014 an effort to understand connections among people, places, and events in order to anticipate their trajectories and act effectively.\u201d<sup id=\"rdp-ebb-cite_ref-KleinMaking06_21-0\" class=\"reference\"><a href=\"#cite_note-KleinMaking06-21\" rel=\"external_link\">[21]<\/a><\/sup> Endsley elaborated on the definition of SA, stating that it comprises three subdomains that shape individual understanding of a phenomenon. These are (1) situation perception (defining the current public health condition), (2) situation comprehension (defining the relative public health threat or opportunity), and (3) situation projection (forecasting the public health outcomes of hypothesized trajectories).<sup id=\"rdp-ebb-cite_ref-EndsleyDesign11_22-0\" class=\"reference\"><a href=\"#cite_note-EndsleyDesign11-22\" rel=\"external_link\">[22]<\/a><\/sup>\n<\/p><p>Within situational awareness, the two elements of organizational complexity and problem\/issue complexity are combined and serve as contributing factors that determine the degree to which organizational structure and function are properly suited to facilitate unencumbered information processing. 
Previous organizational theories have described the organization, functioning within a given environment, as an information-processing entity (IPE).<sup id=\"rdp-ebb-cite_ref-BurtonTheDyn11_16-3\" class=\"reference\"><a href=\"#cite_note-BurtonTheDyn11-16\" rel=\"external_link\">[16]<\/a><\/sup> From this perspective, organizations are seen as sophisticated information-processing and decision-making machines that act as if they have pre-programmed subroutines for managing the loop stages of information flow and organizational processes (model \u2192 input \u2192 transformation \u2192 output \u2192 feedback).<sup id=\"rdp-ebb-cite_ref-MarchOrg93_4-1\" class=\"reference\"><a href=\"#cite_note-MarchOrg93-4\" rel=\"external_link\">[4]<\/a><\/sup> Within the IPE view of an organization, we must understand information processing as a means of shaping organizational and individual decisions, behaviors, and communication patterns.<sup id=\"rdp-ebb-cite_ref-PopperOrg98_18-2\" class=\"reference\"><a href=\"#cite_note-PopperOrg98-18\" rel=\"external_link\">[18]<\/a><\/sup> The flow of information and knowledge is codependent with our constant need to learn and share knowledge, and this interplay largely shapes the structure of our social and organizational network arrangements.<sup id=\"rdp-ebb-cite_ref-BowenOnt12_23-0\" class=\"reference\"><a href=\"#cite_note-BowenOnt12-23\" rel=\"external_link\">[23]<\/a><\/sup> Therefore, the need to know, or cognitive demand, of both individuals and organizations becomes a primary driver of IPE activity.<sup id=\"rdp-ebb-cite_ref-PopperOrg98_18-3\" class=\"reference\"><a href=\"#cite_note-PopperOrg98-18\" rel=\"external_link\">[18]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-BowenOnt12_23-1\" class=\"reference\"><a href=\"#cite_note-BowenOnt12-23\" rel=\"external_link\">[23]<\/a><\/sup> \n<\/p><p>Information processing for the sake of storing mountains of data, information, and knowledge resources as an end in itself is meaningless in the context of efficiency, 
effectiveness, and viability in meeting public health system organizational missions, goals, and objectives. More precisely, the primary function of any level of IPE, from simple department units to complex multiorganizational networks or health information exchanges, is to respond to what agents\/actors need to know, when they need to know it, and to support the choices\/decisions that stakeholders must make as they navigate the health system; we define this here as actionable knowledge or intelligence.<sup id=\"rdp-ebb-cite_ref-HsuTowards10_3-4\" class=\"reference\"><a href=\"#cite_note-HsuTowards10-3\" rel=\"external_link\">[3]<\/a><\/sup> The public health organizational IPE will seek to leverage SA to maximize readiness to meet public health threats from the environment and to maximize the individual and\/or collective intelligence of agents\/actors in the public health knowledge environment as they perform core public health tasks and functions. Therefore, public health-centric SA serves as a comprehensive measure of public health system smartness and is essential for any standardized assessment of public health system performance from a public health informatics perspective.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Smart_systems_vulnerability_index\">Smart systems vulnerability index<\/span><\/h2>\n<p>In this section, we propose seven smart health system measures and capabilities appropriate for helping to manage public health organizational complexity, problem\/issue complexity, and situational awareness for public health systems networks and public health knowledge environments. Figure 4 lists these seven measures, along with a brief description of each measure's role in a smart public health system and our rationale for its use. Although other measures may be available in the literature, we believe these seven effectively capture the key concepts discussed above. 
Of course, public health informatics professionals may need to use discretion when applying measures based on the context, the purpose of measurement, and any constraints hindering measurement.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Carney_CompMathMethMed2017.png\" class=\"image wiki-link\" target=\"_blank\" data-key=\"0b38bb47718c1b065e2dc68868930aff\"><img alt=\"Fig4 Carney CompMathMethMed2017.png\" src=\"https:\/\/www.limswiki.org\/images\/6\/6c\/Fig4_Carney_CompMathMethMed2017.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 4.<\/b> Smart health systems measures and capabilities<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Knowledge_discovery_rate_.28KDR.29\">Knowledge discovery rate (KDR)<\/span><\/h3>\n<p>Thomas Davenport described knowledge as data and information imbued with meaning and relevance. In this way, knowledge is seen as a continued aggregation and refinement that begins with raw data elements. For example, a set of 10 numerical digits constitutes raw and meaningless data. At the stage of information, it can be recognized as a telephone number.\n<\/p><p>These same digits can be viewed as knowledge when that number is contextualized as a conduit to satisfy some individual or organizational cognitive demand to support health decision-making or address some issue or problem. This telephone number can be viewed as a source of knowledge when, for example, it represents a nurse hotline for patient navigation. 
The informatics professional managing\/building a mature public health knowledge environment works with stakeholders to develop a comprehensive knowledge inventory (formally referred to as an ontology) of all knowledge products used to inform public health stakeholder decision-making.<sup id=\"rdp-ebb-cite_ref-BowenOnt12_23-2\" class=\"reference\"><a href=\"#cite_note-BowenOnt12-23\" rel=\"external_link\">[23]<\/a><\/sup>\n<\/p><p>The knowledge discovery rate (KDR) represents the rate at which knowledge is generated from new or existing data and information resources. In other words, the KDR shows how long it may take someone to (1) realize that these 10 digits represent a telephone number, (2) understand that this telephone number is connected to a patient navigation service, and (3) realize that the nurse navigation service has additional resources to maximize the care experience. In public health, KDR may be expressed as the time it takes health officials to recognize a pattern in seemingly unrelated health events (e.g., ER traffic, school\/work absence, provider case reporting, news reports, food plant inspection reports, and grocery store sales) that together indicate a disease threat, such as a potential food-borne illness. The display or interface is an essential component as well. 
Endsley explores how the interface and display of information can greatly hinder the uptake of knowledge and consequently impair overall situational awareness.<sup id=\"rdp-ebb-cite_ref-EndsleyDesign11_22-1\" class=\"reference\"><a href=\"#cite_note-EndsleyDesign11-22\" rel=\"external_link\">[22]<\/a><\/sup> For example, will everyone read the representations \u201c1234567890,\u201d \u201c123.456.7890,\u201d and \u201c(123) 456-7890\u201d in the same manner in their communication exchanges?\n<\/p><p>A key component in shaping the rate of knowledge discovery involves comprehensively assessing the presentation and display of data, information, and knowledge throughout key stages of any healthcare delivery (e.g., clinical pathway) or public health process. To that end, KDR involves understanding how knowledge is packaged for consumption in the form of paper or electronic tangible (explicit) knowledge products, or as less tangible (tacit) knowledge products, to inform decision-making. This measure examines the production curve of knowledge through generation, presentation, selection, and consumption, as well as the qualitative assessment of knowledge\u2019s relevance to agent-specific choice.\n<\/p><p>KDR may be particularly pertinent in settings in which pattern recognition depends on the coordination of data, information, and knowledge from a highly heterogeneous network of sources and stakeholders. KDR becomes essential in public health situations where intelligence has to be collated across multiple agencies (e.g., school, hospital, retail, and corporate), wide geographic boundaries (e.g., multiple regional metropolitan area health departments), or multiple categories of stakeholders (e.g., patients, providers, and health administrators). 
In such cases, KDR represents a measure of timeliness and operates as a key indicator of public health outcomes.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Organizational_.28agent-_or_systems-level.29_memory\">Organizational (agent- or systems-level) memory<\/span><\/h3>\n<p>Earlier, we described the matrix of system shock as a function of preparedness and expectedness in response to internal and external stimuli (events). Organizational or systems-level memory can be described as the degree to which the history of these encounters, the responses to them, and the relative success or failure of those responses are catalogued and stored for future use by other agents\/actors. This can be operationally understood as the \u201crepeatability\u201d level, commonly referred to as level two of five in the capability maturity model (CMM).<sup id=\"rdp-ebb-cite_ref-DymondAGuide95_24-0\" class=\"reference\"><a href=\"#cite_note-DymondAGuide95-24\" rel=\"external_link\">[24]<\/a><\/sup> Systems memory simply asks to what degree phenomena are captured and labeled as favorable or unfavorable, and to what degree response algorithms are developed and made available for expedient consumption by the same and\/or other agents within the knowledge environment. A lack of repeatability represents a high level of unnecessary \u201cad hoc\u201d or CMM level-one responses<sup id=\"rdp-ebb-cite_ref-DymondAGuide95_24-1\" class=\"reference\"><a href=\"#cite_note-DymondAGuide95-24\" rel=\"external_link\">[24]<\/a><\/sup> and may result in an inordinately high level of shock to the system for events that, if properly catalogued, could have been relegated to the realm of routine with minimal system-shock value.\n<\/p><p>Here, the primary aim is to determine the level of completeness, sophistication, and use of the knowledge bases that represent the sum of public health knowledge stored for current and future public health decision-making. 
This can be expressed as basic knowledge inventories, resource guides, policy and procedure manuals, and intranet\/Internet lessons learned or best practices. It can also be expressed as highly sophisticated knowledge ontologies that capture and display public health knowledge, tasks, events, and procedures in complex electronic tools to support network modeling, information flows, and critical communication pathways. Public health knowledge portals can be constructed to identify public health stakeholder query demand more easily, as well as to access, retrieve, display, and analyze knowledge use in any public health knowledge environment. This capability is essential to the proper use and maximization of organizational memory. In the absence of standardized knowledge memory management, a public health organization remains in a perpetual <i>ad hoc<\/i> response mode to each new or recurring public health crisis.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Agent-specific_and_system_learning\">Agent-specific and system learning<\/span><\/h3>\n<p>There is a growing body of literature on the evolution of a \u201clearning health system.\u201d<sup id=\"rdp-ebb-cite_ref-FriedmanAchiev10_25-0\" class=\"reference\"><a href=\"#cite_note-FriedmanAchiev10-25\" rel=\"external_link\">[25]<\/a><\/sup> Our study contributes to this concept by providing a conceptual definition of both agent-specific and system-level learning from the perspective of a public health informatics professional managing\/building a public health knowledge environment. 
Here, learning is understood as the wisdom level of the informatics continuum.<sup id=\"rdp-ebb-cite_ref-SmithInfo12_26-0\" class=\"reference\"><a href=\"#cite_note-SmithInfo12-26\" rel=\"external_link\">[26]<\/a><\/sup> We have refrained from using this concept throughout this discussion, but at this point it is appropriate to recognize that some informatics literature describes the informatics continuum (earlier referred to as the data progression) as data to information to knowledge to wisdom.<sup id=\"rdp-ebb-cite_ref-SmithInfo12_26-1\" class=\"reference\"><a href=\"#cite_note-SmithInfo12-26\" rel=\"external_link\">[26]<\/a><\/sup> Objective measures of wisdom are typically difficult to find, and none is universally accepted. We have therefore chosen to substitute decisions and outcomes for wisdom. As a result, our data progression extends to the following sequence: data to information to knowledge to decisions to outcomes. The difference is that choices, when properly linked to specific outcomes and their corresponding consequences, provide opportunities for learning. This substitution allows us to define wisdom operationally as the degree to which choice, informed by relevant knowledge products, can lead to more desirable decisions, beneficial outcomes, and positive consequences for the overall health and well-being of agents and the system.\n<\/p><p>We extend our definition of wisdom to incorporate intelligence, simply understood as the display of wisdom over time. In our model, learning acts as a measure of differential wisdom and intelligence over time (the difference measured at two distinct points in time). In other words, learning is the individual or organizational wisdom displayed or measured at some endpoint (<i>t<\/i><sub>2<\/sub>) minus the individual or organizational wisdom displayed at some starting point (<i>t<\/i><sub>1<\/sub>). 
The organizational IQ in a learning health system is then understood as the measure of differential wisdom displayed over time toward some set of decisions\/choices, actions\/tasks, or other health phenomena. Learning represents a measurement of agent-specific or system-level discernment (the ability to leverage situational awareness in comprehending threat level and to leverage stored or new knowledge in choosing between differing options). To this end, learning is construed as the means of refinement in the art of discernment or wisdom acquisition.\n<\/p><p>In public health terms, the operational construct of this measure of learning is still evolving, and little literature exists on applying this construct in public health practice. We suggest that measures of learning, presented here as a means of leveraging knowledge resources wisely, are largely dependent on the preceding measure of organizational memory. In the absence of a well-designed public health knowledge base that captures the history of practice, learning from such experience becomes extremely episodic and anecdotal in nature. For example, we can only speculate on how much stored memory has been gathered from the Ebola crisis that could help mitigate another Ebola outbreak or outbreaks of other diseases under related conditions. The emergence of rapid learning health networks addresses some aspects of this challenge by streamlining the translation of research evidence into practice and gathering knowledge stores of what works best in achieving better health outcomes. 
However, global implementations of these research-to-practice and comparative effectiveness networks are still in the early stages of development.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Knowledge_absorption_rate_.28KAR.29\">Knowledge absorption rate (KAR)<\/span><\/h3>\n<p>Carley <i>et al.<\/i> described how the sum of knowledge within a given system boundary can be quantified in terms of knowledge bits.<sup id=\"rdp-ebb-cite_ref-CarleyAdapt98_27-0\" class=\"reference\"><a href=\"#cite_note-CarleyAdapt98-27\" rel=\"external_link\">[27]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-KrackhardtAPCANS98_28-0\" class=\"reference\"><a href=\"#cite_note-KrackhardtAPCANS98-28\" rel=\"external_link\">[28]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-SchreiberConstruct04_29-0\" class=\"reference\"><a href=\"#cite_note-SchreiberConstruct04-29\" rel=\"external_link\">[29]<\/a><\/sup> According to this concept, knowledge represented in its various forms can be deconstructed into quantifiable units.<sup id=\"rdp-ebb-cite_ref-SchreiberConstruct04_29-1\" class=\"reference\"><a href=\"#cite_note-SchreiberConstruct04-29\" rel=\"external_link\">[29]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HirshmanSpec07_30-0\" class=\"reference\"><a href=\"#cite_note-HirshmanSpec07-30\" rel=\"external_link\">[30]<\/a><\/sup> The number of knowledge units or bits that may make up a discrete package of knowledge is determined by the level of complexity of the decisions or tasks this knowledge is designed to inform.<sup id=\"rdp-ebb-cite_ref-SchreiberConstruct04_29-2\" class=\"reference\"><a href=\"#cite_note-SchreiberConstruct04-29\" rel=\"external_link\">[29]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HirshmanSpec07_30-1\" class=\"reference\"><a href=\"#cite_note-HirshmanSpec07-30\" rel=\"external_link\">[30]<\/a><\/sup> As such, a direct relationship exists between the number of knowledge bits and the level of complexity of the related decisions and tasks. 
The greater the complexity\/criticality of a task or decision, the larger the knowledge complement (or number of knowledge bits) associated with the management, storage, display\/representation, diffusion, use, and comprehension of knowledge.<sup id=\"rdp-ebb-cite_ref-SchreiberConstruct04_29-3\" class=\"reference\"><a href=\"#cite_note-SchreiberConstruct04-29\" rel=\"external_link\">[29]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HirshmanSpec07_30-2\" class=\"reference\"><a href=\"#cite_note-HirshmanSpec07-30\" rel=\"external_link\">[30]<\/a><\/sup> This perspective assumes that more knowledge bits are needed to saturate or carry out a complex task or make a critical choice than to carry out a simpler, less complex task or choice.\n<\/p><p>In essence, this concept of knowledge bits suggests that throughout the knowledge environment, agents\/actors can have anywhere from no saturation of knowledge (0 bits) to 100 percent saturation of knowledge (all bits available in the knowledge environment). The consumption or absorption rate of these knowledge bits over time, in the performance of core tasks, can then be evaluated using a variety of statistical and computational modeling methods. To carry this out, a value or weight is assigned to every piece of knowledge represented within a knowledge inventory (also referred to as an ontology). The weight given to a knowledge product represents both the degree of value assigned by consumers of that knowledge product (elasticity-of-demand) and the magnitude of importance of the respective decision(s) it is intended to inform (criticality). The curve of a knowledge product\u2019s elasticity-of-demand and criticality of decisions is evaluated in the context of a core set of tasks to be performed at the agent or system level.\n<\/p><p>In public health settings, KAR represents a concrete way of measuring the overall application of knowledge to performance. 
In previous studies, we examined the knowledge absorption rate of community health clinical staff with regard to breast, cervical, and colorectal cancer screening policies, guidelines, and protocols as derived from the use of electronic <a href=\"https:\/\/www.limswiki.org\/index.php\/Clinical_decision_support_system\" title=\"Clinical decision support system\" target=\"_blank\" class=\"wiki-link\" data-key=\"095141425468d057aa977016869ca37d\">clinical decision-support<\/a> (CDS).<sup id=\"rdp-ebb-cite_ref-CarneyUsing14_31-0\" class=\"reference\"><a href=\"#cite_note-CarneyUsing14-31\" rel=\"external_link\">[31]<\/a><\/sup> We examined the extent to which CDS use and corresponding knowledge absorption rates would be correlated to organizational performance for cancer screening.<sup id=\"rdp-ebb-cite_ref-CarneyUsing14_31-1\" class=\"reference\"><a href=\"#cite_note-CarneyUsing14-31\" rel=\"external_link\">[31]<\/a><\/sup> We demonstrated that KAR was, in fact, a predictor of organizational performance in meeting process-of-care outcomes in cancer care.<sup id=\"rdp-ebb-cite_ref-CarneyUsing14_31-2\" class=\"reference\"><a href=\"#cite_note-CarneyUsing14-31\" rel=\"external_link\">[31]<\/a><\/sup> Hence, we suggest that KAR can serve as an effective measure of HIT impact on performance by focusing on end-users\u2019 ability to access key knowledge by interacting with HIT tools and applying this knowledge to healthcare delivery and public health practice.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Agent-specific_and_system-level_cognitive_demand\">Agent-specific and system-level cognitive demand<\/span><\/h3>\n<p>Within any given knowledge environment, agents\/actors at many levels perform key tasks, make decisions, and engage in a series of activities that can be described by a set of process algorithms.<sup id=\"rdp-ebb-cite_ref-KrackhardtAPCANS98_28-1\" class=\"reference\"><a href=\"#cite_note-KrackhardtAPCANS98-28\" rel=\"external_link\">[28]<\/a><\/sup> The 
constant factor governing this activity is the principle of a cognitive demand for information. The principle of supply and demand, borrowed from the field of economics, applies to some extent to the field of informatics with respect to agents'\/actors' need for information to support decision-making and task performance. Here, we focus on the metric as a measure of the relative demand for data, information, and knowledge resources by agents\/actors operating at all levels of the multilevel model, as well as the corresponding supply of data, information, and knowledge resources available for consumption. We refer to the main driver of the interplay between the supply and demand of data, information, and knowledge resources as the cognitive demand or simply the \u201cneed to know.\u201d\n<\/p><p>This \u201cneed to know\u201d or cognitive demand shapes the information-seeking behaviors of the agents\/actors within the system and may govern the amount of effort they are willing to expend in acquiring the data, information, or knowledge resources. The level of importance or criticality of information to the agent is measured by the elasticity-of-demand (a borrowed term) for that information. The measure of elasticity coupled with the relative supply of information can be used to measure relative states of \u201cinformedness\u201d of the agents\/actors within the system. 
According to the formal definition of elasticity, an elastic demand is one in which the change in quantity demanded due to a change in price is large.<sup id=\"rdp-ebb-cite_ref-PEDDef_32-0\" class=\"reference\"><a href=\"#cite_note-PEDDef-32\" rel=\"external_link\">[32]<\/a><\/sup> In contrast, an inelastic demand is one in which the change in quantity demanded due to a change in price is small.<sup id=\"rdp-ebb-cite_ref-PEDDef_32-1\" class=\"reference\"><a href=\"#cite_note-PEDDef-32\" rel=\"external_link\">[32]<\/a><\/sup> Cognitive demand can serve as a core measure for identifying knowledge-related vulnerabilities within a system, or the relative degree to which the cost of the required knowledge is or is not acceptable.\n<\/p><p>We understand that in the context of health systems, the concept of price with respect to knowledge can be measured in terms of access, affordability (time and effort), overall opportunity-cost (ease-of-use, processing, comprehension, and understanding), and relevancy. Figure 5 shows the four distinct states agents\/actors can assume within any public health knowledge environment based on the level of criticality (elasticity-of-demand) and the supply of knowledge. When the cognitive demand for knowledge is highly critical and the relative supply is limited, knowledge gaps emerge. Such knowledge gaps may result from a variety of scenarios, including (1) the information or knowledge product does not exist, resulting in a need for innovation; (2) the resource exists, but access is in some way limited or encumbered; (3) the resource exists with abundant access but is not easily processed or consumed because of literacy challenges, content presentation, or other reasons; and (4) the supply is challenged by other competing priorities and is intentionally undeveloped or underdeveloped. The two states we refer to as parity conditions represent areas where the level of criticality is adequately met by the level of knowledge supply. 
In such cases, the main strategy is continuous monitoring to keep the balance within acceptable ranges while maintaining balance in the overall public knowledge environment. A state of knowledge surplus results when the level of information or knowledge product supply exceeds the relative level of importance placed on the information or knowledge products (also understood as relevancy). This state represents an opportunity to eliminate outmoded or underused knowledge resources, redesign or upgrade information systems, or pursue other strategic information technology efforts to ensure the long-term relevance of information resources, information systems, and knowledge products.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig5_Carney_CompMathMethMed2017.png\" class=\"image wiki-link\" target=\"_blank\" data-key=\"9925f0fcbf49c2cc79014c5bcafc24c5\"><img alt=\"Fig5 Carney CompMathMethMed2017.png\" src=\"https:\/\/www.limswiki.org\/images\/6\/66\/Fig5_Carney_CompMathMethMed2017.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 5.<\/b> Criticality (elasticity-of-demand) and the supply of knowledge<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The public health application of this measure of relative cognitive demand, or the need to know, is illustrated by the recent Zika outbreak. It became evident that the population at greatest risk was pregnant women and women planning to become pregnant. 
The demand for information regarding protective measures, travel restrictions, the rate of transmission, the relative threat to fetal health, and signs and symptoms once infected created large pockets of health consumer uncertainty, stress, and anxiety. Given the severe level of risk to pregnant women and their developing babies, the demand for knowledge of what to do for protection was highly critical. The window of transmission of the Zika virus from mother to fetus was highly uncertain, the effectiveness of preventive measures was hard to measure, and the right decisions about travel to affected areas were unclear, resulting in highly critical, low-supply knowledge profiles for many health consumers and stakeholders at all levels. Rapid research was needed to identify proven measures against the invading mosquito population. Public health departments throughout the affected areas were scrambling to model the spread of the disease, measure the impact of the preventive measures, and manage the reports of new cases. Meanwhile, the public was demanding new answers and updates daily. This was compounded by the timing of the 2016 Summer Olympic Games in Rio de Janeiro, which prompted highly publicized refusals by some athletes to travel to the region to participate. Highly critical\/low-supply conditions are probably the most difficult to manage. 
In any public health knowledge environment, stakeholder cognitive demand must be assessed continually, relative to the capabilities of the existing or evolving knowledge base, as a means of satisfying current demands and projecting anticipated ones.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Cognitive_mapping\">Cognitive mapping<\/span><\/h3>\n<p>Once the knowledge inventory and relative measures of importance are assessed and the corresponding process and information flows have been identified, the public health informatics professional can create cognitive maps or models of both existing and emerging knowledge and communication pathways. These pathways can be modeled for specific agents, for the system as a whole, or for any combination of the two. Here, the public health informatics professional is not simply asking who uses what information or examining the use of computerized information resources; instead, the goal is to model the cycle of information and knowledge within the public health knowledge environment. This information cycle is best understood as starting with raw materials, in this case raw and at times unformatted data elements, which are assembled into chunks of information (e.g., electronic databases or information systems). These information chunks are then coordinated into meaningful knowledge products, either as off-the-shelf knowledge products (structured queries) or as <i>ad hoc<\/i> user-defined knowledge products that users assemble according to their specific needs to support decision-making (<i>ad hoc<\/i> queries). 
We refer to this cycle as the knowledge refinery.\n<\/p><p>Analytic measures of knowledge refinement consist of examining the pace of knowledge development and exploring system responsiveness as expressed by the supply of and demand for data, information, and knowledge resources.<sup id=\"rdp-ebb-cite_ref-GolanInfo08_33-0\" class=\"reference\"><a href=\"#cite_note-GolanInfo08-33\" rel=\"external_link\">[33]<\/a><\/sup> The basic elements of analysis consist of the total knowledge in any given public health knowledge environment (knowledge entropy) relative to the amounts of used kinetic knowledge and unused potential knowledge.<sup id=\"rdp-ebb-cite_ref-GolanInfo08_33-1\" class=\"reference\"><a href=\"#cite_note-GolanInfo08-33\" rel=\"external_link\">[33]<\/a><\/sup> This can also be expressed in terms of the amount of knowledge\/information gained or lost in an effort to maximize performance.<sup id=\"rdp-ebb-cite_ref-GolanInfo08_33-2\" class=\"reference\"><a href=\"#cite_note-GolanInfo08-33\" rel=\"external_link\">[33]<\/a><\/sup> Additionally, the public health informatics professional could examine existing and emerging pathways that develop through patterns of use, a process closely linked to the concept of plasticity.<sup id=\"rdp-ebb-cite_ref-ScarboroughNeural07_34-0\" class=\"reference\"><a href=\"#cite_note-ScarboroughNeural07-34\" rel=\"external_link\">[34]<\/a><\/sup> In the field of neuroscience, the term \"neuroplasticity\" refers to the human brain\u2019s ability to change in response to behavioral, environmental, and neural processes.<sup id=\"rdp-ebb-cite_ref-MNNeuro_35-0\" class=\"reference\"><a href=\"#cite_note-MNNeuro-35\" rel=\"external_link\">[35]<\/a><\/sup> In the human brain, these pathways, after repeated stimulation and reinforcement, are actually carved into the brain tissue.<sup id=\"rdp-ebb-cite_ref-MNNeuro_35-1\" class=\"reference\"><a href=\"#cite_note-MNNeuro-35\" rel=\"external_link\">[35]<\/a><\/sup> By analogy with the way 
neural pathways are carved into the human brain, IPEs, as described earlier, may examine how public health knowledge environments respond to changes in behavior, environmental conditions, or agent-specific or system-level cognitive demands.<sup id=\"rdp-ebb-cite_ref-ScarboroughNeural07_34-1\" class=\"reference\"><a href=\"#cite_note-ScarboroughNeural07-34\" rel=\"external_link\">[34]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-DARPA_36-0\" class=\"reference\"><a href=\"#cite_note-DARPA-36\" rel=\"external_link\">[36]<\/a><\/sup>\n<\/p><p>Changes in the cognitive pathways of a public health knowledge environment can be modeled using a variety of conceptual and visualization techniques.<sup id=\"rdp-ebb-cite_ref-HammadComp98_37-0\" class=\"reference\"><a href=\"#cite_note-HammadComp98-37\" rel=\"external_link\">[37]<\/a><\/sup> Such pathways, when observed and modeled, can yield repeated patterns, which may be canonized as permanent or semi-permanent cognitive pathways toward system-level knowledge and learning health systems.<sup id=\"rdp-ebb-cite_ref-HammadComp98_37-1\" class=\"reference\"><a href=\"#cite_note-HammadComp98-37\" rel=\"external_link\">[37]<\/a><\/sup> Within our model of public health knowledge environments, highly intelligent health systems have the ability to manage such cognitive pathways in response to cognitive demands.<sup id=\"rdp-ebb-cite_ref-HammadComp98_37-2\" class=\"reference\"><a href=\"#cite_note-HammadComp98-37\" rel=\"external_link\">[37]<\/a><\/sup> Where old or unused pathways exist, data and information systems (and the corresponding knowledge products) will likely be considered outdated or not useful. 
Where current cognitive pathways are robust and frequented, data and information systems are likely to be considered essential to decision-making, and where new and emerging cognitive pathways are observed or predicted, there is potential for innovation and systems development to support emerging communities of practice, workgroups, departments\/divisions, and other formal or informal organizational structures.<sup id=\"rdp-ebb-cite_ref-CarleySmart02_38-0\" class=\"reference\"><a href=\"#cite_note-CarleySmart02-38\" rel=\"external_link\">[38]<\/a><\/sup>\n<\/p><p>The public health application lies in understanding the public health system as an evolving complex network of individuals, organizations, groups, and knowledge resources. Here, the public health informatics professional may find that knowledge, skills, and abilities in modeling social and organizational networks are essential for establishing current-state (baseline) network diagrams and future-state diagrams designed to guide the visualization of a public health knowledge environment. In this context, a large library of network measures can be employed to support the analysis of a public health system and its respective knowledge environment, such as network density, the closeness\/connectedness of agents to each other or to other knowledge resources, patterns of clustering and clique behavior, knowledge-sharing practices, and more. The application of social and organizational network analysis in public health is growing rapidly and is expected to continue to do so.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Aberrant_detection_analytics\">Aberrant detection analytics<\/span><\/h3>\n<p>Arguably the most important analysis within this discussion is the ability to detect subtle changes within the public health knowledge environment that may pose a threat to one or more agents or to the system overall. 
Here, we discuss how intelligent analytics, or telemetry, within a Public Health Situation Room can be used to detect such changes. The public health informatics professional relies heavily on probes and sensors as part of any surveillance and monitoring system to gather intelligence. Similarly, physicians and nurses rely on telemetry to monitor patient vital signs, drivers use dashboards to detect changes in automobile status, and investors use tickers to track global investments. These monitoring and tracking systems have one feature in common: they all use a grid or network of core indicators as validated predictors of overall system health.\n<\/p><p>Any sensor network used to monitor and track activity within a given Public Health Situation Room must recognize several information-specific concepts. First, information does not always travel along predefined organizational, departmental, or process pathways. Instead, information exchange may occur along a multiplicity of pathways, some predictable and others highly unpredictable. While organizational constructs of departments and divisions may account for some of the communication and information exchange, they do not account for all of it. Therefore, the network of public health data\/event-collecting sensors placed within a given public health knowledge environment must be fluid, highly adaptive, and capable of capturing activity in different settings, wherever the information channels may lead.\n<\/p><p>Second, the Public Health Situation Room sensor grid must be able to identify and track activity by both internal agents and system components, as well as by external agents and system components that may interact with the environment. No public health knowledge environment is completely closed. 
As a result, sensors must capture intelligence from the portals through which information, in all its forms, travels into and out of the system. Finally, the level of completeness must be defined to determine what represents an adequate level of coverage. A poorly designed or partial Public Health Situation Room sensor grid that allows large amounts of activity to go undetected would not be useful in the long term. A sensor grid should be viewed as a living and growing network of data\/event collection activities that changes with evolving needs and priorities, and the level of granularity or specificity of detection must also be able to change within the grid as strategic priorities shift. The full list of indicators drawn from public health knowledge environment factors, stakeholder levels, and unique views will largely shape the types of sensors, the density of the network, and the level of sensitivity needed for meaningful aberrant detection algorithms and monitoring system development.\n<\/p><p>Public Health Situation Rooms are used at all levels of the public health system throughout national and international health settings, including the <a href=\"https:\/\/www.limswiki.org\/index.php\/United_States_Department_of_Health_and_Human_Services\" title=\"United States Department of Health and Human Services\" target=\"_blank\" class=\"wiki-link\" data-key=\"efa106bcbb93039b1a6c3c596daedec3\">U.S. Department of Health and Human Services<\/a>, various public health agencies, and healthcare delivery settings across the globe. However, there is no standardized model for this type of monitoring capability. Public Health Situation Room needs and priorities vary widely by organization and may include, but are not limited to, disease management, outbreak investigation, emergency preparedness, disaster response, community health assessments, and even healthcare access, equity, and quality. 
The use of electronic performance, strategic, operational, and clinical dashboards is typical of such Public Health Situation Rooms. We argue that the primary challenge for the public health informatics professional in the design and execution of Public Health Situation Rooms is to develop the underlying smart infrastructure (knowledge base) and the array of analytic measures described in this discussion to maximize impact on desired outcomes.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusion\">Conclusion<\/span><\/h2>\n<p>As we enter a public health informatics era with terms like \"learning health systems,\" \"smart health systems,\" and \"adaptive complex health systems,\" we must identify a common set of analytic measures and capabilities to inform how we model, measure, and manage public health \"smartness.\" Such a set of measures must take into account the full spectrum of sociotechnical factors that make up a public health system and shape its performance, including technical, organizational, and human contributions. It is essential that we understand the basic drivers of smart systems, expressed in this discussion simply as the need to know, or cognitive demand. This basic need to know, and our corresponding effort to leverage data, information, and knowledge resources toward individual or collective goals and objectives, form the basic parameters of any smart system. In the context of a public health system, public health informatics professionals stand poised to redefine the benefit of smarter healthcare delivery and public health practice. A common set of analytic measures and capabilities that drives efficiency and viable models could demonstrate how incremental changes in smartness generate corresponding changes in public health performance. Here, we introduced the concepts of organizational complexity, problem\/issue complexity, and situational awareness as three codependent drivers of smart public health system characteristics. 
We also proposed seven smart health system measures and capabilities that we consider essential in a public health informatics professional's toolkit. Because this area of research and practice is still in its formative stages, the intent of this discussion is to build on the developing body of literature seeking to establish standardized measures for smart, learning, and adaptive public health systems.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Disclosure\">Disclosure<\/span><\/h2>\n<p>This work represents the opinion of the author and cannot be construed to represent the opinion of the U.S. Federal Government. Timothy Jay Carney is the founding partner of the Global Health Equity Intelligence Collaborative, LLC, Durham, NC (2014).\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h2>\n<p>The authors declare that they have no conflicts of interest.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Authors.27_contributions\">Authors' contributions<\/span><\/h2>\n<p>Timothy Jay Carney participated in the conceptual model development, design, and literature review of the manuscript. Christopher Michael Shea contributed to the review and substantial redesign of the manuscript for resubmission.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Acknowledgments\">Acknowledgments<\/span><\/h2>\n<p>The manuscript was supported by The University of North Carolina Gillings School of Public Health, The Lineberger Comprehensive Cancer Center, and The Carolina Community Network Center to Reduce Cancer Health Disparities Diversity Supplement 3U54CA153602. 
Special thanks are due to Hannah M.L., Elance\/Upwork consultant, for outstanding editing, content review, and consultation on manuscript development.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-KukafkaPublic07-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KukafkaPublic07_1-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Kukafka, A.; Yasnoff, W.A. (2007). \"Public health informatics\". <i>Journal of Biomedical Informatics<\/i> <b>40<\/b> (4): 365\u2013369. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.jbi.2007.07.005\" target=\"_blank\">10.1016\/j.jbi.2007.07.005<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17656158\" target=\"_blank\">17656158<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Public+health+informatics&rft.jtitle=Journal+of+Biomedical+Informatics&rft.aulast=Kukafka%2C+A.%3B+Yasnoff%2C+W.A.&rft.au=Kukafka%2C+A.%3B+Yasnoff%2C+W.A.&rft.date=2007&rft.volume=40&rft.issue=4&rft.pages=365%E2%80%93369&rft_id=info:doi\/10.1016%2Fj.jbi.2007.07.005&rft_id=info:pmid\/17656158&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-YasnoffPublic00-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-YasnoffPublic00_2-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Yasnoff, W.A.; O'Carroll, P.W.; Koo, D. et al. (2000). \"Public health informatics: Improving and transforming public health in the information age\". <i>Journal of Public Health Management and Practice<\/i> <b>6<\/b> (6): 67\u201375. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/18019962\" target=\"_blank\">18019962<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Public+health+informatics%3A+Improving+and+transforming+public+health+in+the+information+age&rft.jtitle=Journal+of+Public+Health+Management+and+Practice&rft.aulast=Yasnoff%2C+W.A.%3B+O%27Carroll%2C+P.W.%3B+Koo%2C+D.+et+al.&rft.au=Yasnoff%2C+W.A.%3B+O%27Carroll%2C+P.W.%3B+Koo%2C+D.+et+al.&rft.date=2000&rft.volume=6&rft.issue=6&rft.pages=67%E2%80%9375&rft_id=info:pmid\/18019962&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HsuTowards10-3\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-HsuTowards10_3-0\" rel=\"external_link\">3.0<\/a><\/sup> <sup><a href=\"#cite_ref-HsuTowards10_3-1\" rel=\"external_link\">3.1<\/a><\/sup> <sup><a href=\"#cite_ref-HsuTowards10_3-2\" rel=\"external_link\">3.2<\/a><\/sup> <sup><a href=\"#cite_ref-HsuTowards10_3-3\" rel=\"external_link\">3.3<\/a><\/sup> <sup><a href=\"#cite_ref-HsuTowards10_3-4\" rel=\"external_link\">3.4<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hsu, C.E.; Chambers, W.C.; Herbold, J.R. et al. (2010). \"Towards shared situational awareness and actionable knowledge \u2014 an enhanced, human-centered paradigm for public health information system design\". <i>Journal of Homeland Security and Emergency Management<\/i> <b>7<\/b> (1). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.2202%2F1547-7355.1727\" target=\"_blank\">10.2202\/1547-7355.1727<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Towards+shared+situational+awareness+and+actionable+knowledge+%E2%80%94+an+enhanced%2C+human-centered+paradigm+for+public+health+information+system+design&rft.jtitle=Journal+of+Homeland+Security+and+Emergency+Management&rft.aulast=Hsu%2C+C.E.%3B+Chambers%2C+W.C.%3B+Herbold%2C+J.R.+et+al.&rft.au=Hsu%2C+C.E.%3B+Chambers%2C+W.C.%3B+Herbold%2C+J.R.+et+al.&rft.date=2010&rft.volume=7&rft.issue=1&rft_id=info:doi\/10.2202%2F1547-7355.1727&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MarchOrg93-4\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MarchOrg93_4-0\" rel=\"external_link\">4.0<\/a><\/sup> <sup><a href=\"#cite_ref-MarchOrg93_4-1\" rel=\"external_link\">4.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">March, J.G.; Simon, H.A. (1993). <i>Organizations<\/i> (2nd ed.). Wiley-Blackwell. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780631186311.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Organizations&rft.aulast=March%2C+J.G.%3B+Simon%2C+H.A.&rft.au=March%2C+J.G.%3B+Simon%2C+H.A.&rft.date=1993&rft.edition=2nd&rft.pub=Wiley-Blackwell&rft.isbn=9780631186311&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DavenportInfo97-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DavenportInfo97_5-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Davenport, T.H.; Prusak, L. (1997). <i>Information Ecology: Mastering the Information and Knowledge Environment<\/i> (1st ed.). Oxford University Press. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780195111682.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Information+Ecology%3A+Mastering+the+Information+and+Knowledge+Environment&rft.aulast=Davenport%2C+T.H.%3B+Prusak%2C+L.&rft.au=Davenport%2C+T.H.%3B+Prusak%2C+L.&rft.date=1997&rft.edition=1st&rft.pub=Oxford+University+Press&rft.isbn=9780195111682&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DiezRouxComp11-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DiezRouxComp11_6-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Diez Roux, A.V. (2011). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3154209\" target=\"_blank\">\"Complex systems thinking and current impasses in health disparities research\"<\/a>. <i>American Journal of Public Health<\/i> <b>101<\/b> (9): 1627\u201334. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.2105%2FAJPH.2011.300149\" target=\"_blank\">10.2105\/AJPH.2011.300149<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3154209\/\" target=\"_blank\">PMC3154209<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21778505\" target=\"_blank\">21778505<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3154209\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3154209<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Complex+systems+thinking+and+current+impasses+in+health+disparities+research&rft.jtitle=American+Journal+of+Public+Health&rft.aulast=Diez+Roux%2C+A.V.&rft.au=Diez+Roux%2C+A.V.&rft.date=2011&rft.volume=101&rft.issue=9&rft.pages=1627%E2%80%9334&rft_id=info:doi\/10.2105%2FAJPH.2011.300149&rft_id=info:pmc\/PMC3154209&rft_id=info:pmid\/21778505&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3154209&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LichACall13-7\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-LichACall13_7-0\" rel=\"external_link\">7.0<\/a><\/sup> <sup><a href=\"#cite_ref-LichACall13_7-1\" rel=\"external_link\">7.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Lich, K.H.; Ginexi, E.M.; Osgood, N.D.; Mabry, P.L. (2013). \"A call to address complexity in prevention science research\". <i>Prevention Science<\/i> <b>14<\/b> (3): 279-89. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1007%2Fs11121-012-0285-2\" target=\"_blank\">10.1007\/s11121-012-0285-2<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22983746\" target=\"_blank\">22983746<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+call+to+address+complexity+in+prevention+science+research&rft.jtitle=Prevention+Science&rft.aulast=Lich%2C+K.H.%3B+Ginexi%2C+E.M.%3B+Osgood%2C+N.D.%3B+Mabry%2C+P.L.&rft.au=Lich%2C+K.H.%3B+Ginexi%2C+E.M.%3B+Osgood%2C+N.D.%3B+Mabry%2C+P.L.&rft.date=2013&rft.volume=14&rft.issue=3&rft.pages=279-89&rft_id=info:doi\/10.1007%2Fs11121-012-0285-2&rft_id=info:pmid\/22983746&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ArndtComm00-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ArndtComm00_8-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Arndt, M.; Bigelow, B. (2000). \"Commentary: the potential of chaos theory and complexity theory for health services management\". <i>Health Care Management Review<\/i> <b>25<\/b> (1): 35\u20138. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/10710726\" target=\"_blank\">10710726<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Commentary%3A+the+potential+of+chaos+theory+and+complexity+theory+for+health+services+management&rft.jtitle=Health+Care+Management+Review&rft.aulast=Arndt%2C+M.%3B+Bigelow%2C+B.&rft.au=Arndt%2C+M.%3B+Bigelow%2C+B.&rft.date=2000&rft.volume=25&rft.issue=1&rft.pages=35%E2%80%938&rft_id=info:pmid\/10710726&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KlingOrg93-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KlingOrg93_9-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Kling, R. (1993). \"Organizational analysis in computer science\". <i>The Information Society<\/i> <b>9<\/b> (2): 71\u201387. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1080%2F01972243.1993.9960134\" target=\"_blank\">10.1080\/01972243.1993.9960134<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Organizational+analysis+in+computer+science&rft.jtitle=The+Information+Society&rft.aulast=Kling%2C+R.&rft.au=Kling%2C+R.&rft.date=1993&rft.volume=9&rft.issue=2&rft.pages=71%E2%80%9387&rft_id=info:doi\/10.1080%2F01972243.1993.9960134&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BanduraSocial76-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BanduraSocial76_10-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Bandura, A. (1976). <i>Social Learning Theory<\/i> (1st ed.). Prentice-Hall. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780138167448.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Social+Learning+Theory&rft.aulast=Bandura%2C+A.&rft.au=Bandura%2C+A.&rft.date=1976&rft.edition=1st&rft.pub=Prentice-Hall&rft.isbn=9780138167448&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KlingUnder05-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KlingUnder05_11-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Kling, R.; Rosenbaum, H.; Sawyer, S. (2005). <i>Understanding and Communicating Social Informatics: A Framework for Studying and Teaching the Human Contexts of Information and Communication Technologies<\/i>. Information Today, Inc. pp. 241. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781573872287.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Understanding+and+Communicating+Social+Informatics%3A+A+Framework+for+Studying+and+Teaching+the+Human+Contexts+of+Information+and+Communication+Technologies&rft.aulast=Kling%2C+R.%3B+Rosenbaum%2C+H.%3B+Sawyer%2C+S.&rft.au=Kling%2C+R.%3B+Rosenbaum%2C+H.%3B+Sawyer%2C+S.&rft.date=2005&rft.pages=pp.%26nbsp%3B241&rft.pub=Information+Today%2C+Inc&rft.isbn=9781573872287&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LorenziOrg95-12\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-LorenziOrg95_12-0\" rel=\"external_link\">12.0<\/a><\/sup> <sup><a href=\"#cite_ref-LorenziOrg95_12-1\" rel=\"external_link\">12.1<\/a><\/sup> <sup><a href=\"#cite_ref-LorenziOrg95_12-2\" rel=\"external_link\">12.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Lorenzi, N.M.; Riley, R.T. (1995). <i>Organizational Aspects of Health Informatics<\/i>. Springer New York. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781475741841.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Organizational+Aspects+of+Health+Informatics&rft.aulast=Lorenzi%2C+N.M.%3B+Riley%2C+R.T.&rft.au=Lorenzi%2C+N.M.%3B+Riley%2C+R.T.&rft.date=1995&rft.pub=Springer+New+York&rft.isbn=9781475741841&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Thi.C3.A9tartChaos95-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Thi.C3.A9tartChaos95_13-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Thi\u00e9tart, R.A.; Forgues, B. (1995). \"Chaos Theory and Organization\". <i>Organization Science<\/i> <b>6<\/b> (1): 19\u201331. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1287%2Forsc.6.1.19\" target=\"_blank\">10.1287\/orsc.6.1.19<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Chaos+Theory+and+Organization&rft.jtitle=Organization+Science&rft.aulast=Thi%C3%A9tart%2C+R.A.%3B+Forgues%2C+B.&rft.au=Thi%C3%A9tart%2C+R.A.%3B+Forgues%2C+B.&rft.date=1995&rft.volume=6&rft.issue=1&rft.pages=19%E2%80%9331&rft_id=info:doi\/10.1287%2Forsc.6.1.19&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GAOPublic10-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GAOPublic10_14-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">U.S. Government Accountability Office (17 December 2010). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.gao.gov\/products\/GAO-11-99\" target=\"_blank\">\"Public Health Information Technology: Additional Strategic Planning Needed to Guide HHS's Efforts to Establish Electronic Situational Awareness Capabilities\"<\/a>. pp. 49<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.gao.gov\/products\/GAO-11-99\" target=\"_blank\">http:\/\/www.gao.gov\/products\/GAO-11-99<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Public+Health+Information+Technology%3A+Additional+Strategic+Planning+Needed+to+Guide+HHS%27s+Efforts+to+Establish+Electronic+Situational+Awareness+Capabilities&rft.atitle=&rft.aulast=U.S.+Government+Accountability+Office&rft.au=U.S.+Government+Accountability+Office&rft.date=17+December+2010&rft.pages=pp.+49&rft_id=http%3A%2F%2Fwww.gao.gov%2Fproducts%2FGAO-11-99&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GrossmannEngin11-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GrossmannEngin11_15-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Grossmann, C.; Goolsby, W.A.; Olsen, L.A. (2011). <i>Engineering a Learning Healthcare System: A Look at the Future: Workshop Summary<\/i>. National Academies Press. pp. 340. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780309120654.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Engineering+a+Learning+Healthcare+System%3A+A+Look+at+the+Future%3A+Workshop+Summary&rft.aulast=Grossmann%2C+C.%3B+Goolsby%2C+W.A.%3B+Olsen%2C+L.A.&rft.au=Grossmann%2C+C.%3B+Goolsby%2C+W.A.%3B+Olsen%2C+L.A.&rft.date=2011&rft.pages=pp.%26nbsp%3B340&rft.pub=National+Academies+Press&rft.isbn=9780309120654&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BurtonTheDyn11-16\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BurtonTheDyn11_16-0\" rel=\"external_link\">16.0<\/a><\/sup> <sup><a href=\"#cite_ref-BurtonTheDyn11_16-1\" rel=\"external_link\">16.1<\/a><\/sup> <sup><a href=\"#cite_ref-BurtonTheDyn11_16-2\" rel=\"external_link\">16.2<\/a><\/sup> <sup><a href=\"#cite_ref-BurtonTheDyn11_16-3\" rel=\"external_link\">16.3<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Burton, R.M.; Obel, B. (2004). \"The Dynamics of the Change Process\". <i>Strategic Organizational Diagnosis and Design<\/i>. <b>4<\/b>. Springer U.S. pp. 385\u2013420. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781441991140.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+Dynamics+of+the+Change+Process&rft.atitle=Strategic+Organizational+Diagnosis+and+Design&rft.aulast=Burton%2C+R.M.%3B+Obel%2C+B.&rft.au=Burton%2C+R.M.%3B+Obel%2C+B.&rft.date=2004&rft.volume=4&rft.pages=pp.%26nbsp%3B385-420&rft.pub=Springer+U.S.&rft.isbn=9781441991140&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NonakaADyn94-17\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-NonakaADyn94_17-0\" rel=\"external_link\">17.0<\/a><\/sup> <sup><a href=\"#cite_ref-NonakaADyn94_17-1\" rel=\"external_link\">17.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Nonaka, I. (1994). \"A dynamic theory of organizational knowledge creation\". <i>Organization Science<\/i> <b>5<\/b> (1): 14\u201337. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1287%2Forsc.5.1.14\" target=\"_blank\">10.1287\/orsc.5.1.14<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+dynamic+theory+of+organizational+knowledge+creation&rft.jtitle=Organization+Science&rft.aulast=Nonaka%2C+I.&rft.au=Nonaka%2C+I.&rft.date=1994&rft.volume=5&rft.issue=1&rft.pages=14%E2%80%9337&rft_id=info:doi\/10.1287%2Forsc.5.1.14&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PopperOrg98-18\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-PopperOrg98_18-0\" rel=\"external_link\">18.0<\/a><\/sup> <sup><a href=\"#cite_ref-PopperOrg98_18-1\" rel=\"external_link\">18.1<\/a><\/sup> <sup><a href=\"#cite_ref-PopperOrg98_18-2\" rel=\"external_link\">18.2<\/a><\/sup> <sup><a href=\"#cite_ref-PopperOrg98_18-3\" rel=\"external_link\">18.3<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Popper, M.; Lipshitz, R. (1998). \"Organizational learning mechanisms: A structural and cultural approach to organizational learning\". <i>Journal of Applied Behavioral Science<\/i> <b>34<\/b> (2): 161\u2013179. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1177%2F0021886398342003\" target=\"_blank\">10.1177\/0021886398342003<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Organizational+learning+mechanisms%3A+A+structural+and+cultural+approach+to+organizational+learning&rft.jtitle=Journal+of+Applied+Behavioral+Science&rft.aulast=Popper%2C+M.%3B+Lipshitz%2C+R.&rft.au=Popper%2C+M.%3B+Lipshitz%2C+R.&rft.date=1998&rft.volume=34&rft.issue=2&rft.pages=161%E2%80%93179&rft_id=info:doi\/10.1177%2F0021886398342003&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CrossanAnOrg99-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CrossanAnOrg99_19-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Crossan, M.M.; Lane, H.W.; White, R.E. (1999). \"An organizational learning framework: From intuition to institution\". <i>The Academy of Management Review<\/i> <b>24<\/b> (3): 522\u2013537. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.5465%2FAMR.1999.2202135\" target=\"_blank\">10.5465\/AMR.1999.2202135<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+organizational+learning+framework%3A+From+intuition+to+institution&rft.jtitle=The+Academy+of+Management+Review&rft.aulast=Crossan%2C+M.M.%3B+Lane%2C+H.W.%3B+White%2C+R.E.&rft.au=Crossan%2C+M.M.%3B+Lane%2C+H.W.%3B+White%2C+R.E.&rft.date=1999&rft.volume=24&rft.issue=3&rft.pages=522-537&rft_id=info:doi\/10.5465%2FAMR.1999.2202135&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WarrenContr11-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WarrenContr11_20-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Warren, L.; Fuller, T. (2011). \"Contrasting Approaches to Preparedness: A Reflection on Two Case Studies\". <i>Managing Adaptability, Intervention, and People in Enterprise Information Systems<\/i>. IGI Global. pp. 18\u201334. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.4018%2F978-1-60960-529-2.ch002\" target=\"_blank\">10.4018\/978-1-60960-529-2.ch002<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781609605292.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Contrasting+Approaches+to+Preparedness%3A+A+Reflection+on+Two+Case+Studies&rft.atitle=Managing+Adaptability%2C+Intervention%2C+and+People+in+Enterprise+Information+Systems&rft.aulast=Warren%2C+L.%3B+Fuller%2C+T.&rft.au=Warren%2C+L.%3B+Fuller%2C+T.&rft.date=2011&rft.pages=pp.%26nbsp%3B18%E2%80%9334&rft.pub=IGI+Global&rft_id=info:doi\/10.4018%2F978-1-60960-529-2.ch002&rft.isbn=9781609605292&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KleinMaking06-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KleinMaking06_21-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Klein, G.; Moon, B.; Hoffman, R.R. (2006). \"Making sense of Sensemaking 1: Alternative perspectives\". <i>IEEE Intelligent Systems<\/i> <b>21<\/b> (4): 70\u201373. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FMIS.2006.75\" target=\"_blank\">10.1109\/MIS.2006.75<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Making+sense+of+Sensemaking+1%3A+Alternative+perspectives&rft.jtitle=IEEE+Intelligent+Systems&rft.aulast=Klein%2C+G.%3B+Moon%2C+B.%3B+Hoffman%2C+R.R.&rft.au=Klein%2C+G.%3B+Moon%2C+B.%3B+Hoffman%2C+R.R.&rft.date=2006&rft.volume=21&rft.issue=4&rft.pages=70%E2%80%9373&rft_id=info:doi\/10.1109%2FMIS.2006.75&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EndsleyDesign11-22\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-EndsleyDesign11_22-0\" rel=\"external_link\">22.0<\/a><\/sup> <sup><a href=\"#cite_ref-EndsleyDesign11_22-1\" rel=\"external_link\">22.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Endsley, M.R.; Jones, D.G. (2011). <i>Designing for Situation Awareness: An Approach to User-Centered Design<\/i> (2nd ed.). CRC Press. pp. 396. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781420063554.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Designing+for+Situation+Awareness%3A+An+Approach+to+User-Centered+Design&rft.aulast=Endsley%2C+M.R.%3B+Jones%2C+D.G.&rft.au=Endsley%2C+M.R.%3B+Jones%2C+D.G.&rft.date=2011&rft.pages=pp.%26nbsp%3B396&rft.edition=2nd&rft.pub=CRC+Press&rft.isbn=9781420063554&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BowenOnt12-23\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BowenOnt12_23-0\" rel=\"external_link\">23.0<\/a><\/sup> <sup><a href=\"#cite_ref-BowenOnt12_23-1\" rel=\"external_link\">23.1<\/a><\/sup> <sup><a href=\"#cite_ref-BowenOnt12_23-2\" rel=\"external_link\">23.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Bowen, S. (2012). <i>Ontology (Knowledge Representation in Information Science)<\/i>. Ocean Media. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9788132330912.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Ontology+%28Knowledge+Representation+in+Information+Science%29&rft.aulast=Bowen%2C+S.&rft.au=Bowen%2C+S.&rft.date=2012&rft.pub=Ocean+Media&rft.isbn=9788132330912&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DymondAGuide95-24\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-DymondAGuide95_24-0\" rel=\"external_link\">24.0<\/a><\/sup> <sup><a href=\"#cite_ref-DymondAGuide95_24-1\" rel=\"external_link\">24.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Dymond, K.M. (1995). <i>A Guide to the CMM: Understanding the Capability Maturity Model for Software<\/i>. Process Inc. U.S. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780964600805.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=A+Guide+to+the+Cmm%3A+Understanding+the+Capability+Maturity+Model+for+Software&rft.aulast=Dymond%2C+K.M.&rft.au=Dymond%2C+K.M.&rft.date=1995&rft.pub=Process+Inc.+U.S.&rft.isbn=9780964600805&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FriedmanAchiev10-25\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FriedmanAchiev10_25-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Friedman, C.P.; Wong, A.K.; Blumenthal, D. (2010). \"Achieving a nationwide learning health system\". <i>Science Translational Medicine<\/i> <b>2<\/b> (57): 57cm29. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1126%2Fscitranslmed.3001456\" target=\"_blank\">10.1126\/scitranslmed.3001456<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21068440\" target=\"_blank\">21068440<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Achieving+a+nationwide+learning+health+system&rft.jtitle=Science+Translational+Medicine&rft.aulast=Friedman%2C+C.P.%3B+Wong%2C+A.K.%3B+Blumenthal%2C+D.&rft.au=Friedman%2C+C.P.%3B+Wong%2C+A.K.%3B+Blumenthal%2C+D.&rft.date=2010&rft.volume=2&rft.issue=57&rft.pages=57cm29&rft_id=info:doi\/10.1126%2Fscitranslmed.3001456&rft_id=info:pmid\/21068440&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SmithInfo12-26\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-SmithInfo12_26-0\" rel=\"external_link\">26.0<\/a><\/sup> <sup><a href=\"#cite_ref-SmithInfo12_26-1\" rel=\"external_link\">26.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Smith, P.F.; Ross, D.A. (2012). \"Information, knowledge, and wisdom in public health surveillance\". <i>Journal of Public Health Management and Practice<\/i> <b>18<\/b> (3): 193\u201395. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1097%2FPHH.0b013e318250b064\" target=\"_blank\">10.1097\/PHH.0b013e318250b064<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22473109\" target=\"_blank\">22473109<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Information%2C+knowledge%2C+and+wisdom+in+public+health+surveillance&rft.jtitle=Journal+of+Public+Health+Management+and+Practice&rft.aulast=Smith%2C+P.F.%3B+Ross%2C+D.A.&rft.au=Smith%2C+P.F.%3B+Ross%2C+D.A.&rft.date=2012&rft.volume=18&rft.issue=3&rft.pages=193%E2%80%9395&rft_id=info:doi\/10.1097%2FPHH.0b013e318250b064&rft_id=info:pmid\/22473109&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CarleyAdapt98-27\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CarleyAdapt98_27-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Carley, K. (1998). \"Adaptive organizations and emergent forms\". <i>Proceedings of the 3rd International Conference on Multi Agent Systems<\/i> <b>1998<\/b>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FICMAS.1998.699020\" target=\"_blank\">10.1109\/ICMAS.1998.699020<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Adaptive+organizations+and+emergent+forms&rft.jtitle=Proceedings+of+the+3rd+International+Conference+on+Multi+Agent+Systems&rft.aulast=Carley%2C+K.&rft.au=Carley%2C+K.&rft.date=1998&rft.volume=1998&rft_id=info:doi\/10.1109%2FICMAS.1998.699020&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KrackhardtAPCANS98-28\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-KrackhardtAPCANS98_28-0\" rel=\"external_link\">28.0<\/a><\/sup> <sup><a href=\"#cite_ref-KrackhardtAPCANS98_28-1\" rel=\"external_link\">28.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Krackhardt, D.; Carley, K.M. (1998). \"A PCANS model of structure in organizations\". <i>Proceedings of the 1998 International Symposium on Command and Control Research and Technology<\/i> <b>1998<\/b>: 113\u2013119. 
<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+PCANS+model+of+structure+in+organizations&rft.jtitle=Proceedings+of+the+1998+International+Symposium+on+Command+and+Control+Research+and+Technology&rft.aulast=Krackhardt%2C+D.%3B+Carley%2C+K.M.&rft.au=Krackhardt%2C+D.%3B+Carley%2C+K.M.&rft.date=1998&rft.volume=1998&rft.pages=113%E2%80%93119&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SchreiberConstruct04-29\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-SchreiberConstruct04_29-0\" rel=\"external_link\">29.0<\/a><\/sup> <sup><a href=\"#cite_ref-SchreiberConstruct04_29-1\" rel=\"external_link\">29.1<\/a><\/sup> <sup><a href=\"#cite_ref-SchreiberConstruct04_29-2\" rel=\"external_link\">29.2<\/a><\/sup> <sup><a href=\"#cite_ref-SchreiberConstruct04_29-3\" rel=\"external_link\">29.3<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Schreiber, C.; Singh, S.; Carley, K.M. (May 2004). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/handle.dtic.mil\/100.2\/ADA460028\" target=\"_blank\">\"Construct\u2014A Multi-Agent Network Model for the Co-Evolution of Agents and Socio-Cultural Environments\"<\/a>. Carnegie Mellon University<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/handle.dtic.mil\/100.2\/ADA460028\" target=\"_blank\">http:\/\/handle.dtic.mil\/100.2\/ADA460028<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Construct%E2%80%94A+Multi-Agent+Network+Model+for+the+Co-Evolution+of+Agents+and+Socio-Cultural+Environments&rft.atitle=&rft.aulast=Schreiber%2C+C.%3B+Singh%2C+S.%3B+Carley%2C+K.M.&rft.au=Schreiber%2C+C.%3B+Singh%2C+S.%3B+Carley%2C+K.M.&rft.date=May+2004&rft.pub=Carnegie+Mellon+University&rft_id=http%3A%2F%2Fhandle.dtic.mil%2F100.2%2FADA460028&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HirshmanSpec07-30\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-HirshmanSpec07_30-0\" rel=\"external_link\">30.0<\/a><\/sup> <sup><a href=\"#cite_ref-HirshmanSpec07_30-1\" rel=\"external_link\">30.1<\/a><\/sup> <sup><a href=\"#cite_ref-HirshmanSpec07_30-2\" rel=\"external_link\">30.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Hirshman, B.R.; Carley, K.M.; Kowalchuck, M.J. (25 July 2007). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/handle.dtic.mil\/100.2\/ADA500804\" target=\"_blank\">\"Specifying Agents in Construct\"<\/a>. Carnegie Mellon University<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/handle.dtic.mil\/100.2\/ADA500804\" target=\"_blank\">http:\/\/handle.dtic.mil\/100.2\/ADA500804<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Specifying+Agents+in+Construct&rft.atitle=&rft.aulast=Hirshman%2C+B.R.%3B+Carley%2C+K.M.%3B+Kowalchuck%2C+M.J.&rft.au=Hirshman%2C+B.R.%3B+Carley%2C+K.M.%3B+Kowalchuck%2C+M.J.&rft.date=25+July+2007&rft.pub=Carnegie+Mellon+University&rft_id=http%3A%2F%2Fhandle.dtic.mil%2F100.2%2FADA500804&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CarneyUsing14-31\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-CarneyUsing14_31-0\" rel=\"external_link\">31.0<\/a><\/sup> <sup><a href=\"#cite_ref-CarneyUsing14_31-1\" rel=\"external_link\">31.1<\/a><\/sup> <sup><a href=\"#cite_ref-CarneyUsing14_31-2\" rel=\"external_link\">31.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Carney, T.J.; Morgan, G.P.; Jones, J. et al. (2014). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4194243\" target=\"_blank\">\"Using computational modeling to assess the impact of clinical decision support on cancer screening improvement strategies within the community health centers\"<\/a>. <i>Journal of Biomedical Informatics<\/i> <b>51<\/b>: 200\u20139. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.jbi.2014.05.012\" target=\"_blank\">10.1016\/j.jbi.2014.05.012<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4194243\/\" target=\"_blank\">PMC4194243<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24953241\" target=\"_blank\">24953241<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4194243\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4194243<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Using+computational+modeling+to+assess+the+impact+of+clinical+decision+support+on+cancer+screening+improvement+strategies+within+the+community+health+centers&rft.aulast=Carney%2C+T.J.%3B+Morgan%2C+G.P.%3B+Jones%2C+J.+et+al.&rft.au=Carney%2C+T.J.%3B+Morgan%2C+G.P.%3B+Jones%2C+J.+et+al.&rft.date=2014&rft.volume=51&rft.pages=pp.+200%E2%80%939&rft_id=info:doi\/10.1016%2Fj.jbi.2014.05.012&rft_id=info:pmc\/PMC4194243&rft_id=info:pmid\/24953241&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4194243&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PEDDef-32\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-PEDDef_32-0\" rel=\"external_link\">32.0<\/a><\/sup> <sup><a href=\"#cite_ref-PEDDef_32-1\" rel=\"external_link\">32.1<\/a><\/sup><\/span> <span 
class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.economicsonline.co.uk\/Competitive_markets\/Price_elasticity_of_demand.html\" target=\"_blank\">\"Price elasticity of demand\"<\/a>. <i>Economics Online<\/i>. Economics Online Ltd<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.economicsonline.co.uk\/Competitive_markets\/Price_elasticity_of_demand.html\" target=\"_blank\">http:\/\/www.economicsonline.co.uk\/Competitive_markets\/Price_elasticity_of_demand.html<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Price+elasticity+of+demand&rft.atitle=Economics+Online&rft.pub=Economics+Online+Ltd&rft_id=http%3A%2F%2Fwww.economicsonline.co.uk%2FCompetitive_markets%2FPrice_elasticity_of_demand.html&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GolanInfo08-33\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-GolanInfo08_33-0\" rel=\"external_link\">33.0<\/a><\/sup> <sup><a href=\"#cite_ref-GolanInfo08_33-1\" rel=\"external_link\">33.1<\/a><\/sup> <sup><a href=\"#cite_ref-GolanInfo08_33-2\" rel=\"external_link\">33.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Golan, A.; Maasoumi, E. (2008). \"Information theoretic and entropy methods: An overview\". <i>Econometric Reviews<\/i> <b>27<\/b> (4\u20136): 317\u2013328. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1080%2F07474930801959685\" target=\"_blank\">10.1080\/07474930801959685<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Information+theoretic+and+entropy+methods%3A+An+overview&rft.jtitle=Econometric+Reviews&rft.aulast=Golan%2C+A.%3B+Maasoumi%2C+E.&rft.au=Golan%2C+A.%3B+Maasoumi%2C+E.&rft.date=2008&rft.volume=27&rft.issue=4%E2%80%936&rft.pages=317%E2%80%93328&rft_id=info:doi\/10.1080%2F07474930801959685&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ScarboroughNeural07-34\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-ScarboroughNeural07_34-0\" rel=\"external_link\">34.0<\/a><\/sup> <sup><a href=\"#cite_ref-ScarboroughNeural07_34-1\" rel=\"external_link\">34.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Scarborough, D. (2007). <i>Neural Networks in Organizational Research: Applying Pattern Recognition to the Analysis of Organizational Behavior<\/i>. American Psychological Association. pp. 187. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781591474159.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Neural+Networks+in+Organizational+Research%3A+Applying+Pattern+Recognition+to+the+Analysis+of+Organizational+Behavior&rft.aulast=Scarborough%2C+D.&rft.au=Scarborough%2C+D.&rft.date=2007&rft.pages=pp.%26nbsp%3B187&rft.pub=American+Psychological+Association&rft.isbn=9781591474159&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MNNeuro-35\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MNNeuro_35-0\" rel=\"external_link\">35.0<\/a><\/sup> <sup><a href=\"#cite_ref-MNNeuro_35-1\" rel=\"external_link\">35.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.medicinenet.com\/script\/main\/art.asp?articlekey=40362\" target=\"_blank\">\"Definition of Neuroplasticity\"<\/a>. <i>MedicineNet<\/i>. MedicineNet, Inc. 14 June 2012<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.medicinenet.com\/script\/main\/art.asp?articlekey=40362\" target=\"_blank\">http:\/\/www.medicinenet.com\/script\/main\/art.asp?articlekey=40362<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Definition+of+Neuroplasticity&rft.atitle=MedicineNet&rft.date=14+June+2012&rft.pub=MedicineNet%2C+Inc&rft_id=http%3A%2F%2Fwww.medicinenet.com%2Fscript%2Fmain%2Fart.asp%3Farticlekey%3D40362&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DARPA-36\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DARPA_36-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">DARPA\/TTO (22 March 1989). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/catalog.hathitrust.org\/Record\/009752973\" target=\"_blank\"><i>DARPA Neural Network Study: Final Report<\/i><\/a>. Lincoln Laboratory, Massachusetts Institute of Technology<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/catalog.hathitrust.org\/Record\/009752973\" target=\"_blank\">https:\/\/catalog.hathitrust.org\/Record\/009752973<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=DARPA+Neural+Network+Study%3A+Final+Report&rft.aulast=DARPA%2FTTO&rft.au=DARPA%2FTTO&rft.date=22+March+1989&rft.pub=Lincoln+Laboratory%2C+Massachusetts+Institute+of+Technology&rft_id=https%3A%2F%2Fcatalog.hathitrust.org%2FRecord%2F009752973&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HammadComp98-37\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-HammadComp98_37-0\" rel=\"external_link\">37.0<\/a><\/sup> <sup><a href=\"#cite_ref-HammadComp98_37-1\" rel=\"external_link\">37.1<\/a><\/sup> <sup><a href=\"#cite_ref-HammadComp98_37-2\" rel=\"external_link\">37.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Hammad, T. (1998). \"Computational intelligence: Neural networks methodology for health decision support\". In Tan, J.K.H.; Sheps, S. <i>Health Decision Support Systems<\/i>. Aspen Publishers, Inc. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 0834210657.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Computational+intelligence%3A+Neural+networks+methodology+for+health+decision+support&rft.atitle=Health+Decision+Support+Systems&rft.aulast=Hammad%2C+T.&rft.au=Hammad%2C+T.&rft.date=1998&rft.pub=Aspen+Publishers%2C+Inc&rft.isbn=0834210657&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CarleySmart02-38\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CarleySmart02_38-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Carley, K. (2002). \"Smart agents and organizations of the future\". In Lievrouw, L.A.; Livingstone, S. <i>Handbook of New Media: Social Shaping and Consequences of ICTs<\/i>. SAGE Publications. pp. 206\u2013220. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780761965107.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Smart+agents+and+organizations+of+the+future&rft.atitle=Handbook+of+New+Media%3A+Social+Shaping+and+Consequences+of+ICTs&rft.aulast=Carley%2C+K.&rft.au=Carley%2C+K.&rft.date=2002&rft.pages=pp.%26nbsp%3B206%E2%80%93220&rft.pub=SAGE+Publications&rft.isbn=9780761965107&rfr_id=info:sid\/en.wikipedia.org:Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. 
A few grammar and spelling errors were also corrected.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective\">https:\/\/www.limswiki.org\/index.php\/Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","bfe42513d857c82a22a78dbd758fc186_images":["https:\/\/www.limswiki.org\/images\/d\/d5\/Fig1_Carney_CompMathMethMed2017.png","https:\/\/www.limswiki.org\/images\/d\/d7\/Fig2_Carney_CompMathMethMed2017.png","https:\/\/www.limswiki.org\/images\/6\/64\/Fig3_Carney_CompMathMethMed2017.png","https:\/\/www.limswiki.org\/images\/6\/6c\/Fig4_Carney_CompMathMethMed2017.png","https:\/\/www.limswiki.org\/images\/6\/66\/Fig5_Carney_CompMathMethMed2017.png"],"bfe42513d857c82a22a78dbd758fc186_timestamp":1544814663,"18474356308b22be86d3205a31b5a267_type":"article","18474356308b22be86d3205a31b5a267_title":"Use of application containers and workflows for genomic data analysis (Schulz et al. 2016)","18474356308b22be86d3205a31b5a267_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis","18474356308b22be86d3205a31b5a267_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Use of application containers and workflows for genomic data analysis\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\n\t\t\t\t\tFull article title\n \nUse of application containers and workflows for genomic data analysis\nJournal\n \nJournal of Pathology Informatics\nAuthor(s)\n \nSchulz, Wade L.; Durant, Thomas; Siddon, Alexa J.; Torres, Richard\nAuthor affiliation(s)\n \nYale University School of Medicine, VA Connecticut Healthcare System\nPrimary contact\n \nEmail: Log in to source site to view\nYear published\n \n2016\nVolume and issue\n \n7\nPage(s)\n \n53\nDOI\n \n10.4103\/2153-3539.197197\nISSN\n \n2153-3539\nDistribution license\n \nCreative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported\nWebsite\n \nhttp:\/\/www.jpathinformatics.org\nDownload\n \nhttp:\/\/www.jpathinformatics.org\/temp\/JPatholInform7153-6111104_165831.pdf (PDF)\n\nContents\n\n1 Abstract \n2 Introduction \n3 Technical
background \n4 Approach \n5 Conclusion \n6 Financial support and sponsorship \n7 Conflicts of interest \n8 References \n9 Notes \n\n\n\nAbstract \nBackground: The rapid acquisition of biological data and development of computationally intensive analyses has led to a need for novel approaches to software deployment. In particular, the complexity of common analytic tools for genomics makes them difficult to deploy and decreases the reproducibility of computational experiments. \nMethods: Recent technologies that allow for application virtualization, such as Docker, allow developers and bioinformaticians to isolate these applications and deploy secure, scalable platforms that have the potential to dramatically increase the efficiency of big data processing. \nResults: While limitations exist, this study demonstrates a successful implementation of a pipeline with several discrete software applications for the analysis of next-generation sequencing (NGS) data. \nConclusions: With this approach, we significantly reduced the amount of time needed to perform clonal analysis from NGS data in acute myeloid leukemia.\nKeywords: Big data, bioinformatics workflow, containerization, genomics\n\nIntroduction \nThe amount of data available for research is growing at an exponential rate. The recent push for open data has also rapidly increased the availability of biomedical datasets for secondary analysis. Examples include the Yale Open Data Access project[1], a repository of clinical trial data, and The Cancer Genome Atlas (TCGA)[2], a project that makes genomic data accessible to researchers after initial findings are released. 
While these data sets promote ongoing research, the ability to efficiently store, move, and analyze such large repositories is often a bottleneck to analysis.[3]\nIn addition to the massive growth in volume and availability, novel analyses \u2014 including advanced statistical methods and machine learning \u2014 often require significant resources for efficient processing. One example of this in biomedical research is the analysis of next generation sequencing (NGS) data. NGS is also known as massively parallel or high-throughput sequencing, as it simultaneously sequences many fragments of DNA, thereby producing enormous amounts of information. These datasets often require several preprocessing steps followed by detailed analysis. Beyond being resource intensive, computational experiments using these data often have limited reproducibility due to the complexity of system and software configuration.[4] Some application frameworks have made advances in the reproducibility of individual applications and analysis pipelines[5][6], but significant work remains to increase this reliability, particularly for experiments performed in resource-limited environments or on computational clusters.\nThe deployment of complex computational systems is not unique to bioinformatics. Accordingly, there has been significant progress in building virtualization layers for operating systems and, more recently, for software applications.[7][8] A current example is the Docker platform (Docker, San Francisco, CA, U.S.A.), which allows for the creation and configuration of software containers for deployment on a range of systems.[9][10] While these technologies have limitations, they also have the potential to improve the usability of many software applications in computational biology. 
Several studies and initiatives have therefore begun to focus on the use of Docker in bioinformatics and computer science research.[11][12][13] In this paper, we demonstrate the potential benefits of containerized applications and application workflows for computational genomics research.\n\nTechnical background \nTo augment an ongoing study related to tumor heterogeneity, we obtained access to acute myeloid leukemia (AML) NGS data from TCGA.[14] Aligned NGS data from TCGA are available through the Cancer Genomics Hub (cgHub). The data set of interest consisted of approximately 12 terabytes (TB) of whole genome sequencing (WGS) data and another 12 TB of whole exome sequencing data. Our analysis required the identification of somatic variants followed by a prediction of tumor heterogeneity using publicly available software tools. Unfortunately, many bioinformatics tools have specific software dependencies and natively run on only a subset of operating systems.[15] In addition, many applications are unable to run their computations in parallel, thus limiting analysis throughput. While increasing the number of servers or individual server resources can improve analysis speed, overall processing may still be less efficient due to these limitations.\nAs previously noted, the Docker platform allows for virtualized application deployments within a lightweight, Linux-based wrapper called a container.[9][11][15] This approach is similar to operating system virtualization but at the application level. Containerization enables developers to create virtual environments that have only the minimum necessary libraries, which users can quickly deploy on their own infrastructure in a secure, reproducible fashion. In addition, the isolation offered by this approach means that a more robust, parallelized workflow can be created for some applications that do not natively support multi-threading or parallel processing. 
While this type of workflow implementation is not beneficial for all use cases, scenarios where compute capacity on a single node exceeds what a single application can utilize are likely to benefit from such an approach.\nTo efficiently predict tumor heterogeneity from TCGA data, we implemented two key technologies: a Python workflow using the Luigi library (Spotify, Stockholm, Sweden)[16] and Docker containers for each key application in our analysis pipeline. Neither of these technologies is unique to genomic analysis or bioinformatics: Luigi is a general workflow orchestration library, and Docker allows for application virtualization, as previously noted. For this use case, we were restricted to a single computational node with eight cores and 128 gigabytes (GB) of memory. As shown in [Figure 1], aligned WGS data were obtained for paired germline and tumor specimens from cgHub (now the Genomic Data Commons) using the cgDownload utility.[17] We identified somatic variants with the SomaticSniper application[18] and generated clonal predictions with the SciClone library.[19] Sequential data acquisition and analysis took approximately four hours per paired sequence, with considerable variability between specimens because file size and the number of genetic mutations differed widely.\n\nFigure 1. Serial workflow and architecture to download Cancer Genomics Hub data. To obtain next generation sequencing data from Cancer Genomics Hub, the cgDownload utility was used to transfer aligned whole genome and whole exome sequencing data. The SomaticSniper utility was then used to identify somatic variants, and tumor clonality was predicted with SciClone. 
These utilities were all manually configured on a server running CentOS 6.7.\n\n\n\nApproach \nBecause of the large number of specimens and the time needed for sequential analysis, our first approach to improving pipeline efficiency was the implementation of a Python workflow using the Luigi library. This library is commonly used for workflow automation and has gained significant traction within bioinformatics through the development of the SciLuigi library.[20] Implementing our pipeline with this workflow library allowed us to include automated fault tolerance, primarily for sequence download failures caused by brief losses in network connectivity. In addition, the workflow could run continuously and unsupervised, which markedly reduced the amount of hands-on time needed for analysis.\nWorkflow automation led to a significant improvement in efficiency; however, the inability to parallelize the software packages limited analysis throughput. While the cgDownload utility does support multi-threading, local bandwidth and disk size limitations made bulk downloading of the entire TCGA data set difficult. In addition, a bulk download followed by a full analysis of the data set would first saturate network capacity while computational resources sat idle, and then leave the network idle while computation ran. To maintain high, simultaneous utilization of all local hardware resources, including bandwidth, memory, and processor capacity, we deployed the cgDownload utility, SomaticSniper, and SciClone within isolated Docker containers and executed each container with our Luigi workflow [Figure 2]. This approach allowed us to horizontally and vertically scale each application to take full advantage of our local hardware. 
In addition, each application could be deployed on a single node running CentOS 7 (Red Hat, Raleigh, NC, USA) within a container running its natively supported operating system: CentOS 6 for cgDownload and Ubuntu (Canonical, London, UK) for SomaticSniper and SciClone.\n\nFigure 2. Comparison of standard application architecture and containerized architecture for clonal analysis. (a) When deployed in a virtual server, the analysis workflow was installed on CentOS 6.7 and had to be run serially due to limitations in software parallelization and local resources. Applications were launched manually in sequence to download NGS data, identify variants, and predict tumor clonality. (b) When configured in Docker containers and driven by a workflow manager, applications were launched automatically and could scale based on available system resources. Each application was configured on its native operating system architecture within the container, as indicated in the figure.\n\n\n\nThis approach to application deployment can offer significant performance benefits. However, any virtualization technology can offset these gains through resource overhead in the virtualization layer. To assess the impact of Docker virtualization on two key metrics, disk throughput and processing efficiency, we used two benchmarking tools to evaluate performance on the virtual server and in a Docker container. To benchmark disk input\/output performance, we used dd, a standard Linux command-line utility that reads and writes files and reports transfer statistics. Writing a 1 GB file with dd showed similar performance in a virtual machine and in a Docker container on the same host [Figure 3a]. 
Similarly, sysbench, an open-source benchmarking utility originally created by MySQL AB, showed that the time needed to calculate 10,000 primes was equivalent in either environment [Figure 3b]. When combined with evidence from other studies[21], these results demonstrate that Docker adds minimal overhead for these components.\n\nFigure 3. Disk throughput and processor efficiency of Docker containers. (a) The time needed to write a one-gigabyte file with the dd utility was similar in a virtual machine and in a Docker container on the same host. (b) The calculation of 10,000 primes with the sysbench utility showed similar performance in a virtual machine and a Docker container on the same host.\n\n\n\nSince we found similar benchmarking results for both the virtual machine and Docker containers, we next executed our workflow to analyze the AML WGS data from TCGA. Using this approach, we were able to stagger data download and processing to take advantage of all system resources [Figure 4]. One limitation of this approach is that none of the tools described here provide built-in resource monitoring. However, this pipeline was well suited to parallel analysis because each application had specific, isolated resource needs, such as network, storage, or compute capacity, which could be monitored with custom code within the Python workflow. Use of this automated, staggered workflow with Docker containers allowed us to analyze fifty specimens from the TCGA data set in approximately three days. It is difficult to provide statistical performance metrics since many factors, such as data volume and network bandwidth, can significantly alter overall pipeline performance. However, the general approach of staggered, parallel computation should provide increased processing efficiency for workloads such as this one.\n\nFigure 4. Illustration of parallelization improvements with a workflow-driven container architecture. 
(a) When performed serially, the download (white bars) and analysis (shaded bars) of a single pair of tumor and germline sequences on local hardware took approximately four hours (bars drawn to scale). (b) When parallelized with a workflow manager and Docker containers, multiple specimens could be processed simultaneously to take advantage of all system resources, including network, memory, and processor capacity.\n\n\n\nConclusion \nThe complex nature of genomic data and the tools used to analyze these data sets makes efficient processing difficult in standard computing environments. As noted above, the use of emerging technologies such as Docker in combination with automated workflows may significantly improve the efficiency of data processing in bioinformatics. With the growing number of open data projects, use of these techniques will be necessary to take advantage of available computational resources.\nWhile performance and pipeline efficiency were key components of this implementation, Docker containers also isolate applications from the host operating system. Since many bioinformatics tools have complex sets of dependencies and are difficult to build from source, the ability to deploy containers with different operating systems and dependency versions to the same host decreases the effort needed to begin analysis. For example, the cgDownload utility is distributed as a compiled binary for use on CentOS 6.7, but can only be deployed on CentOS 7 when built from source, which requires a significant amount of manual configuration. As shown in [Figure 2], the use of containers allowed the deployment of each utility on its natively supported operating system, which improves stability and decreases the potential for dependency conflicts among software applications.\nOther tools, such as Kubernetes and Docker Swarm, are available for orchestrating containerized applications. 
For complex platforms, these tools can be used to deploy containers across hardware clusters and to integrate networking and storage resources between containers. However, these applications work strictly at the container level and do not inherently provide application-level workflows as presented here. Additional implementation experience with these tools on high-performance clusters may provide valuable insights into their scalability within bioinformatics workflows.\nThe above findings demonstrate the promise of emerging technologies to improve the efficiency of genomic analysis. Because these tools increase analysis throughput, big data analyses can be performed even with limited local computational capacity. Finally, container technology can improve pipeline and experimental reproducibility, since preconfigured applications can be readily deployed to nearly any host system. While many factors can affect reproducibility, appropriately deployed containers limit variability due to differences in software environment or application configuration. The continued use of emerging technology and novel approaches to software architecture has the potential to increase the efficiency of computational analysis in bioinformatics.\n\nFinancial support and sponsorship \nACLPS Paul E. Strandjord Young Investigator Grant.\n\nConflicts of interest \nThere are no conflicts of interest.\n\nReferences \n\n\n\u2191 Krumholz, H.M.; Waldstreicher, J. (2016). \"The Yale Open Data Access (YODA) Project--A mechanism for data sharing\". New England Journal of Medicine 375 (5): 403\u2013405. doi:10.1056\/NEJMp1607342. PMID 27518657.   \n\n\u2191 Collins, F.S.; Barker, A.D. (2007). \"Mapping the cancer genome: Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies\". Scientific American 296 (3): 50\u20137. PMID 17348159.   
\n\n\u2191 Fan, J.; Han, F.; Liu, H. (2014). \"Challenges of big data analysis\". National Science Review 1 (2): 293\u2013314. doi:10.1093\/nsr\/nwt032. PMC PMC4236847. PMID 25419469. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4236847 .   \n\n\u2191 Nekrutenko, A.; Taylor, J. (2012). \"Next-generation sequencing data interpretation: Enhancing reproducibility and accessibility\". Nature Reviews Genetics 13 (9): 667\u201372. doi:10.1038\/nrg3305. PMID 22898652.   \n\n\u2191 Blankenberg, Daniel; Von Kuster, Gregory; Coraor, Nathaniel; Ananda, Guruprasad; Lazarus, Ross; Mangan, Mary; Nekrutenko, Anton; Taylor, James (2010). \"Galaxy: a web-based genome analysis tool for experimentalists\". Current Protocols in Molecular Biology 19 (Unit 19.10.1\u201321). doi:10.1002\/0471142727.mb1910s89. PMC PMC4264107. PMID 20069535. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4264107 .   \n\n\u2191 Hatakeyama, M.; Opitz, L.; Russo, G.; Qi, W.; Schlapbach, R.; Rehrauer, H. (2016). \"SUSHI: An exquisite recipe for fully documented, reproducible and reusable NGS data analysis\". BMC Bioinformatics 17: 228. doi:10.1186\/s12859-016-1104-8.   \n\n\u2191 Dudley, J.T.; Butte, A.J. (2010). \"In silico research in the era of cloud computing\". Nature Biotechnology 28 (11): 1181\u20135. doi:10.1038\/nbt1110-1181. PMC PMC3755123. PMID 21057489. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3755123 .   \n\n\u2191 Howe, B. (2012). \"Virtual appliances, cloud computing, and reproducible research\". Computing in Science & Engineering 14 (4): 36\u201341. doi:10.1109\/MCSE.2012.62.   \n\n\u2191 9.0 9.1 \"Docker\". Docker, Inc. https:\/\/www.docker.com\/ . Retrieved 21 November 2016 .   \n\n\u2191 Anderson, C. (2015). \"Docker\". IEEE Software 32 (3): 102\u2013105. doi:10.1109\/MS.2015.62.   \n\n\u2191 11.0 11.1 Boettiger, C. (2015). \"An introduction to Docker for reproducible research\". 
SIGOPS Operating Systems Review 49 (1): 71\u201379. doi:10.1145\/2723872.2723882.   \n\n\u2191 Hung, L.H.; Kristiyanto, D.; Lee, S.B.; Yeung, K.Y. (2016). \"GUIdock: Using Docker containers with a common graphics user interface to address the reproducibility of research\". PLoS One 11 (4): e0152686. doi:10.1371\/journal.pone.0152686. PMC PMC4821530. PMID 27045593. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4821530 .   \n\n\u2191 Moreews, F.; Sallou, O.; M\u00e9nager, H. et al. (2015). \"BioShaDock: A community driven bioinformatics shared Docker-based tools registry\". F1000Research 4: 1443. doi:10.12688\/f1000research.7536.1. PMC PMC4743153. PMID 26913191. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4743153 .   \n\n\u2191 Cancer Genome Atlas Research Network (2013). \"Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia\". New England Journal of Medicine 368 (22): 2059-74. doi:10.1056\/NEJMoa1301689. PMC PMC3767041. PMID 23634996. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3767041 .   \n\n\u2191 15.0 15.1 Piccolo, S.R.; Frampton, M.B. (2016). \"Tools and techniques for computational reproducibility\". GigaScience 5 (1): 30. doi:10.1186\/s13742-016-0135-4. PMC PMC4940747. PMID 27401684. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4940747 .   \n\n\u2191 \"Spotify\/Luigi\". GitHub, Inc. https:\/\/github.com\/spotify\/luigi . Retrieved 21 November 2016 .   \n\n\u2191 \"Genomic Data Commons\". National Cancer Institute. https:\/\/gdc.cancer.gov\/ . Retrieved 21 November 2016 .   \n\n\u2191 Larson, D.E.; Harris, C.C.; Chen, K. et al. (2012). \"SomaticSniper: Identification of somatic point mutations in whole genome sequencing data\". Bioinformatics 28 (3): 311-7. doi:10.1093\/bioinformatics\/btr665. PMC PMC3268238. PMID 22155872. 
http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3268238 .   \n\n\u2191 Miller, C.A.; White, B.S.; Dees, N.D. et al. (2014). \"SciClone: inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution\". PLoS Computational Biology 10 (8): e1003665. doi:10.1371\/journal.pcbi.1003665. PMC PMC4125065. PMID 25102416. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4125065 .   \n\n\u2191 \"Pharmbio\/SciLuigi\". GitHub, Inc. https:\/\/github.com\/pharmbio\/sciluigi . Retrieved 21 November 2016 .   \n\n\u2191 Preeth, E.N.; Mulerickal, F.J.; Paul, B.; Sastri, Y. (2015). \"Evaluation of Docker containers based on hardware utilization\". Proceedings of the 2015 International Conference on Control Communication and Computing India 2015: 697\u2013700. doi:10.1109\/ICCC.2015.7432984.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.\n\n\n\n\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\">https:\/\/www.limswiki.org\/index.php\/Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2017)LIMSwiki journal articles (all)LIMSwiki journal articles on pathology informatics\n\nThis page was last modified on 17 January 2017, at 01:36.\nThis page has been accessed 1,210 times.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","18474356308b22be86d3205a31b5a267_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Use_of_application_containers_and_workflows_for_genomic_data_analysis skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" 
role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Use of application containers and workflows for genomic data analysis<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><b>Background<\/b>: The rapid acquisition of biological data and the development of computationally intensive analyses have created a need for novel approaches to software deployment. In particular, the complexity of common analytic tools for <a href=\"https:\/\/www.limswiki.org\/index.php\/Genomics\" title=\"Genomics\" target=\"_blank\" class=\"wiki-link\" data-key=\"96a82dabf51cf9510dd00c5a03396c44\">genomics<\/a> makes them difficult to deploy and decreases the reproducibility of computational experiments. \n<\/p><p><b>Methods<\/b>: Recent application virtualization technologies, such as Docker, allow developers and bioinformaticians to isolate these applications and deploy secure, scalable platforms that have the potential to dramatically increase the efficiency of big data processing. \n<\/p><p><b>Results<\/b>: While limitations exist, this study demonstrates a successful implementation of a pipeline with several discrete software applications for the analysis of next-generation sequencing (NGS) data. 
\n<\/p><p><b>Conclusions<\/b>: With this approach, we significantly reduced the amount of time needed to perform clonal analysis from NGS data in acute myeloid leukemia.\n<\/p><p><b>Keywords<\/b>: Big data, bioinformatics workflow, containerization, genomics\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>The amount of data available for research is growing at an exponential rate. The recent push for open data has also rapidly increased the availability of biomedical datasets for secondary analysis. Examples include the Yale Open Data Access project<sup id=\"rdp-ebb-cite_ref-KrumholzTheYale16_1-0\" class=\"reference\"><a href=\"#cite_note-KrumholzTheYale16-1\" rel=\"external_link\">[1]<\/a><\/sup>, a repository of clinical trial data, and The Cancer Genome Atlas (TCGA)<sup id=\"rdp-ebb-cite_ref-CollinsMapping07_2-0\" class=\"reference\"><a href=\"#cite_note-CollinsMapping07-2\" rel=\"external_link\">[2]<\/a><\/sup>, a project that makes genomic data accessible to researchers after initial findings are released. While these data sets promote ongoing research, efficiently storing, moving, and analyzing such large repositories is often a bottleneck.<sup id=\"rdp-ebb-cite_ref-FanChallenges14_3-0\" class=\"reference\"><a href=\"#cite_note-FanChallenges14-3\" rel=\"external_link\">[3]<\/a><\/sup>\n<\/p><p>In addition to the massive growth in data volume and availability, novel analyses, including advanced statistical methods and machine learning, often require significant computational resources for efficient processing. One example in biomedical research is the analysis of next-generation sequencing (NGS) data. NGS is also known as massively parallel or high-throughput sequencing because it sequences many fragments of DNA simultaneously, producing enormous amounts of data. These datasets often require several preprocessing steps followed by detailed analysis. 
In addition to being resource intensive, computational experiments using these data are often difficult to reproduce because of the complexity of system and software configuration.<sup id=\"rdp-ebb-cite_ref-NekrutenkoNext12_4-0\" class=\"reference\"><a href=\"#cite_note-NekrutenkoNext12-4\" rel=\"external_link\">[4]<\/a><\/sup> Some application frameworks have improved the reproducibility of individual applications and analysis pipelines<sup id=\"rdp-ebb-cite_ref-BlankenbergGalaxy10_5-0\" class=\"reference\"><a href=\"#cite_note-BlankenbergGalaxy10-5\" rel=\"external_link\">[5]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HatekeyamaSUSHI16_6-0\" class=\"reference\"><a href=\"#cite_note-HatekeyamaSUSHI16-6\" rel=\"external_link\">[6]<\/a><\/sup>, but significant work remains to make such experiments reliably reproducible, particularly when they are performed in resource-limited environments or on computational clusters.\n<\/p><p>The difficulty of deploying complex computational systems is not unique to <a href=\"https:\/\/www.limswiki.org\/index.php\/Bioinformatics\" title=\"Bioinformatics\" target=\"_blank\" class=\"wiki-link\" data-key=\"8f506695fdbb26e3f314da308f8c053b\">bioinformatics<\/a>. 
Accordingly, there has been significant progress in building virtualization layers for operating systems and, more recently, for software applications.<sup id=\"rdp-ebb-cite_ref-DudleyInSilico10_7-0\" class=\"reference\"><a href=\"#cite_note-DudleyInSilico10-7\" rel=\"external_link\">[7]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HoweVirtual12_8-0\" class=\"reference\"><a href=\"#cite_note-HoweVirtual12-8\" rel=\"external_link\">[8]<\/a><\/sup> A current example is the Docker platform (Docker, San Francisco, CA, U.S.A.), which allows for the creation and configuration of software containers for deployment on a range of systems.<sup id=\"rdp-ebb-cite_ref-DockerHome_9-0\" class=\"reference\"><a href=\"#cite_note-DockerHome-9\" rel=\"external_link\">[9]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-AndersonDocker15_10-0\" class=\"reference\"><a href=\"#cite_note-AndersonDocker15-10\" rel=\"external_link\">[10]<\/a><\/sup> While these technologies have limitations, they also have the potential to improve the usability of many software applications in computational biology. 
Several studies and initiatives have therefore begun to focus on the use of Docker in bioinformatics and computer science research.<sup id=\"rdp-ebb-cite_ref-BoettigerAnIntro15_11-0\" class=\"reference\"><a href=\"#cite_note-BoettigerAnIntro15-11\" rel=\"external_link\">[11]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-HungGUIdock16_12-0\" class=\"reference\"><a href=\"#cite_note-HungGUIdock16-12\" rel=\"external_link\">[12]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-MoreewsBioShaDock15_13-0\" class=\"reference\"><a href=\"#cite_note-MoreewsBioShaDock15-13\" rel=\"external_link\">[13]<\/a><\/sup> In this paper, we demonstrate the potential benefits of containerized applications and application workflows for computational genomics research.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Technical_background\">Technical background<\/span><\/h2>\n<p>To augment an ongoing study of tumor heterogeneity, we obtained access to acute myeloid leukemia (AML) NGS data from TCGA.<sup id=\"rdp-ebb-cite_ref-CGARNGenomic13_14-0\" class=\"reference\"><a href=\"#cite_note-CGARNGenomic13-14\" rel=\"external_link\">[14]<\/a><\/sup> Aligned NGS data from TCGA are available through the Cancer Genomics Hub (cgHub). The data set of interest consisted of approximately 12 terabytes (TB) of whole genome sequencing (WGS) data and another 12 TB of whole exome sequencing data. Our analysis required the identification of somatic variants followed by a prediction of tumor heterogeneity using publicly available software tools. Unfortunately, many bioinformatics tools have specific software dependencies and run natively on only a subset of operating systems.<sup id=\"rdp-ebb-cite_ref-PiccoloTools16_15-0\" class=\"reference\"><a href=\"#cite_note-PiccoloTools16-15\" rel=\"external_link\">[15]<\/a><\/sup> In addition, many applications cannot run their computations in parallel, which limits analysis throughput. 
While adding servers or increasing the resources of individual servers can improve analysis speed, these limitations mean that the added capacity may not be used efficiently.\n<\/p><p>As previously noted, the Docker platform allows for virtualized application deployments within a lightweight, Linux-based wrapper called a container.<sup id=\"rdp-ebb-cite_ref-DockerHome_9-1\" class=\"reference\"><a href=\"#cite_note-DockerHome-9\" rel=\"external_link\">[9]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-BoettigerAnIntro15_11-1\" class=\"reference\"><a href=\"#cite_note-BoettigerAnIntro15-11\" rel=\"external_link\">[11]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-PiccoloTools16_15-1\" class=\"reference\"><a href=\"#cite_note-PiccoloTools16-15\" rel=\"external_link\">[15]<\/a><\/sup> This approach is similar to operating system virtualization but operates at the application level. Containerization enables developers to create virtual environments that contain only the minimum necessary libraries, which users can quickly deploy on their own infrastructure in a secure, reproducible fashion. In addition, because each container is isolated, multiple instances of an application that does not natively support multi-threading or parallel processing can run side by side, creating a more robust, parallelized workflow. This type of workflow is not beneficial for all use cases, but scenarios in which the compute capacity of a single node exceeds what a single application can use are likely to benefit from it.\n<\/p><p>To efficiently predict tumor heterogeneity from TCGA data, we implemented two key technologies: a Python workflow using the Luigi library (Spotify, Stockholm, Sweden)<sup id=\"rdp-ebb-cite_ref-SpotifyLuigi_16-0\" class=\"reference\"><a href=\"#cite_note-SpotifyLuigi-16\" rel=\"external_link\">[16]<\/a><\/sup> and Docker containers for each key application in our analysis pipeline. 
Neither technology is specific to genomic analysis or bioinformatics: Luigi is a general workflow orchestration library, and Docker allows for application virtualization, as previously noted. For this use case, we were restricted to a single computational node with eight cores and 128 gigabytes (GB) of memory. As shown in [Figure 1], aligned WGS data for paired germline and tumor specimens were obtained from cgHub (now the Genomic Data Commons) using the cgDownload utility.<sup id=\"rdp-ebb-cite_ref-NIHGDC_17-0\" class=\"reference\"><a href=\"#cite_note-NIHGDC-17\" rel=\"external_link\">[17]<\/a><\/sup> We identified somatic variants with the SomaticSniper application<sup id=\"rdp-ebb-cite_ref-LarsonSomatic12_18-0\" class=\"reference\"><a href=\"#cite_note-LarsonSomatic12-18\" rel=\"external_link\">[18]<\/a><\/sup> and generated clonal predictions with the SciClone library.<sup id=\"rdp-ebb-cite_ref-MillerSciClone14_19-0\" class=\"reference\"><a href=\"#cite_note-MillerSciClone14-19\" rel=\"external_link\">[19]<\/a><\/sup> Sequential data acquisition and analysis took approximately four hours per sequence pair, but this time varied considerably between specimens because file size and the number of mutations differed.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Schulz_JofPathInformatics2016_7.jpg\" class=\"image wiki-link\" target=\"_blank\" data-key=\"305cd08dd9a06f217cf52a5f65732eb2\"><img alt=\"Fig1 Schulz JofPathInformatics2016 7.jpg\" src=\"https:\/\/www.limswiki.org\/images\/f\/f1\/Fig1_Schulz_JofPathInformatics2016_7.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> 
Serial workflow and architecture to download Cancer Genomics Hub data. To obtain next-generation sequencing data from the Cancer Genomics Hub, the cgDownload utility was used to transfer aligned whole genome and whole exome sequencing data. The SomaticSniper utility was then used to identify somatic variants, and tumor clonality was predicted with SciClone. These utilities were all manually configured on a server running CentOS 6.7.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Approach\">Approach<\/span><\/h2>\n<p>Because of the large number of specimens and the time needed for sequential analysis, our first approach to improving pipeline efficiency was to implement a Python workflow using the Luigi library. This library is commonly used for workflow automation and has gained significant traction within bioinformatics through the development of the SciLuigi library.<sup id=\"rdp-ebb-cite_ref-SciLuigi_20-0\" class=\"reference\"><a href=\"#cite_note-SciLuigi-20\" rel=\"external_link\">[20]<\/a><\/sup> Implementing our pipeline with this workflow library allowed us to include automated fault tolerance, primarily for sequence download failures caused by brief losses of network connectivity. In addition, the workflow could run continuously and unsupervised, which markedly reduced the hands-on time needed for analysis.\n<\/p><p>Workflow automation led to a significant improvement in efficiency; however, because the software packages could not be parallelized, analysis throughput remained limited. While the cgDownload utility does support multi-threading, local bandwidth and disk size limitations made bulk downloading of the entire TCGA data set difficult. In addition, downloading the full data set before analyzing it would saturate network capacity while computational resources sat idle, and then leave the network idle while the processors were saturated. 
To maintain high, simultaneous utilization of all local hardware resources, including bandwidth, memory, and processor capacity, we deployed the cgDownload utility, SomaticSniper, and SciClone within isolated Docker containers and executed each container with our Luigi workflow [Figure 2]. This approach allowed us to horizontally and vertically scale each application to take full advantage of our local hardware. In addition, each application could be deployed on a single node running CentOS 7 (Red Hat, Raleigh, NC, USA) within a container running its natively supported operating system: CentOS 6 for cgDownload and Ubuntu (Canonical, London, UK) for SomaticSniper and SciClone.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Schulz_JofPathInformatics2016_7.jpg\" class=\"image wiki-link\" target=\"_blank\" data-key=\"aa488298693b72f872fb613b3120fbc4\"><img alt=\"Fig2 Schulz JofPathInformatics2016 7.jpg\" src=\"https:\/\/www.limswiki.org\/images\/1\/17\/Fig2_Schulz_JofPathInformatics2016_7.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> Comparison of standard application architecture and containerized architecture for clonal analysis. (a) When deployed in a virtual server, the analysis workflow was installed on CentOS 6.7 and had to be run serially due to limitations in software parallelization and local resources. Applications were launched manually in sequence to download NGS data, identify variants, and predict tumor clonality. (b) When configured in Docker containers and driven by a workflow manager, applications were launched automatically and could scale based on available system resources. 
Each application was configured on its native operating system architecture within the container, as indicated in the figure.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>This approach to application deployment can offer significant performance benefits. However, any virtualization technology can offset these gains through resource overhead in the virtualization layer. To assess the impact of Docker virtualization on two key metrics (disk throughput and processing efficiency), we used two benchmarking tools to evaluate performance on the virtual server as well as in a Docker container. To benchmark disk input\/output performance, we used dd, a standard Linux command-line utility that can read and write files and report performance statistics. Writing a 1 GB file with dd showed similar performance in a virtual machine and in a Docker container within the same environment [Figure 3a]. Similarly, results from sysbench, an open-source benchmarking utility originally created by MySQL AB, showed that the time needed to calculate 10,000 primes was equivalent in both environments [Figure 3b]. 
When combined with evidence from other studies<sup id=\"rdp-ebb-cite_ref-PreethEval15_21-0\" class=\"reference\"><a href=\"#cite_note-PreethEval15-21\" rel=\"external_link\">[21]<\/a><\/sup>, these results indicate that Docker adds minimal disk and processing overhead.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Schulz_JofPathInformatics2016_7.jpg\" class=\"image wiki-link\" target=\"_blank\" data-key=\"89f8305352e41d90503823f13b2f5e8d\"><img alt=\"Fig3 Schulz JofPathInformatics2016 7.jpg\" src=\"https:\/\/www.limswiki.org\/images\/4\/41\/Fig3_Schulz_JofPathInformatics2016_7.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> Disk throughput and processor efficiency of Docker containers. (a) The time needed to write a one-gigabyte file with the dd utility was similar in a virtual machine and in a Docker container on the same host. (b) The calculation of 10,000 primes with the sysbench utility showed similar performance in a virtual machine and a Docker container on the same host.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Because benchmarking results were similar for the virtual machine and the Docker container, we next ran our workflow to analyze the AML WGS data from TCGA. Using this approach, we were able to stagger data download and processing to take advantage of all system resources [Figure 4]. One limitation of this approach is that none of the tools described here provides built-in resource monitoring. 
Nevertheless, this pipeline was well suited to parallel analysis because each application had specific, isolated resource needs, such as network, storage, or compute capacity, which could be monitored with custom code within the Python workflow. Using this automated, staggered workflow with Docker containers, we analyzed 50 specimens from the TCGA data set in approximately three days. Statistical performance metrics are difficult to provide because many factors, such as data volume and network bandwidth, can significantly alter overall pipeline performance. However, the general approach of staggered, parallel computation should increase processing efficiency for workloads such as this one.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Schulz_JofPathInformatics2016_7.jpg\" class=\"image wiki-link\" target=\"_blank\" data-key=\"f6bdb240cd1e55d298a6f72815bc7a79\"><img alt=\"Fig4 Schulz JofPathInformatics2016 7.jpg\" src=\"https:\/\/www.limswiki.org\/images\/a\/a8\/Fig4_Schulz_JofPathInformatics2016_7.jpg\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 4.<\/b> Illustration of parallelization improvements with a workflow-driven container architecture. (a) When performed serially, the download (white bars) and analysis (shaded bars) of a single pair of tumor and germline sequences on local hardware took approximately four hours (bars drawn to scale). 
(b) When parallelized with a workflow manager and Docker containers, multiple specimens could be processed simultaneously to take advantage of all system resources, including network, memory, and processor capacity.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Conclusion\">Conclusion<\/span><\/h2>\n<p>The complex nature of genomic data and of the tools used to analyze these data sets makes efficient processing difficult in standard environments. As noted above, combining emerging technologies such as Docker with automated workflows may significantly improve the efficiency of data processing in bioinformatics. With the growing number of open data projects, these techniques will be necessary to take full advantage of available computational resources.\n<\/p><p>While performance and pipeline efficiency were key goals of this implementation, Docker containers also isolate applications from the host operating system. Because many bioinformatics tools have complex sets of dependencies and are difficult to build from source, the ability to deploy containers with different operating systems and dependency versions to the same host decreases the effort needed to begin analysis. For example, the cgDownload utility is distributed as a compiled binary for CentOS 6.7 but can be deployed on CentOS 7 only when built from source, which requires a significant amount of manual configuration. As shown in [Figure 2], containers allowed each utility to be deployed on its natively supported operating system, which improves stability and decreases the potential for dependency conflicts among software applications.\n<\/p><p>Several other tools, such as Kubernetes and Docker Swarm, exist for orchestrating containerized applications. 
For complex platforms, these tools can be used to deploy containers across hardware clusters and to integrate networking and storage resources between containers. However, these tools work strictly at the container level and do not inherently provide application-level workflows such as the one presented here. Further implementation experience with these tools on high-performance clusters may provide valuable insight into their scalability for bioinformatics workflows.\n<\/p><p>The above findings demonstrate the promise of emerging technologies to improve the efficiency of genomic analysis. Because these tools increase analysis throughput, big data analyses can be performed even with limited local computational capacity. Finally, container technology can improve pipeline and experimental reproducibility because preconfigured applications can be readily deployed to nearly any host system. While many factors can affect reproducibility, appropriately deployed containers limit variability caused by differences in software environment or application configuration. The continued use of emerging technology and novel approaches to software architecture has the potential to increase the efficiency of computational analysis in bioinformatics.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Financial_support_and_sponsorship\">Financial support and sponsorship<\/span><\/h2>\n<p>ACLPS Paul E. 
Strandjord Young Investigator Grant.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conflicts_of_interest\">Conflicts of interest<\/span><\/h2>\n<p>There are no conflicts of interest.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-KrumholzTheYale16-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KrumholzTheYale16_1-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Krumholz, H.M.; Waldstreicher, J. (2016). \"The Yale Open Data Access (YODA) Project--A mechanism for data sharing\". <i>New England Journal of Medicine<\/i> <b>375<\/b> (5): 403\u2013405. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1056%2FNEJMp1607342\" target=\"_blank\">10.1056\/NEJMp1607342<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/27518657\" target=\"_blank\">27518657<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Yale+Open+Data+Access+%28YODA%29+Project--A+mechanism+for+data+sharing&rft.jtitle=New+England+Journal+of+Medicine&rft.aulast=Krumholz%2C+H.M.%3B+Waldstreicher%2C+J.&rft.au=Krumholz%2C+H.M.%3B+Waldstreicher%2C+J.&rft.date=2016&rft.volume=375&rft.issue=5&rft.pages=403%E2%80%93405&rft_id=info:doi\/10.1056%2FNEJMp1607342&rft_id=info:pmid\/27518657&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CollinsMapping07-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CollinsMapping07_2-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Collins, F.S.; Barker, A.D. (2007). \"Mapping the cancer genome: Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies\". <i>Scientific American<\/i> <b>296<\/b> (3): 50\u20137. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17348159\" target=\"_blank\">17348159<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Mapping+the+cancer+genome%3A+Pinpointing+the+genes+involved+in+cancer+will+help+chart+a+new+course+across+the+complex+landscape+of+human+malignancies&rft.jtitle=Scientific+American&rft.aulast=Collins%2C+F.S.%3B+Barker%2C+A.D.&rft.au=Collins%2C+F.S.%3B+Barker%2C+A.D.&rft.date=2007&rft.volume=296&rft.issue=3&rft.pages=50%E2%80%937&rft_id=info:pmid\/17348159&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FanChallenges14-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FanChallenges14_3-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Fan, J.; Han, F.; Liu, H. (2014). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4236847\" target=\"_blank\">\"Challenges of big data analysis\"<\/a>. <i>National Science Review<\/i> <b>1<\/b> (2): 293\u2013314. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fnsr%2Fnwt032\" target=\"_blank\">10.1093\/nsr\/nwt032<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4236847\/\" target=\"_blank\">PMC4236847<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25419469\" target=\"_blank\">25419469<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4236847\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4236847<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Challenges+of+big+data+analysis&rft.jtitle=National+Science+Review&rft.aulast=Fan%2C+J.%3B+Han%2C+F.%3B+Liu%2C+H.&rft.au=Fan%2C+J.%3B+Han%2C+F.%3B+Liu%2C+H.&rft.date=2014&rft.volume=1&rft.issue=2&rft.pages=293%E2%80%93314&rft_id=info:doi\/10.1093%2Fnsr%2Fnwt032&rft_id=info:pmc\/PMC4236847&rft_id=info:pmid\/25419469&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4236847&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NekrutenkoNext12-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NekrutenkoNext12_4-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Nekrutenko, A.; Taylor, J. (2012). \"Next-generation sequencing data interpretation: Enhancing reproducibility and accessibility\". 
<i>Nature Reviews Genetics<\/i> <b>13<\/b> (9): 667\u201372. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnrg3305\" target=\"_blank\">10.1038\/nrg3305<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22898652\" target=\"_blank\">22898652<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Next-generation+sequencing+data+interpretation%3A+Enhancing+reproducibility+and+accessibility&rft.jtitle=Nature+Reviews+Genetics&rft.aulast=Nekrutenko%2C+A.%3B+Taylor%2C+J.&rft.au=Nekrutenko%2C+A.%3B+Taylor%2C+J.&rft.date=2012&rft.volume=13&rft.issue=9&rft.pages=667%E2%80%9372&rft_id=info:doi\/10.1038%2Fnrg3305&rft_id=info:pmid\/22898652&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BlankenbergGalaxy10-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BlankenbergGalaxy10_5-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Blankenberg, Daniel; Von Kuster, Gregory; Coraor, Nathaniel; Ananda, Guruprasad; Lazarus, Ross; Mangan, Mary; Nekrutenko, Anton; Taylor, James (2010). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4264107\" target=\"_blank\">\"Galaxy: a web-based genome analysis tool for experimentalists\"<\/a>. <i>Current Protocols in Molecular Biology<\/i> <b>19<\/b> (Unit 19.10.1\u201321). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1002%2F0471142727.mb1910s89\" target=\"_blank\">10.1002\/0471142727.mb1910s89<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4264107\/\" target=\"_blank\">PMC4264107<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20069535\" target=\"_blank\">20069535<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4264107\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4264107<\/a><\/span>.<\/span><span class=\"Z3988\" 
title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Galaxy%3A+a+web-based+genome+analysis+tool+for+experimentalists&rft.jtitle=Current+Protocols+in+Molecular+Biology&rft.aulast=Blankenberg%2C+Daniel%3B+Von+Kuster%2C+Gregory%3B+Coraor%2C+Nathaniel%3B+Ananda%2C+Guruprasad%3B+Lazarus%2C+Ross%3B+Mangan%2C+Mary%3B+Nekrutenko%2C+Anton%3B+Taylor%2C+James&rft.au=Blankenberg%2C+Daniel%3B+Von+Kuster%2C+Gregory%3B+Coraor%2C+Nathaniel%3B+Ananda%2C+Guruprasad%3B+Lazarus%2C+Ross%3B+Mangan%2C+Mary%3B+Nekrutenko%2C+Anton%3B+Taylor%2C+James&rft.date=2010&rft.volume=19&rft.issue=Unit+19.10.1%E2%80%9321&rft_id=info:doi\/10.1002%2F0471142727.mb1910s89&rft_id=info:pmc\/PMC4264107&rft_id=info:pmid\/20069535&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4264107&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HatekeyamaSUSHI16-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HatekeyamaSUSHI16_6-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hatakeyama, M.; Opitz, L.; Russo, G.; Qi, W.; Schlapbach, R.; Rehrauer, H. (2016). \"SUSHI: An exquisite recipe for fully documented, reproducible and reusable NGS data analysis\". <i>BMC Bioinformatics<\/i> <b>17<\/b>: 228. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2Fs12859-016-1104-8\" target=\"_blank\">10.1186\/s12859-016-1104-8<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=SUSHI%3A+An+exquisite+recipe+for+fully+documented%2C+reproducible+and+reusable+NGS+data+analysis&rft.jtitle=BMC+Bioinformatics&rft.aulast=Hatakeyama%2C+M.%3B+Opitz%2C+L.%3B+Russo%2C+G.%3B+Qi%2C+W.%3B+Schlapbach%2C+R.%3B+Rehrauer%2C+H.&rft.au=Hatakeyama%2C+M.%3B+Opitz%2C+L.%3B+Russo%2C+G.%3B+Qi%2C+W.%3B+Schlapbach%2C+R.%3B+Rehrauer%2C+H.&rft.date=2016&rft.volume=17&rft.pages=228&rft_id=info:doi\/10.1186%2Fs12859-016-1104-8&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DudleyInSilico10-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DudleyInSilico10_7-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Dudley, J.T.; Butte, A.J. (2010). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3755123\" target=\"_blank\">\"In silico research in the era of cloud computing\"<\/a>. <i>Nature Biotechnology<\/i> <b>28<\/b> (11): 1181\u20135. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnbt1110-1181\" target=\"_blank\">10.1038\/nbt1110-1181<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3755123\/\" target=\"_blank\">PMC3755123<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21057489\" target=\"_blank\">21057489<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3755123\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3755123<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=In+silico+research+in+the+era+of+cloud+computing&rft.jtitle=Nature+Biotechnology&rft.aulast=Dudley%2C+J.T.%3B+Butte%2C+A.J.&rft.au=Dudley%2C+J.T.%3B+Butte%2C+A.J.&rft.date=2010&rft.volume=28&rft.issue=11&rft.pages=1181%E2%80%935&rft_id=info:doi\/10.1038%2Fnbt1110-1181&rft_id=info:pmc\/PMC3755123&rft_id=info:pmid\/21057489&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3755123&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HoweVirtual12-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HoweVirtual12_8-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Howe, B. (2012). \"Virtual appliances, cloud computing, and reproducible research\". <i>Computing in Science & Engineering<\/i> <b>14<\/b> (4): 36\u201341. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FMCSE.2012.62\" target=\"_blank\">10.1109\/MCSE.2012.62<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Virtual+appliances%2C+cloud+computing%2C+and+reproducible+research&rft.jtitle=Computing+in+Science+%26+Engineering&rft.aulast=Howe%2C+B.&rft.au=Howe%2C+B.&rft.date=2012&rft.volume=14&rft.issue=4&rft.pages=36%E2%80%9341&rft_id=info:doi\/10.1109%2FMCSE.2012.62&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DockerHome-9\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-DockerHome_9-0\" rel=\"external_link\">9.0<\/a><\/sup> <sup><a href=\"#cite_ref-DockerHome_9-1\" rel=\"external_link\">9.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.docker.com\/\" target=\"_blank\">\"Docker\"<\/a>. Docker, Inc<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.docker.com\/\" target=\"_blank\">https:\/\/www.docker.com\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 21 November 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Docker&rft.atitle=&rft.pub=Docker%2C+Inc&rft_id=https%3A%2F%2Fwww.docker.com%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-AndersonDocker15-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-AndersonDocker15_10-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Anderson, C. (2015). \"Docker\". <i>IEEE Software<\/i> <b>32<\/b> (3): 102\u2013105. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FMS.2015.62\" target=\"_blank\">10.1109\/MS.2015.62<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Docker&rft.jtitle=IEEE+Software&rft.aulast=Anderson%2C+C.&rft.au=Anderson%2C+C.&rft.date=2015&rft.volume=32&rft.issue=3&rft.pages=102%E2%80%93105&rft_id=info:doi\/10.1109%2FMS.2015.62&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BoettigerAnIntro15-11\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BoettigerAnIntro15_11-0\" rel=\"external_link\">11.0<\/a><\/sup> <sup><a href=\"#cite_ref-BoettigerAnIntro15_11-1\" rel=\"external_link\">11.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Boettiger, C. (2015). \"An introduction to Docker for reproducible research\". 
<i>SIGOPS Operating Systems Review<\/i> <b>49<\/b> (1): 71\u201379. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1145%2F2723872.2723882\" target=\"_blank\">10.1145\/2723872.2723882<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+introduction+to+Docker+for+reproducible+research&rft.jtitle=SIGOPS+Operating+Systems+Review&rft.aulast=Boettiger%2C+C.&rft.au=Boettiger%2C+C.&rft.date=2015&rft.volume=49&rft.issue=1&rft.pages=71%E2%80%9379&rft_id=info:doi\/10.1145%2F2723872.2723882&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HungGUIdock16-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HungGUIdock16_12-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hung, L.H.; Kristiyanto, D.; Lee, S.B.; Yeung, K.Y. (2016). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4821530\" target=\"_blank\">\"GUIdock: Using Docker containers with a common graphics user interface to address the reproducibility of research\"<\/a>. <i>PLoS One<\/i> <b>11<\/b> (4): e0152686. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pone.0152686\" target=\"_blank\">10.1371\/journal.pone.0152686<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4821530\/\" target=\"_blank\">PMC4821530<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/27045593\" target=\"_blank\">27045593<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4821530\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4821530<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=GUIdock%3A+Using+Docker+containers+with+a+common+graphics+user+interface+to+address+the+reproducibility+of+research&rft.jtitle=PLoS+One&rft.aulast=Hung%2C+L.H.%3B+Kristiyanto%2C+D.%3B+Lee%2C+S.B.%3B+Yeung%2C+K.Y.&rft.au=Hung%2C+L.H.%3B+Kristiyanto%2C+D.%3B+Lee%2C+S.B.%3B+Yeung%2C+K.Y.&rft.date=2016&rft.volume=11&rft.issue=4&rft.pages=e0152686&rft_id=info:doi\/10.1371%2Fjournal.pone.0152686&rft_id=info:pmc\/PMC4821530&rft_id=info:pmid\/27045593&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4821530&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MoreewsBioShaDock15-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MoreewsBioShaDock15_13-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Moreews, F.; Sallou, O.; M\u00e9nager, H. 
et al. (2015). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4743153\" target=\"_blank\">\"BioShaDock: A community driven bioinformatics shared Docker-based tools registry\"<\/a>. <i>F1000Research<\/i> <b>4<\/b>: 1443. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.12688%2Ff1000research.7536.1\" target=\"_blank\">10.12688\/f1000research.7536.1<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4743153\/\" target=\"_blank\">PMC4743153<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26913191\" target=\"_blank\">26913191<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4743153\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4743153<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=BioShaDock%3A+A+community+driven+bioinformatics+shared+Docker-based+tools+registry&rft.jtitle=F1000Research&rft.aulast=Moreews%2C+F.%3B+Sallou%2C+O.%3B+M%C3%A9nager%2C+H.+et+al.&rft.au=Moreews%2C+F.%3B+Sallou%2C+O.%3B+M%C3%A9nager%2C+H.+et+al.&rft.date=2015&rft.volume=4&rft.pages=1443&rft_id=info:doi\/10.12688%2Ff1000research.7536.1&rft_id=info:pmc\/PMC4743153&rft_id=info:pmid\/26913191&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4743153&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CGARNGenomic13-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CGARNGenomic13_14-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Cancer Genome Atlas Research Network (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3767041\" target=\"_blank\">\"Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia\"<\/a>. <i>New England Journal of Medicine<\/i> <b>368<\/b> (22): 2059\u201374. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1056%2FNEJMoa1301689\" target=\"_blank\">10.1056\/NEJMoa1301689<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3767041\/\" target=\"_blank\">PMC3767041<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23634996\" target=\"_blank\">23634996<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3767041\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3767041<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Genomic+and+epigenomic+landscapes+of+adult+de+novo+acute+myeloid+leukemia&rft.jtitle=New+England+Journal+of+Medicine&rft.aulast=Cancer+Genome+Atlas+Research+Network&rft.au=Cancer+Genome+Atlas+Research+Network&rft.date=2013&rft.volume=368&rft.issue=22&rft.pages=2059-74&rft_id=info:doi\/10.1056%2FNEJMoa1301689&rft_id=info:pmc\/PMC3767041&rft_id=info:pmid\/23634996&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3767041&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PiccoloTools16-15\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-PiccoloTools16_15-0\" rel=\"external_link\">15.0<\/a><\/sup> <sup><a href=\"#cite_ref-PiccoloTools16_15-1\" rel=\"external_link\">15.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Piccolo, S.R.; Frampton, M.B. 
(2016). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4940747\" target=\"_blank\">\"Tools and techniques for computational reproducibility\"<\/a>. <i>GigaScience<\/i> <b>5<\/b> (1): 30. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2Fs13742-016-0135-4\" target=\"_blank\">10.1186\/s13742-016-0135-4<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4940747\/\" target=\"_blank\">PMC4940747<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/27401684\" target=\"_blank\">27401684<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4940747\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4940747<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Tools+and+techniques+for+computational+reproducibility&rft.jtitle=GigaScience&rft.aulast=Piccolo%2C+S.R.%3B+Frampton%2C+M.B.&rft.au=Piccolo%2C+S.R.%3B+Frampton%2C+M.B.&rft.date=2016&rft.volume=5&rft.issue=1&rft.pages=30&rft_id=info:doi\/10.1186%2Fs13742-016-0135-4&rft_id=info:pmc\/PMC4940747&rft_id=info:pmid\/27401684&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4940747&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SpotifyLuigi-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SpotifyLuigi_16-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/github.com\/spotify\/luigi\" target=\"_blank\">\"Spotify\/Luigi\"<\/a>. GitHub, Inc<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/github.com\/spotify\/luigi\" target=\"_blank\">https:\/\/github.com\/spotify\/luigi<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 21 November 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Spotify%2FLuigi&rft.atitle=&rft.pub=GitHub%2C+Inc&rft_id=https%3A%2F%2Fgithub.com%2Fspotify%2Fluigi&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NIHGDC-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NIHGDC_17-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/gdc.cancer.gov\/\" target=\"_blank\">\"Genomic Data Commons\"<\/a>. National Cancer Institute<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/gdc.cancer.gov\/\" target=\"_blank\">https:\/\/gdc.cancer.gov\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 21 November 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Genomic+Data+Commons&rft.atitle=&rft.pub=National+Cancer+Institute&rft_id=https%3A%2F%2Fgdc.cancer.gov%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LarsonSomatic12-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LarsonSomatic12_18-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Larson, D.E.; Harris, C.C.; Chen, K. et al. (2012). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3268238\" target=\"_blank\">\"SomaticSniper: Identification of somatic point mutations in whole genome sequencing data\"<\/a>. <i>Bioinformatics<\/i> <b>28<\/b> (3): 311\u20137. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fbioinformatics%2Fbtr665\" target=\"_blank\">10.1093\/bioinformatics\/btr665<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3268238\/\" target=\"_blank\">PMC3268238<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22155872\" target=\"_blank\">22155872<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3268238\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3268238<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=SomaticSniper%3A+Identification+of+somatic+point+mutations+in+whole+genome+sequencing+data&rft.jtitle=Bioinformatics&rft.aulast=Larson%2C+D.E.%3B+Harris%2C+C.C.%3B+Chen%2C+K.+et+al.&rft.au=Larson%2C+D.E.%3B+Harris%2C+C.C.%3B+Chen%2C+K.+et+al.&rft.date=2012&rft.volume=28&rft.issue=3&rft.pages=311-7&rft_id=info:doi\/10.1093%2Fbioinformatics%2Fbtr665&rft_id=info:pmc\/PMC3268238&rft_id=info:pmid\/22155872&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3268238&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MillerSciClone14-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MillerSciClone14_19-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Miller, C.A.; White, B.S.; Dees, N.D. et al. (2014). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4125065\" target=\"_blank\">\"SciClone: inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution\"<\/a>. <i>PLoS Computational Biology<\/i> <b>10<\/b> (8): e1003665. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.1003665\" target=\"_blank\">10.1371\/journal.pcbi.1003665<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4125065\/\" target=\"_blank\">PMC4125065<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25102416\" target=\"_blank\">25102416<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4125065\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4125065<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=SciClone%3A+inferring+clonal+architecture+and+tracking+the+spatial+and+temporal+patterns+of+tumor+evolution&rft.jtitle=PLoS+Computational+Biology&rft.aulast=Miller%2C+C.A.%3B+White%2C+B.S.%3B+Dees%2C+N.D.+et+al.&rft.au=Miller%2C+C.A.%3B+White%2C+B.S.%3B+Dees%2C+N.D.+et+al.&rft.date=2014&rft.volume=10&rft.issue=8&rft.pages=e1003665&rft_id=info:doi\/10.1371%2Fjournal.pcbi.1003665&rft_id=info:pmc\/PMC4125065&rft_id=info:pmid\/25102416&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4125065&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> 
<\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SciLuigi-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SciLuigi_20-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/github.com\/pharmbio\/sciluigi\" target=\"_blank\">\"Pharmbio\/SciLuigi\"<\/a>. GitHub, Inc<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/github.com\/pharmbio\/sciluigi\" target=\"_blank\">https:\/\/github.com\/pharmbio\/sciluigi<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 21 November 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Pharmbio%2FSciLuigi&rft.atitle=&rft.pub=GitHub%2C+Inc&rft_id=https%3A%2F%2Fgithub.com%2Fpharmbio%2Fsciluigi&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PreethEval15-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PreethEval15_21-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Preeth, E.N.; Mulerickal, F.J.; Paul, B.; Sastri, Y. (2015). \"Evaluation of Docker containers based on hardware utilization\". <i>Proceedings of the 2015 International Conference on Control Communication and Computing India<\/i> <b>2015<\/b>: 697\u2013700. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FICCC.2015.7432984\" target=\"_blank\">10.1109\/ICCC.2015.7432984<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Evaluation+of+Docker+containers+based+on+hardware+utilization&rft.jtitle=Proceedings+of+the+2015+International+Conference+on+Control+Communication+and+Computing+India&rft.aulast=Preeth%2C+E.N.%3B+Mulerickal%2C+F.J.%3B+Paul%2C+B.%3B+Sastri%2C+Y.&rft.au=Preeth%2C+E.N.%3B+Mulerickal%2C+F.J.%3B+Paul%2C+B.%3B+Sastri%2C+Y.&rft.date=2015&rft.volume=2015&rft.pages=697%E2%80%93700&rft_id=info:doi\/10.1109%2FICCC.2015.7432984&rfr_id=info:sid\/en.wikipedia.org:Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. 
In some cases important information was missing from the references, and that information was added.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis\">https:\/\/www.limswiki.org\/index.php\/Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","18474356308b22be86d3205a31b5a267_images":["https:\/\/www.limswiki.org\/images\/f\/f1\/Fig1_Schulz_JofPathInformatics2016_7.jpg","https:\/\/www.limswiki.org\/images\/1\/17\/Fig2_Schulz_JofPathInformatics2016_7.jpg","https:\/\/www.limswiki.org\/images\/4\/41\/Fig3_Schulz_JofPathInformatics2016_7.jpg","https:\/\/www.limswiki.org\/images\/a\/a8\/Fig4_Schulz_JofPathInformatics2016_7.jpg"],"18474356308b22be86d3205a31b5a267_timestamp":1544814662,"5321dee46dc24114d97002f69139f201_type":"article","5321dee46dc24114d97002f69139f201_title":"DGW: An exploratory data analysis tool for clustering and visualisation of epigenomic marks (Lukauskas et al. 2016)","5321dee46dc24114d97002f69139f201_url":"https:\/\/www.limswiki.org\/index.php\/Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks","5321dee46dc24114d97002f69139f201_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:DGW: An exploratory data analysis tool for clustering and visualisation of epigenomic marks\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nDGW: An exploratory data analysis tool for clustering and visualisation of epigenomic marks\nJournal\n \nBMC Bioinformatics\nAuthor(s)\n \nLukauskas, Saulius; Visintainer, Roberto; Sanguinetti, Guido; Schweikert, Gabriele B.\nAuthor affiliation(s)\n \nImperial College London, Fondazione Bruno Kessler, University of Edinburgh\nPrimary contact\n \nEmail: saulius dot lukauskas13 at imperial dot ac dot uk\nYear published\n \n2016\nVolume and issue\n \n17(Suppl 16)\nPage(s)\n \n447\nDOI\n \n10.1186\/s12859-016-1306-0\nISSN\n \n1471-2105\nDistribution license\n \nCreative Commons Attribution 4.0 International\nWebsite\n \nhttp:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-016-1306-0\nDownload\n 
\nhttp:\/\/bmcbioinformatics.biomedcentral.com\/track\/pdf\/10.1186\/s12859-016-1306-0 (PDF)\n\nContents\n\n1 Abstract \n2 Background \n3 Methods \n\n3.1 Dynamic genome warping \n3.2 Clustering \n3.3 Pre-processing pipeline and implementation \n\n\n4 Results and discussion \n\n4.1 Simulation study \n4.2 DGW automatically aligns genomic landmarks \n4.3 DGW clusters are enriched for co-factor binding sites \n\n\n5 Conclusion \n6 Declarations \n\n6.1 Acknowledgements \n\n6.1.1 Declarations \n6.1.2 Funding \n6.1.3 Availability of data and materials \n6.1.4 Authors\u2019 contributions \n6.1.5 Competing interests \n6.1.6 Consent for publication \n6.1.7 Ethics approval and consent to participate \n\n\n\n\n7 References \n8 Notes \n\n\n\nAbstract \nBackground: Functional genomic and epigenomic research relies fundamentally on sequencing-based methods like ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extending over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent.\nResults: We present DGW (Dynamic Genome Warping), an open-source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses dynamic time warping to adaptively rescale and align genomic distances, which allows regions of interest with similar shapes to be grouped together, thereby capturing the structure of epigenomic marks. 
We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project.\nConclusions: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. DGW is available as an open-source Python package.\nKeywords: Clustering, ChIP-seq, epigenetics, dynamic time warping\n\nBackground \nSequencing-based technologies such as ChIP-Seq and DNase-Seq [e.g., reviewed in Furey 2012[1]] have revolutionized our understanding of chromatin structure and function, yielding deep insights into the importance of epigenomic marks in the basic processes of life. The emergent picture is that gene expression is controlled by a complex interplay of protein binding and epigenomic modifications. While histone marks (and other epigenomic marks) can be measured in a high-throughput manner, exploratory data analysis techniques for these data types are still being developed. Epigenomic marks exhibit characteristics that distinguish them fundamentally from, e.g., mRNA gene expression measurements: they are spatially extended across regions as wide as several kilobases, within which they often present interesting local structures, such as multiple peaks and troughs[2] and intriguing asymmetries[3] (see Fig. 1).\n\nFigure 1. The epigenomic marks H3K4me3 (left) and H3K9ac (right) accumulate around transcription start sites (TSSs), often showing a bimodal peak with a valley over the TSS. Shown are two biological replicates for each mark and the input signal. The y-axis corresponds to read counts. 
Annotated genes and the enriched regions called by MACS2 are shown in grey below each profile.\n\nThe shape of epigenomic marks appears to be highly conserved across replicate data sets and has recently been exploited for statistical testing.[4] While the biological reasons for such conservation are not entirely clear, recent studies have suggested that both architectural and regulatory aspects may be at play. Bieberstein and colleagues showed intriguing patterns of accumulation of the histone marks H3K4me3 and H3K9ac at splice sites[5], hinting at an architectural origin of the shape of the marks. More recently, Benveniste et al. showed that histone marks can be predicted very well genome-wide from the binding patterns of transcription factors (TFs).[6] The shape of a peak may therefore be a readout of additional chromatin-related events, and genomic regions which are similarly marked may hint at common regulatory or architectural features. Excellent visualisation tools (e.g., the UCSC Genome Browser) enable researchers to appreciate such features for individual enrichment peaks. However, while automatically grouping such marks based on shape similarity may be a valuable tool for hypothesis generation, it has remained a non-trivial task.\nCurrent approaches to clustering regions based on chromatin signatures can be broadly split into two camps. Global approaches, such as the celebrated HMM-based reconstruction of the \"colours of the chromatin,\"[7] try to find a segmentation of large genomic regions based on histone signatures. These approaches usually rely on a \"presence vs. absence\" characterization of histone marks at genomic loci, such that the clustering is primarily based on combinatorial patterns of multiple histone marks, as opposed to spatial patterns emerging within individual peaks. 
\nAnother interesting segmentation approach was recently introduced by Knijnenburg and colleagues.[8] Here, signal enrichment is considered across a wide range of scales spanning several orders of magnitude. While this constitutes a significant improvement over earlier approaches, signal patterns within segments are again not taken into account. Local approaches, on the other hand, attempt to cluster short genomic regions at particular loci based on the quantitative binding or modification pattern measured at those loci (e.g., via ChIP-Seq). Examples of these approaches include the ENCODE Cluster Aggregation Tool (CAGT)[3] and the clustering of genes based on PolII binding profiles.[9]\nLocal approaches have to address two challenging problems: aligning the peaks to a reference, and standardising the peaks so that they can be represented as vectors of equal dimension. To align regions, both the method by Taslim and colleagues and the CAGT tool rely on anchor points (e.g., transcription start sites (TSS)[9] or transcription factor binding sites from auxiliary ChIP-Seq experiments[3]). The regions are then standardised either by rescaling to a fixed gene length[9] or by applying windows of fixed length on either side of the anchor points[3], irrespective of the true extent of the local enrichment. These strategies may be plausible for certain applications. However, the shape and extent of histone marks, for instance, appear to be determined by many factors[5], such that a uniform rescaling may be inappropriate. In particular, if one assumes that epigenomic marks are directly or indirectly influenced by the underlying DNA sequence, it becomes clear that more flexibility in the comparison and alignment of these marks is needed. For example, orthologous genes may share similar sequence features but differ in sequence length. 
Sequence comparisons therefore do not, in general, require the compared sequences to be of equal length; they allow for insertions, deletions and shifts. Similar local variations should be allowed when comparing epigenomic marks.\nIn this work, we address the problem of aligning and clustering epigenomic data in a completely unsupervised way: as input data we use ChIP-Seq enrichment measurements within peak regions identified by a peak finder such as MACS.[10] The alignment and standardisation problems are solved simultaneously without the use of additional information, such as transcription start sites or gene annotation. We introduce a local rescaling that allows epigenomic marks to be matched based on maximum similarity between shapes. Our method\/software, Dynamic Genome Warping (DGW), is based on the classical dynamic time warping algorithm[11][12], which enabled computer scientists to construct speech recognisers that are robust to variability in pitch and speed of enunciation. In DGW we have implemented multidimensional alignment and clustering, such that multiple epigenomic tracks can be analysed simultaneously. This feature can also be used to control for local sequencing bias, as DNA inputs or IgG controls can easily be added to the analysis. We first test DGW in a simulation study. Subsequently, we demonstrate that DGW can align genomic landmarks such as TSSs and first splicing sites (FSSs) on real epigenomic data from the ENCODE project[13], thus effectively and automatically solving both the alignment and the standardisation problems. DGW is freely available as a stand-alone, platform-independent and fully documented Python package.\n\nMethods \nWe first motivate and illustrate our method on a particular data set of histone modifications from the ENCODE project[13], measuring tri-methylation of histone 3 at lysine 4 (H3K4me3) and acetylation of histone 3 at lysine 9 (H3K9ac) in the human leukaemia cell line K562. 
The reason for choosing these two specific marks is that they are known to be characteristically enriched in the regions flanking TSSs[2] and were recently shown to accumulate at FSSs[5], hence providing direct evidence of the biological relevance of both the alignment and the standardisation problems.\nAligned fragments (BAM files) of both epigenomic marks were processed with the MACS2 peak caller[10] to identify regions which showed enrichment relative to an input control sample; we then merged the two sets by considering every region called for either mark. We stress that the method is independent of the specific marks and of the choice of peak caller, and is readily extendable to other types of genomic and epigenomic data.\nEnriched regions typically have very different lengths; nevertheless, visual inspection can reveal similarities between the shapes of the peaks. These similarities are often visualised through a global averaging (aggregation) of the marks[2]; however, there are strong arguments that global averaging may also mask more subtle patterns. A useful motivating example is given in Fig. 1, which shows four regions enriched in both the H3K4me3 and the H3K9ac marks. They all overlap with genes and exhibit broadly similar shapes: a bimodal peak with a trough over the TSS. However, the total lengths of the enriched regions vary, and so does the extent of the two individual sub-peaks, which could be governed by the underlying gene structure. Consequently, the position of the TSS relative to the start of the enriched regions varies.\n\nDynamic genome warping \nTo quantify the similarities between peaks such as the ones shown in Fig. 1 automatically, we use the classic dynamic time warping (DTW) algorithm.[11][14] A modern review of the basic concepts of dynamic time warping can be found e.g. 
in M\u00fcller 2007.[12] It was originally introduced in the speech recognition community to recognise speech robustly, independently of speaking rate. There, the problem was to match waveforms of similar shape but potentially different duration. Likewise, our aim is to associate peaks which exhibit similar local structure (shape) regardless of their spatial extent.\nSpecifically, let $a=(a_{1},\\ldots ,a_{N})$ and $b=(b_{1},\\ldots ,b_{M})$ be two sequences with values $a_{i},b_{j}\\in S$, where $S$ is a metric space equipped with a local distance $d\\colon S\\times S\\rightarrow \\mathbb{R}$ (e.g., squared Euclidean distance or cosine distance). DTW uses dynamic programming to construct a warping path $\\mathbf{p}=\\left(p_{1}^{0},p_{1}^{1}\\right),\\ldots ,\\left(p_{i}^{0},p_{i}^{1}\\right),\\ldots ,\\left(p_{K}^{0},p_{K}^{1}\\right)$, i.e., two sets of indices identifying the elements of the two sequences which are mapped to each other in order to minimise the sum of the local distances. In formulae:\n\n$$\\mathbf{p}=\\text{argmin}\\sum_{i=1}^{K}d\\left(a_{p_{i}^{0}},b_{p_{i}^{1}}\\right)$$\n\nsubject to the following constraints:\n\n$p_{1}^{0}=p_{1}^{1}=1$: the first points of both sequences are mapped to each other;\n$p_{K}^{0}=N$ and $p_{K}^{1}=M$: the end points of both sequences are mapped to each other;\n$0\\leq p_{i+1}^{j}-p_{i}^{j}\\leq 1$ for all $i=1,\\ldots ,K-1$ and $j=0,1$: each index set is non-decreasing with a maximum step of one. This ensures that every point in each sequence is mapped to at least one point in the other sequence.\nAlgorithmically, DTW is very similar to classical alignment algorithms such as Needleman-Wunsch and Smith-Waterman: it builds optimal alignments of longer prefixes from optimal alignments of shorter ones, iterates by selecting the optimal next move, and recovers the optimal global alignment by backtracking. As such, it entails constructing a matrix of size M\u00d7N, which determines the computational complexity of the algorithm. Computing pairwise DTW distances between all peaks is therefore the computationally most expensive step, as it involves computing $O\\left(N_{\\text{peaks}}^{2}\\right)$ DTW distances, each of which is $O(M\\times N)$.\nIn Fig. 2 we show how the first two peaks in Fig. 1 are aligned onto each other using DTW. 
Notice that the pure DTW algorithm allows arbitrarily long stretches to be compressed to a single point. This behaviour may be undesirable, and simple modifications are implemented, such as an upper limit on the length of compressed regions (Sakoe-Chiba band[11]) or an exponential penalty on compressing\/stretching. Applying the Sakoe-Chiba band constraint also reduces the run time to $O(k\\times \\max(N,M))$, where k is the width of the band, which can be chosen to be small. Novel strategies to reduce the computational load are, however, emerging[15], and it would be interesting to integrate such ideas in the epigenomic context.\n\nFigure 2. DGW alignment of two H3K4me3 profiles. a Shown is the distance density matrix for two peaks (Peak 188 on the x-axis and Peak 280 on the y-axis). Colour coding corresponds to local Euclidean distances from small (green) to large (red). The optimal path is shown in blue. b Mapping between the two profiles. c Dynamically aligned profiles and total distance D between the two peaks.\n\nDGW readily extends to multi-dimensional data if more than one epigenomic track is analysed. In this case, a and b become sequences of vectors, e.g., $a=(a_{1},\\ldots ,a_{N})$, where each $a_{i}$ contains the coverage of each mark at position (bin) i. In this way, the different epigenomic marks are given equal weight; however, other weighting schemes can easily be implemented.\nIn addition to the optimal path between two sequences, we also report their total distance under the optimal warping, which is subsequently used for clustering the peaks. Note that when squared Euclidean distance is used as the local distance measure, differences in both peak shape and enrichment level contribute to the overall DTW distance. If this is not desired, the peaks can optionally be normalized by their respective heights, and the cosine distance can be used as the local distance. 
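The dynamic programming recursion behind DTW, together with the Sakoe-Chiba band and the multi-dimensional extension described above, can be illustrated with a short Python sketch. This is a minimal, hypothetical example and not the DGW package's own code. The function names `dtw_distance` and `sq_euclidean` are our own. Each bin is assumed to be a tuple holding one coverage value per epigenomic mark. `band` is assumed to be the maximum allowed offset abs(i - j) between the two indices, widened when necessary to the length difference of the two peaks so that the end points can still be matched. Only the total distance is returned; backtracking to recover the warping path itself is omitted.

```python
import math


def sq_euclidean(x, y):
    # Local distance between two bins. Each bin is a tuple holding one
    # coverage value per epigenomic mark, so one-track and multi-track
    # peaks are handled the same way (all marks weighted equally).
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def dtw_distance(a, b, band=None, dist=sq_euclidean):
    # Total DTW distance between binned peaks a and b (lists of tuples).
    # band: optional Sakoe-Chiba half-width; cells with abs(i - j) > band
    # are never filled. It is widened to abs(len(a) - len(b)) if needed so
    # that the end point (N, M) stays reachable.
    n, m = len(a), len(b)
    if band is not None:
        band = max(band, abs(n - m))
    # cost[i][j] = cheapest warping of the first i bins of a onto the
    # first j bins of b; row 0 and column 0 are padding.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # forces the path to start at (1, 1)
    for i in range(1, n + 1):
        if band is None:
            j_lo, j_hi = 1, m
        else:
            j_lo, j_hi = max(1, i - band), min(m, i + band)
        for j in range(j_lo, j_hi + 1):
            # Allowed moves: advance in a only, in b only, or in both
            # (each index advances by at most one per step).
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist(a[i - 1], b[j - 1]) + best_prev
    return cost[n][m]


# A peak and a locally stretched copy of it (one mark, bin counts).
peak = [(0,), (2,), (5,), (2,), (0,)]
stretched = [(0,), (2,), (2,), (5,), (5,), (2,), (0,)]
print(dtw_distance(peak, stretched))          # 0.0
print(dtw_distance(peak, stretched, band=1))  # 0.0 (band widened to 2)
```

In the example, the stretched copy of the peak is matched bin for bin and has distance 0.0 from the original, even though the two sequences differ in length. Restricting j to the band is what reduces the cost of one comparison from O(M\u00d7N) to roughly O(k\u00d7max(N,M)). Swapping `dist` for another function, for example a cosine distance on height-normalized peaks, changes the local distance without touching the recursion.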
To account for potential strand specificity of epigenomic marks, we compute two distances for every pairwise peak comparison: one with the two sequences unchanged, and one with one of them reversed. The smaller of the two is then returned as the distance between the patterns.\n\nClustering \nAfter computing all pairwise DTW distances between peaks, we next aim to cluster the peaks into groups which share similar shapes. Implementing k-means clustering within a DTW framework, however, would require the ability to define an average of all possible warped profiles, which is not an easy task. Instead, we take advantage of the pre-computed pairwise distances between peaks and perform agglomerative hierarchical clustering, using complete linkage to avoid chaining.[16] The resulting dendrogram contains $N_{\\text{peaks}}-1$ internal nodes, each of which represents a possible clustering of the data. As in any hierarchical clustering method, the number of clusters can be chosen adaptively by the user. This is both a strength and a weakness of the methodology. Principled methods for choosing a cutoff exist[17], and implementing them in the context of DGW is a future direction of improvement. DGW computes a prototype for each node, i.e., a sequence representative of all sequences attached to the node (the leaves of the subtree rooted at that node). Prototype computation is a non-trivial problem in DTW; here we use the scaled prioritised shape averaging algorithm of Niennattrakul and Ratanamahatana.[18]\n\nPre-processing pipeline and implementation \nHere we briefly describe the DGW software package; a more thorough description, including installation instructions and examples, is given in the vignette at the DGW homepage.[19] DGW consists of two modules: a worker module, which performs the computationally intensive tasks, and an explorer module, which allows visual exploration of the results. 
DGW-worker takes as input a set of genomic regions (a BED file, e.g., as returned by a peak finder) and a set of data files (BAM files) for different epigenomic marks. Single-end reads are extended to the estimated fragment lengths. To alleviate the computational burden and to reduce spatial noise, the coverage within peak regions is binned into non-overlapping windows of 50 bp. The bin width is an adjustable parameter which should reflect the scale at which local changes are expected in the data. For each peak, we thus construct a sequence $a=(a_{1},\\ldots ,a_{N})$ whose value $a_{i}$ is the coverage within bin i.\nAt this stage, we do not normalize against the input sample but use raw read counts. A practical reason for this is that most input samples still have relatively low coverage: as there is no enrichment for binding sites, input samples cover the whole genome, so input library sizes would need to be significantly larger than their IP sample counterparts, which in practice is rarely the case. A simple correction using enrichment over input is counterproductive in most applications, as it adds further noise to the signal. However, our method allows input samples to be added as additional dimensions for multidimensional clustering, offering a convenient way to incorporate the information conveyed by a sufficiently sequenced input sample, if one is available. DGW-worker then computes the warping distances, the hierarchical clustering dendrogram, and the prototype sequences associated with each node; this is computationally intensive, and the tasks are automatically distributed across multiple cores if available. A typical run of DGW-worker on the ChIP-seq data set takes 420 minutes of CPU time distributed across six cores, for a total execution time of just over one hour.\nOnce these computations are completed, the lightweight explorer module can be launched. This opens a window displaying a heat map of the peaks and the clustering dendrogram. 
The dendrogram can be cut at any desired level. The information about which peaks are clustered together is returned as a series of BED files (one per cluster) to enable subsequent analyses. Individual clusters can be analysed further, and additional functionality is provided at this level: e.g., histograms of the positions of specific regions of interest pre- and post-warping (Fig. 6) and warpings of individual peaks onto prototypes can be obtained.\n\nResults and discussion \nSimulation study \nAs a proof of concept, we constructed a simple simulation study that mimics a real biological data set as closely as possible. We considered the initial 2 kb of five genes from the UCSC known genes data set and extracted H3K4me3 data for these five regions from the ENCODE human leukaemia cell line K562 (Fig. 3). The first three of these showed bimodal peaks; the remaining two exhibited a single peak. We ensured that the first splicing site of each of these five genes fell within the 2 kb region considered. We generated modified versions of the five seed regions using the following procedure (Fig. 3). Multiplicative Gaussian noise with variance v was applied to the read counts in each bin of a seed region. Further, each bin was removed or duplicated with probability p, producing a shrinkage or a stretch of the peak. Duplicated bins could themselves be duplicated again, resulting in local stretches of varying length. Additionally, the orientation of the simulated peak was flipped with probability fp in order to simulate antisense transcription. For each set of parameters (v, p and fp), we produced 99 simulated peaks from each seed; together with the five seed regions themselves, this yields a data set of 500 peaks (Fig. 3).\n\nFigure 3. Generation of simulated data sets: Shown in blue are five seed regions, i.e., original ENCODE H3K4me3 read counts at the start of five known genes. 
For each of the seed regions we show 10 simulated modifications, created by multiplying each bin by Gaussian noise (v: 0.1), by introducing insertions and deletions (p: 0.1) and by flipping the orientation of the peak with probability fp: 0.1. Individual panels (a\u2013e) represent the different seed regions.\n\nThe clustering results are shown in Fig. 4, both for standard hierarchical clustering and for DTW clustering. Figure 4a and b show the resulting dendrograms for the simulation experiment with parameters (v: 0.25, p: 0.25, fp: 0.1). In contrast to standard hierarchical clustering, DGW identifies five clusters with approximately 100 members each, corresponding well to the five initial seed patterns. We repeated the data simulation and clustering phases with varying parameter sets in order to investigate a grid of increasingly strong modifications of the peak patterns. We quantitatively assess the accuracy of the clustering using the Matthews Correlation Coefficient (MCC), generalized to multi-class classification problems.[20][21] The results are presented in Fig. 4 and Table 1. The MCC ranges from \u22121 to 1: the extreme values represent completely incorrect and completely correct classifications, respectively, and 0 corresponds to a random classification. Standard hierarchical clustering is able to group the simulated peaks correctly according to the pattern they were derived from only if the added noise and modifications are small (v<0.15, p<0.15). With DGW, optimal clustering can be achieved even if the local modifications to the patterns are large (Fig. 4 and Table 1).\n\nFigure 4. Simulation results for parameter set (v: 0.25, p: 0.25, fp: 0.1). a Left panel shows the dendrogram of clustered peaks using hierarchical clustering only. Peaks assigned to each of the five clusters are shown in yellow, pink, blue, red and green. 
The x-axis represents the pairwise distances d. The right panel shows the clustered peaks: colour coding corresponds to normalized read counts, and the x-axis represents the original (unwarped) bins from the start of the peaks. c Matthews Correlation Coefficient for hierarchical clustering based on a set of simulations with varying parameters (v, p) and fp fixed to 0.1. b and d Same as a and c, but for DGW alignment followed by hierarchical clustering.\n\nTable 1. Matthews Correlation Coefficient values for the classification of the synthetic peaks produced with the indicated values of p (rows) and v (columns), with fp = 0.1\n\nNo DTW\np\\v    0.00   0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40\n0.00   1.000  1.000  1.000  1.000  1.000  1.000  0.785  0.699  0.412\n0.05   1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.773  0.486\n0.10   1.000  1.000  1.000  1.000  1.000  0.785  0.795  0.707  0.372\n0.15   1.000  1.000  0.998  1.000  1.000  1.000  0.723  0.505  0.671\n0.20   1.000  1.000  1.000  1.000  0.682  1.000  0.696  0.572  0.501\n0.25   1.000  1.000  1.000  0.600  1.000  0.748  0.710  0.618  0.452\n0.30   0.945  0.995  0.998  0.907  0.682  0.995  0.772  0.767  0.434\n0.35   0.957  0.927  0.975  0.613  0.973  0.604  0.701  0.259  0.413\n0.40   0.649  0.619  0.990  0.704  0.599  0.746  0.252  0.617  0.294\n\nDTW\np\\v    0.00   0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40\n0.00   1.000  1.000  1.000  1.000  1.000  1.000  0.995  0.780  0.773\n0.05   1.000  1.000  1.000  1.000  1.000  1.000  0.995  0.772  0.778\n0.10   1.000  1.000  1.000  1.000  1.000  1.000  0.788  0.783  0.766\n0.15   1.000  1.000  1.000  1.000  1.000  0.998  0.975  0.774  0.978\n0.20   1.000  1.000  1.000  1.000  1.000  0.998  0.774  0.752  0.760\n0.25   1.000  1.000  1.000  1.000  1.000  0.998  0.988  0.973  0.948\n0.30   1.000  0.990  0.988  1.000  1.000  0.998  0.901  0.421  0.764\n0.35   1.000  0.971  1.000  0.586  0.978  0.990  0.297  0.318  0.710\n0.40   1.000  0.954  0.964  0.998  0.988  0.665  0.973  0.934  0.401\n\nDGW automatically aligns genomic landmarks \nTo assess the biological significance of DGW alignment and clustering, we considered two histone marks (H3K4me3 and H3K9ac) from the ENCODE data sets. These marks were chosen because they have been shown to accumulate at transcription start sites as well as at first splicing sites (FSSs).[5] Given that first exon length is highly variable, this provides a strong motivation for the local rescaling applied by DGW. For this experiment, the enriched regions identified by the MACS2 peak caller were used for clustering, such that no anchoring was provided. Using a bin size of 50 bp, we restricted the analysed set to peaks longer than five bins and shorter than 1000 bins. We also filtered out peaks with fewer than 10 counts. We used squared Euclidean distance as the local distance measure between the scaled reads and constrained the DTW with a Sakoe-Chiba band of width 12.\nFig. 5 shows the dendrogram and heat maps for these data. Notice the high variability in peak length, which makes it virtually impossible to distinguish any patterns visually. Choosing the level at which to cut the dendrogram is difficult. Empirically, cutting the dendrogram near the leaves gives better visualisations, as larger clusters force the algorithm to warp together potentially very different peaks. With this in mind, we chose a cut which resulted in 45 clusters. Fig. 6a and b show the original and warped heat-maps for the two epigenomic marks within one particular cluster. 
TSS and first splice site positions are shown as red and orange dots, respectively. The heat-map of the warped data shows a well-defined bimodal pattern of H3K4me3, with the TSS aligned in the valley between the two sub-peaks. This is in good agreement with the known pattern of these marks around gene starts. These genomic landmarks, or points of interest (POIs), are approximately aligned even though no prior knowledge of their positions was used in the clustering. This is corroborated by the histograms of TSS and FSS positions in the raw and aligned data (Fig. 6c and d). Computing the change in entropy between the histograms shown in Fig. 6, after rescaling the raw data to a common length, we observe a decrease of 12.91% for the TSS and 7.72% for the FSS location distributions in the selected cluster after warping. Averaged across all clusters, this effect is less pronounced but still significant: a 1.72% decrease (95% bootstrap confidence interval 0.83%\u20132.81%) for TSS and 2.65% (1.79%\u20133.63%) for FSS, quantitatively demonstrating the ability of DGW to align these genomic landmarks.\n\nFigure 5. ENCODE data: DGW clustering of the H3K4me3 and H3K9ac marks in the K562 cell line. Shown are the dendrogram and heat-maps. TSSs are shown as red dots in the heat-maps.\n\nFigure 6. ENCODE data, a sample DGW cluster. a Heat-maps of the raw and b aligned data. Red dots indicate transcription start sites, orange dots first splice sites. c Histograms of the positions of TSSs in raw (left) and aligned (right) data. d Histograms of the positions of first splice sites in raw (left) and aligned (right) data.\n\nDGW clusters are enriched for co-factor binding sites \nTo probe the biological significance of the DGW clusters further, we asked whether cluster membership could be explained in part by shared binding co-factors. 
To test this hypothesis, we considered ChIP-Seq data sets for 34 transcription factors (TFs) assayed by ENCODE in the K562 cell line (see \"Availability of data and materials\" section for lists of TFs and download sources). Several TFs have been mechanistically associated with histone-modifying enzymes, and TF binding has recently been reported to be very strongly predictive of histone modifications.[6] We extracted peak information from these data sets and then examined the distribution of each TF's binding sites across clusters. Under a reasonable null hypothesis of no relation between clustering and TF binding, the number of TF peaks falling into the genomic region covered by a cluster would simply be proportional to the size of that region, i.e., a uniform distribution.\nFig. 7 shows normalised cumulative occurrences of TF binding sites across clusters. For each TF, clusters are ranked by their relative overlap with the given TF. Each bar corresponds to the cumulative level of normalized overlap between the TF and the considered cluster plus all clusters to its left. The null hypothesis of uniform distribution corresponds to the red line. Conversely, if all binding sites for a given TF fell in a single cluster, all bars would have length 0 except for the rightmost one, which would have length 1. A large area between the red line and the cumulative plot therefore indicates a strongly non-uniform distribution. Occurrence distributions for some TFs, such as TR4, ATF3 or NFE2, are remarkably non-uniform and demonstrate that some clusters are highly enriched for a specific set of TFs. While these tests do not yield an immediately interpretable biological outcome, they strongly hint at a biological significance for enriched regions clustered by DGW.\n\n Figure 7. Cumulative levels of normalized overlap between each TF and the determined clusters. 
Each sub-plot corresponds to one TF. For each TF, clusters are ranked by their relative overlap with this TF. Each bar corresponds to the cumulative level of normalized overlap between the TF and the considered cluster plus all clusters to its left. The null hypothesis of uniform distribution corresponds to the red lines. The area between the red line and the cumulative plot is indicated below the TF name.\n\n\n\nConclusion \nData exploration and visualisation tools have played a central role in bioinformatics and have contributed in no small part to the success of high-throughput methods in the last decade.[22] Extending these methodologies to complex next-generation sequencing data sets poses computational and methodological challenges, yet the potential for hypothesis generation is considerable. ChIP-seq data sets, in particular, yield high-dimensional, structured marks associated with genomic regions. The reproducibility of the spatial structure in the ChIP-seq signal has already inspired the development of shape-based statistical tests for ChIP-seq.[4] In this paper, we addressed the natural question of whether spatial structures in ChIP-seq data can also be used to group genes with similar epigenomic marks. We have proposed a novel method, DGW, which addresses this question using ideas from signal processing and speech recognition. Our results show that DGW can be a practical and user-friendly tool for exploratory data analysis of high-throughput epigenomic data sets. DGW\u2019s ability to recover, in an unsupervised manner, the observed accumulation of H3K4me3 and H3K9ac at transcription start sites and first splicing sites[5], and to associate clusters with groups of transcription factors, also demonstrates its potential as a useful tool for biological hypothesis generation. 
We hope that DGW may become a valuable addition to the growing toolkit for epigenome bioinformatics.\n\nDeclarations \nAcknowledgements \nThe authors would like to thank five anonymous reviewers for their useful suggestions and remarks, which helped to improve the paper.\n\nDeclarations \nThis article has been published as part of BMC Bioinformatics Volume 17 Supplement 16, 2016: Proceedings of the Tenth International Workshop on Machine Learning in Systems Biology (MLSB 2016). The full contents of the supplement are available online at http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/supplements\/volume-17-supplement-16.\n\nFunding \nG.B.S. acknowledges funding from the E.U. under the Marie Curie actions. G.S. acknowledges funding from the ERC under grant MLC366999, which includes funding for the publication of this article.\n\nAvailability of data and materials \nAll data used in this study are available from the ENCODE project repository.[13][23] (IDs: wgEncodeBroadHistoneK562H3k4me3StdAlnRep1.bam, wgEncodeBroadHistoneK562H3k4me3StdAlnRep2.bam, wgEncodeBroadHistoneK562H3k9acStdAlnRep1.bam, wgEncodeBroadHistoneK562H3k9acStdAlnRep2.bam, wgEncodeBroadHistoneK562ControlStdAlnRep1.bam). DGW is available as an open-source Python package on GitHub (https:\/\/lukauskas.github.com\/dgw\/).[19] The package manual is available from the same URL.\n\nAuthors\u2019 contributions \nSL, GBS and GS designed the research. SL implemented the method and SL, RV and GBS carried out the experiments. All authors wrote the paper. All authors read and approved the final manuscript.\n\nCompeting interests \nThe authors declare that they have no competing interests.\n\nConsent for publication \nNot applicable.\n\nEthics approval and consent to participate \nNot applicable.\n\nReferences \n\n\n\u2191 Furey, T.S. (2012). \"ChIP-seq and beyond: New and improved methodologies to detect and characterize protein-DNA interactions\". 
Nature Reviews Genetics 13 (12): 840-52. doi:10.1038\/nrg3306. PMC PMC3591838. PMID 23090257. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3591838 .   \n\n\u2191 2.0 2.1 2.2 Barski, A.; Cuddapah, S.; Cui, K. et al. (2007). \"High-resolution profiling of histone methylations in the human genome\". Cell 129 (4): 823-37. doi:10.1016\/j.cell.2007.05.009. PMID 17512414.   \n\n\u2191 3.0 3.1 3.2 3.3 Kundaje, A.; Kyriazopoulou-Panagiotopoulou, S.; Libbrecht, M. et al. (2012). \"Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements\". Genome Research 22 (9): 1735-47. doi:10.1101\/gr.136366.111. PMC PMC3431490. PMID 22955985. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3431490 .   \n\n\u2191 4.0 4.1 Schweikert, G.; Cseke, B.; Clouaire, T. et al. (2013). \"MMDiff: Quantitative testing for shape changes in ChIP-Seq data sets\". BMC Genomics 14: 826. doi:10.1186\/1471-2164-14-826. PMC PMC4008153. PMID 24267901. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4008153 .   \n\n\u2191 5.0 5.1 5.2 5.3 5.4 Bieberstein, N.I.; Carrillo Oesterreich, F.; Straube, K. et al. (2012). \"First exon length controls active chromatin signatures and transcription\". Cell Reports 2 (1): 62\u20138. doi:10.1016\/j.celrep.2012.05.019. PMID 22840397.   \n\n\u2191 6.0 6.1 Benveniste, D.; Sonntag, H.J.; Sanguinetti, G. et al. (2014). \"Transcription factor binding predicts histone modifications in human cell lines\". Proceedings of the National Academy of Sciences of the United States of America 111 (37): 13367-72. doi:10.1073\/pnas.1412081111. PMC PMC4169916. PMID 25187560. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4169916 .   \n\n\u2191 Filion, G.J.; van Bemmel, J.G.; Braunschweig, U. et al. (2010). \"Systematic protein location mapping reveals five principal chromatin types in Drosophila cells\". Cell 143: 212\u201324. 
doi:10.1016\/j.cell.2010.09.009. PMC PMC3119929. PMID 20888037. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3119929 .   \n\n\u2191 Knijnenburg, T.A.; Ramsey, S.A.; Berman, B.P. et al. (2014). \"Multiscale representation of genomic signals\". Nature Methods 11: 689-94. doi:10.1038\/nmeth.2924. PMC PMC4040162. PMID 24727652. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4040162 .   \n\n\u2191 9.0 9.1 9.2 Taslim, C.; Wu, J.; Yan, P. et al. (2009). \"Comparative study on ChIP-seq data: normalization and binding pattern characterization\". Bioinformatics 25: 2334-40. doi:10.1093\/bioinformatics\/btp384. PMC PMC2800347. PMID 19561022. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2800347 .   \n\n\u2191 10.0 10.1 Zhang, Y.; Liu, T.; Meyer, C.A. et al. (2008). \"Model-based analysis of ChIP-Seq (MACS)\". Genome Biology 9 (9): R137. doi:10.1186\/gb-2008-9-9-r137. PMC PMC2592715. PMID 18798982. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2592715 .   \n\n\u2191 11.0 11.1 11.2 Sakoe, H.; Chiba, S. (1978). \"Dynamic programming algorithm optimization for spoken word recognition\". IEEE Transactions on Acoustics, Speech, and Signal Processing 26 (1): 43\u201349. doi:10.1109\/TASSP.1978.1163055.   \n\n\u2191 12.0 12.1 M\u00fcller, M. (2007). Information Retrieval for Music and Motion. Springer-Verlag Berlin Heidelberg. pp. 318. doi:10.1007\/978-3-540-74048-3. ISBN 9783540740476.   \n\n\u2191 13.0 13.1 13.2 ENCODE Project Consortium (2012). \"An integrated encyclopedia of DNA elements in the human genome\". Nature 489 (7414): 57-74. doi:10.1038\/nature11247. PMC PMC3439153. PMID 22955616. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3439153 .   \n\n\u2191 Giorgino, T. (2009). \"Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package\". Journal of Statistical Software 31 (7). doi:10.18637\/jss.v031.i07.   
\n\n\u2191 Begum, N.; Ulanova, L.; Wang, J.; Keogh, E. (2015). \"Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy\". Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2015: 49\u201358. doi:10.1145\/2783258.2783286.   \n\n\u2191 Hirano, S.; Tsumoto, S. (2005). \"Empirical Comparison of Clustering Methods for Long Time-Series Databases\". In Tsumoto, S.; Yamaguchi, T.; Numao, M.; Motoda, H. (eds.). Active Mining. Springer Berlin Heidelberg. pp. 268\u2013286. doi:10.1007\/11423270_15. ISBN 9783540319337.   \n\n\u2191 Heller, K.A.; Ghahramani, Z. (2005). \"Bayesian hierarchical clustering\". Proceedings of the 22nd International Conference on Machine Learning 2005: 297\u2013304. doi:10.1145\/1102351.1102389.   \n\n\u2191 Niennattrakul, V.; Ratanamahatana, C.A. (2009). \"Shape averaging under time warping\". Proceedings of the 6th International Conference on Electrical Engineering\/Electronics, Computer, Telecommunications and Information Technology 2009: 626\u2013629. doi:10.1109\/ECTICON.2009.5137128.   \n\n\u2191 19.0 19.1 Lukauskas, S. \"Dynamic Genome Warping (DGW)\". http:\/\/lukauskas.co.uk\/dgw\/ .   \n\n\u2191 Matthews, B.W. (1975). \"Comparison of the predicted and observed secondary structure of T4 phage lysozyme\". Biochimica et Biophysica Acta 405 (2): 442\u201351. PMID 1180967.   \n\n\u2191 Jurman, G.; Riccadonna, S.; Furlanello, C. (2012). \"A comparison of MCC and CEN error measures in multi-class prediction\". PLoS One 7 (8): e41882. doi:10.1371\/journal.pone.0041882. PMC PMC3414515. PMID 22905111. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3414515 .   \n\n\u2191 Eisen, M.B.; Spellman, P.T.; Brown, P.O.; Botstein, D. (1998). \"Cluster analysis and display of genome-wide expression patterns\". Proceedings of the National Academy of Sciences of the United States of America 95 (25): 14863-8. PMC PMC24541. PMID 9843981. 
http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC24541 .   \n\n\u2191 ENCODE Project Consortium (2012). \"wgEncodeBroadHistone\". http:\/\/hgdownload.cse.ucsc.edu\/goldenPath\/hg19\/encodeDCC\/wgEncodeBroadHistone\/ .   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. In some cases, the authors directly referenced a citation number; the author and year of the citation were inserted along with the citation for completeness.\n\n\n\n\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\">https:\/\/www.limswiki.org\/index.php\/Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2017); LIMSwiki journal articles (all); LIMSwiki journal articles on bioinformatics; LIMSwiki journal articles on software\n\t\t\t\t\t\t\t\t\t This page was last modified on 31 January 2017, at 20:57.\n\t\t\t\t\t\t\t\t\tThis page has been accessed 3,028 times.\n\t\t\t\t\t\t\t\t\tContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","5321dee46dc24114d97002f69139f201_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_DGW_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:DGW: An exploratory data analysis tool for clustering and visualisation of epigenomic marks<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" 
class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><b>Background<\/b>: Functional <a href=\"https:\/\/www.limswiki.org\/index.php\/Genomics\" title=\"Genomics\" target=\"_blank\" class=\"wiki-link\" data-key=\"96a82dabf51cf9510dd00c5a03396c44\">genomic<\/a> and epigenomic research relies fundamentally on <a href=\"https:\/\/www.limswiki.org\/index.php\/Sequencing\" title=\"Sequencing\" class=\"mw-disambig wiki-link\" target=\"_blank\" data-key=\"e36167a9eb152ca16a0c4c4e6d13f323\">sequencing<\/a>-based methods such as ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extending over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent.\n<\/p><p><b>Results<\/b>: We present DGW (Dynamic Genome Warping), an open-source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses dynamic time warping to adaptively rescale and align genomic distances, which allows regions of interest with similar shapes to be grouped, thereby capturing the structure of epigenomic marks. We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project.\n<\/p><p><b>Conclusions<\/b>: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. 
DGW is available as an open-source Python package.\n<\/p><p><b>Keywords<\/b>: Clustering, ChIP-seq, epigenetics, dynamic time warping\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Background\">Background<\/span><\/h2>\n<p>Sequencing-based technologies such as ChIP-Seq and DNase-Seq [e.g., reviewed in Furey 2012<sup id=\"rdp-ebb-cite_ref-FureyChip12_1-0\" class=\"reference\"><a href=\"#cite_note-FureyChip12-1\" rel=\"external_link\">[1]<\/a><\/sup>] have revolutionized our understanding of chromatin structure and function, yielding deep insights into the importance of epigenomic marks in the basic processes of life. The emergent picture is that gene expression is controlled by a complex interplay of protein binding and epigenomic modifications. While histone marks (and other epigenomic marks) can be measured in a high-throughput way, exploratory data analysis techniques for these data types are still being developed. Epigenomic marks exhibit characteristics that distinguish them fundamentally from, e.g., mRNA gene expression measurements: they extend across regions as wide as several kilobases, within which they often present interesting local structures such as multiple peaks and troughs<sup id=\"rdp-ebb-cite_ref-BarskiHigh07_2-0\" class=\"reference\"><a href=\"#cite_note-BarskiHigh07-2\" rel=\"external_link\">[2]<\/a><\/sup> and intriguing asymmetries<sup id=\"rdp-ebb-cite_ref-KundajeUbiq12_3-0\" class=\"reference\"><a href=\"#cite_note-KundajeUbiq12-3\" rel=\"external_link\">[3]<\/a><\/sup> (see Fig. 1). 
\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"e5eb22a0dce3c9f726332c18593acdc6\"><img alt=\"Fig1 Lukauskas BMCBioinformatics2016 17-Supp16.gif\" src=\"https:\/\/www.limswiki.org\/images\/4\/4b\/Fig1_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> The epigenomic marks H3K4me3 (left) and H3K9ac (right) accumulate around transcription start sites (TSS), often showing a bimodal peak with a valley over the TSS. Shown are two biological replicates for each mark and the input signal. The y-axis shows read counts. Annotated genes and the enriched regions called by MACS2 are shown in grey below each profile.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The shape of epigenomic marks across replicate data sets appears to be highly conserved and has recently been exploited for statistical testing.<sup id=\"rdp-ebb-cite_ref-SchweikertMMDiff13_4-0\" class=\"reference\"><a href=\"#cite_note-SchweikertMMDiff13-4\" rel=\"external_link\">[4]<\/a><\/sup> While the biological reasons for such conservation are not entirely clear, recent studies have suggested that both architectural and regulatory aspects may be at play. Bieberstein and colleagues showed intriguing patterns of accumulation of the histone marks H3K4me3 and H3K9ac at splice sites<sup id=\"rdp-ebb-cite_ref-BiebersteinFirst12_5-0\" class=\"reference\"><a href=\"#cite_note-BiebersteinFirst12-5\" rel=\"external_link\">[5]<\/a><\/sup>, hinting at an architectural origin of the shape of the marks. 
More recently, Benveniste <i>et al.<\/i> showed that histone marks can be very well predicted genome-wide by the binding patterns of transcription factors (TFs).<sup id=\"rdp-ebb-cite_ref-BenvenisteTransc14_6-0\" class=\"reference\"><a href=\"#cite_note-BenvenisteTransc14-6\" rel=\"external_link\">[6]<\/a><\/sup> The shape of a peak may therefore be a readout of additional chromatin-related events, and similarly marked genomic regions may hint at common regulatory or architectural features. Excellent visualisation tools (e.g., the UCSC Genome Browser) enable researchers to appreciate such features for individual enrichment peaks. However, while automatically grouping such marks based on shape similarity may be a valuable tool for hypothesis generation, it has remained a non-trivial task.\n<\/p><p>Current approaches to clustering regions based on chromatin signatures can be broadly split into two camps, global and local. Global approaches, such as the celebrated HMM-based reconstruction of the \"colours of the chromatin,\"<sup id=\"rdp-ebb-cite_ref-FilionSyst10_7-0\" class=\"reference\"><a href=\"#cite_note-FilionSyst10-7\" rel=\"external_link\">[7]<\/a><\/sup> try to find a segmentation of large genomic regions based on histone signatures. These approaches usually rely on a \"presence vs. absence\" characterization of histone marks at genomic loci, such that the clustering is primarily based on combinatorial patterns of multiple histone marks rather than on spatial patterns emerging within individual peaks. \n<\/p><p>Another interesting segmentation approach was recently introduced by Knijnenburg and colleagues.<sup id=\"rdp-ebb-cite_ref-KnijnenburgMulti14_8-0\" class=\"reference\"><a href=\"#cite_note-KnijnenburgMulti14-8\" rel=\"external_link\">[8]<\/a><\/sup> Here, signal enrichment is considered across a wide range of scales spanning several orders of magnitude. 
While this constitutes a significant improvement over earlier approaches, signal patterns within segments are again not taken into account. Local approaches, on the other hand, attempt to cluster short genomic regions at particular loci based on the quantitative binding or modification pattern measured at those loci (e.g., via ChIP-Seq). Examples include the ENCODE Cluster Aggregation Tool (CAGT)<sup id=\"rdp-ebb-cite_ref-KundajeUbiq12_3-1\" class=\"reference\"><a href=\"#cite_note-KundajeUbiq12-3\" rel=\"external_link\">[3]<\/a><\/sup> and the clustering of genes based on Pol II binding profiles.<sup id=\"rdp-ebb-cite_ref-TaslimComp09_9-0\" class=\"reference\"><a href=\"#cite_note-TaslimComp09-9\" rel=\"external_link\">[9]<\/a><\/sup>\n<\/p><p>Local approaches have to address two challenging problems: aligning the peaks to a reference, and standardising the peaks so that they can be represented as vectors of equal dimension. To align regions, both the method of Taslim and colleagues and the CAGT tool rely on anchor points (e.g., transcription start sites (TSS)<sup id=\"rdp-ebb-cite_ref-TaslimComp09_9-1\" class=\"reference\"><a href=\"#cite_note-TaslimComp09-9\" rel=\"external_link\">[9]<\/a><\/sup> or transcription factor binding sites from auxiliary ChIP-Seq experiments<sup id=\"rdp-ebb-cite_ref-KundajeUbiq12_3-2\" class=\"reference\"><a href=\"#cite_note-KundajeUbiq12-3\" rel=\"external_link\">[3]<\/a><\/sup>). The regions are then standardised either by rescaling to a fixed gene length<sup id=\"rdp-ebb-cite_ref-TaslimComp09_9-2\" class=\"reference\"><a href=\"#cite_note-TaslimComp09-9\" rel=\"external_link\">[9]<\/a><\/sup> or by applying windows of fixed length on either side of the anchor points<sup id=\"rdp-ebb-cite_ref-KundajeUbiq12_3-3\" class=\"reference\"><a href=\"#cite_note-KundajeUbiq12-3\" rel=\"external_link\">[3]<\/a><\/sup>, irrespective of the true extent of the local enrichment. 
These strategies may be plausible for certain applications. However, the shape and extent of histone marks, for instance, appear to be determined by many factors<sup id=\"rdp-ebb-cite_ref-BiebersteinFirst12_5-1\" class=\"reference\"><a href=\"#cite_note-BiebersteinFirst12-5\" rel=\"external_link\">[5]<\/a><\/sup>, such that a uniform rescaling may be inappropriate. In particular, if one assumes that epigenomic marks are directly or indirectly influenced by the underlying DNA sequence, it becomes clear that more flexibility in the comparison and alignment of these marks is needed. For example, orthologous genes may share similar sequence features but differ in length. Sequence comparison methods therefore generally do not require the sequences to be of equal length; they allow for insertions, deletions and shifts. Similar local variations should be allowed when comparing epigenomic marks.\n<\/p><p>In this work, we address the problem of aligning and clustering epigenomic data in a completely unsupervised way: as input data we use ChIP-Seq enrichment measurements within peak regions identified by a peak finder such as MACS.<sup id=\"rdp-ebb-cite_ref-ZhangModel08_10-0\" class=\"reference\"><a href=\"#cite_note-ZhangModel08-10\" rel=\"external_link\">[10]<\/a><\/sup> The alignment and standardisation problems are solved simultaneously without the use of additional information, such as transcription start sites or gene annotation. We introduce a local rescaling which allows epigenomic marks to be matched by maximising the similarity between their shapes. 
Our method\/software, Dynamic Genome Warping (DGW), is based on the classical dynamic time warping algorithm<sup id=\"rdp-ebb-cite_ref-SakoeDynamic78_11-0\" class=\"reference\"><a href=\"#cite_note-SakoeDynamic78-11\" rel=\"external_link\">[11]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-M.C3.BCllerInfo07_12-0\" class=\"reference\"><a href=\"#cite_note-M.C3.BCllerInfo07-12\" rel=\"external_link\">[12]<\/a><\/sup>, which enabled computer scientists to construct robust speech recognisers undeterred by variability in pitch and speed of enunciation. In DGW we have implemented multidimensional alignment and clustering, such that multiple epigenomic tracks can be analysed simultaneously. This feature can also be used to control for local sequencing bias, as DNA input or IgG controls can easily be added to the analysis. We first test DGW in a simulation study. Subsequently, we demonstrate that DGW can align genomic landmarks such as TSSs and first splicing sites (FSSs) on real epigenomic data from the ENCODE project<sup id=\"rdp-ebb-cite_ref-EPCAnInt12_13-0\" class=\"reference\"><a href=\"#cite_note-EPCAnInt12-13\" rel=\"external_link\">[13]<\/a><\/sup>, thus effectively and automatically solving both the alignment and the standardization problems. DGW is freely available as a stand-alone, platform-independent and fully documented Python package.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Methods\">Methods<\/span><\/h2>\n<p>We first motivate and illustrate our method on a particular data set of histone modifications from the ENCODE project<sup id=\"rdp-ebb-cite_ref-EPCAnInt12_13-1\" class=\"reference\"><a href=\"#cite_note-EPCAnInt12-13\" rel=\"external_link\">[13]<\/a><\/sup>, measuring tri-methylation of histone 3 at lysine 4 (H3K4me3) and acetylation of histone 3 at lysine 9 (H3K9ac) in the human leukaemia cell line K562. 
We chose these two marks because they are known to be characteristically enriched in the regions flanking TSSs<sup id=\"rdp-ebb-cite_ref-BarskiHigh07_2-1\" class=\"reference\"><a href=\"#cite_note-BarskiHigh07-2\" rel=\"external_link\">[2]<\/a><\/sup> and were recently shown to accumulate at FSSs<sup id=\"rdp-ebb-cite_ref-BiebersteinFirst12_5-2\" class=\"reference\"><a href=\"#cite_note-BiebersteinFirst12-5\" rel=\"external_link\">[5]<\/a><\/sup>, providing direct evidence of the biological relevance of both the alignment and standardisation problems.\n<\/p><p>Aligned fragments (BAM files) of both epigenomic marks were processed with the MACS2 peak caller<sup id=\"rdp-ebb-cite_ref-ZhangModel08_10-1\" class=\"reference\"><a href=\"#cite_note-ZhangModel08-10\" rel=\"external_link\">[10]<\/a><\/sup> to identify regions showing enrichment relative to an input control sample; we then merged the two sets by taking every region called for either mark. We stress that the method is independent of the specific marks and of the choice of peak caller, and is readily extendable to other types of genomic and epigenomic data.\n<\/p><p>Enriched regions normally have very different lengths; nevertheless, visual inspection can reveal similarities between their shapes. These similarities are often visualised by global averaging (aggregation) of the marks<sup id=\"rdp-ebb-cite_ref-BarskiHigh07_2-2\" class=\"reference\"><a href=\"#cite_note-BarskiHigh07-2\" rel=\"external_link\">[2]<\/a><\/sup>; however, there are strong arguments that global averaging may mask more subtle patterns. A useful motivating example is given in Fig. 1, which shows four regions enriched in both the H3K4me3 and H3K9ac marks. They all overlap with genes and exhibit broadly similar shapes: a bimodal peak with a trough over the TSS. 
However, the total lengths of the enriched regions vary, and so does the extent of the two individual sub-peaks, which could be governed by the underlying gene structure. Consequently, the position of the TSS relative to the start of the enriched region varies.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Dynamic_genome_warping\">Dynamic genome warping<\/span><\/h3>\n<p>To quantify the similarities between peaks such as those shown in Fig. 1 automatically, we use the classic dynamic time warping (DTW) algorithm.<sup id=\"rdp-ebb-cite_ref-SakoeDynamic78_11-1\" class=\"reference\"><a href=\"#cite_note-SakoeDynamic78-11\" rel=\"external_link\">[11]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-GiorginoComp09_14-0\" class=\"reference\"><a href=\"#cite_note-GiorginoComp09-14\" rel=\"external_link\">[14]<\/a><\/sup> A modern review of the basic concepts of dynamic time warping can be found, e.g., in M\u00fcller 2007.<sup id=\"rdp-ebb-cite_ref-M.C3.BCllerInfo07_12-1\" class=\"reference\"><a href=\"#cite_note-M.C3.BCllerInfo07-12\" rel=\"external_link\">[12]<\/a><\/sup> DTW was originally introduced in the speech recognition community to recognize speech robustly, independently of speaking rate. There, the problem was to match waveforms of similar shape but potentially different duration. 
Likewise, our aim is to associate peaks which exhibit similar local structure (shape) regardless of their spatial extent.\n<\/p><p>Specifically, let <b>a<\/b>=(a<sub>1<\/sub>,\u2026,a<sub>N<\/sub>) and <b>b<\/b>=(b<sub>1<\/sub>,\u2026,b<sub>M<\/sub>) be two sequences with values <span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/677880033f5923fc31d1029bd0a0a4995bbfd957'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -0.671ex; width:9.201ex; height:2.509ex;\" \/><\/span>, where <span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/cfd749ff84450814edb04bb695d674c8dd6a2d24'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -0.338ex; width:1.499ex; height:2.176ex;\" \/><\/span> is a metric space equipped with a local distance <span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/afdb2257d08af815d058f6cfcafe0cf2c08de93e'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -0.338ex; width:13.381ex; height:2.176ex;\" \/><\/span> (e.g., squared Euclidean distance or cosine distance). 
DTW uses dynamic programming to construct a warping path <span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/8ae96f34b5a702fab60c147bc9685d2a3092f41d'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -1.005ex; width:39.825ex; height:3.176ex;\" \/><\/span>, i.e., two sets of indices identifying the elements of the two sequences that are mapped to each other, chosen so as to minimise the sum of the local distances. In formulae:\n<\/p><p><span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/de360295184ee27bab733c908b8a1642c9a7f362'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -3.005ex; width:27.246ex; height:7.343ex;\" \/><\/span>\n<\/p><p>subject to the following constraints:\n<\/p>\n<ul><li> <span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/49167ca840807a323607907e90a83a30731c31bf'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -1.005ex; margin-left: -0.089ex; width:11.896ex; height:3.176ex;\" \/><\/span>, the first points of both sequences are mapped to each other;<\/li>\n<li> <span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: 
url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/1ae9ab0d08825196690fe616c17ad887478f2ed6'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -1.005ex; margin-left: -0.089ex; width:8.114ex; height:3.176ex;\" \/><\/span>, the end points of both sequences are mapped to each other;<\/li>\n<li> <span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a649ee8abe198b939fb18aff160ed9ab85066b2d'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -1.171ex; width:17.511ex; height:3.676ex;\" \/><\/span> for all <i>i<\/i>=1,\u2026,<i>K<\/i> and <i>j<\/i>=0,1, each index set is non-decreasing with maximum step one. This ensures that every point in each sequence is mapped to at least one point in the other sequence.<\/li><\/ul>\n<p>Algorithmically, DTW is very similar to classical sequence alignment algorithms such as Needleman-Wunsch and Smith-Waterman: it builds optimal alignments of longer subsequences from the optimal alignments of shorter ones by selecting the best next move at each step, and it recovers the optimal global alignment by backtracking. As such, it entails filling a matrix of size <i>M\u00d7N<\/i>, which determines the computational complexity of the algorithm. 
Computing pairwise DTW distances between all peaks is therefore the most computationally expensive step, as it involves computing <span><span class=\"mwe-math-mathml-inline mwe-math-mathml-a11y\" style=\"display: none;\"><\/span><meta class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"background-image: url('https:\/\/en.wikipedia.org\/api\/rest_v1\/media\/math\/render\/svg\/f00be4e2209875e0f68fe9ec58071348822df295'); background-repeat: no-repeat; background-size: 100% 100%; vertical-align: -1.838ex; width:11.125ex; height:4.843ex;\" \/><\/span> DTW distances, each of which is <i>O(M\u00d7N)<\/i>. \n<\/p><p>In Fig. 2 we show how the first two peaks in Fig. 1 are aligned onto each other using DTW. Notice that the pure DTW algorithm allows arbitrarily long stretches to be compressed to a single point. This behaviour may be undesirable, and simple modifications can be applied, such as an upper limit on the length of compressed regions (Sakoe-Chiba band<sup id=\"rdp-ebb-cite_ref-SakoeDynamic78_11-2\" class=\"reference\"><a href=\"#cite_note-SakoeDynamic78-11\" rel=\"external_link\">[11]<\/a><\/sup>) or an exponential penalty on compressing\/stretching. Applying the Sakoe-Chiba band constraint also reduces the run-time to <i>O(k\u00d7max(N,M))<\/i>, where the band width <i>k<\/i> can be chosen to be small. 
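A minimal Python sketch of the DTW recurrence and backtracking described above, with an optional Sakoe-Chiba band, is given below. It is an illustration, not the DGW implementation: the names dtw, sq_euclidean and strand_aware_distance are hypothetical, indices are 0-based, sequences are assumed non-empty, the local distance defaults to squared Euclidean distance, and a bin may be a scalar or a tuple (one value per mark, as in the multi-dimensional case). The strand_aware_distance helper mirrors the reversal check described in the text by keeping the smaller of the forward and reversed distances.

```python
import math
import numbers


def sq_euclidean(x, y):
    """Local distance: squared Euclidean distance between two bins.

    Bins may be scalars (one mark) or equal-length tuples (one value per mark).
    """
    if isinstance(x, numbers.Real):
        return (x - y) ** 2
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def dtw(a, b, dist=sq_euclidean, band=None):
    """Return (total_distance, path) for the optimal warping of sequence a onto b.

    band: optional Sakoe-Chiba half-width k; cells with |i - j| > k are skipped,
    so only O(k * max(N, M)) cells are filled instead of N * M.
    """
    n, m = len(a), len(b)
    if band is not None and band < abs(n - m):
        raise ValueError("band must be >= |len(a) - len(b)| for a path to exist")
    inf = math.inf
    # cost[i][j]: minimal summed local distance aligning a[:i+1] with b[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        lo, hi = (0, m) if band is None else (max(0, i - band), min(m, i + band + 1))
        for j in range(lo, hi):
            d = dist(a[i], b[j])
            if i == 0 and j == 0:
                cost[i][j] = d  # first points are mapped to each other
                continue
            cost[i][j] = d + min(
                cost[i - 1][j] if i > 0 else inf,                # advance in a only
                cost[i][j - 1] if j > 0 else inf,                # advance in b only
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,  # advance in both
            )
    # Backtrack from the end points (n-1, m-1) to the start points (0, 0)
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((cost[i - 1][j - 1], (i - 1, j - 1)))
        if i > 0:
            steps.append((cost[i - 1][j], (i - 1, j)))
        if j > 0:
            steps.append((cost[i][j - 1], (i, j - 1)))
        _, (i, j) = min(steps)  # cheapest predecessor lies on the optimal path
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path


def strand_aware_distance(a, b, **kwargs):
    """Compare b in both orientations and keep the smaller DTW distance."""
    forward, _ = dtw(a, b, **kwargs)
    reverse, _ = dtw(a, list(reversed(b)), **kwargs)
    return min(forward, reverse)
```

For example, dtw([0, 1, 2], [0, 1, 1, 2]) returns a distance of 0 with path [(0, 0), (1, 1), (1, 2), (2, 3)]: the repeated middle bin of the second sequence is absorbed by mapping it twice to the same bin of the first.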
However, novel strategies to reduce the computational load are emerging,<sup id=\"rdp-ebb-cite_ref-BegumAccel15_15-0\" class=\"reference\"><a href=\"#cite_note-BegumAccel15-15\" rel=\"external_link\">[15]<\/a><\/sup> and it would be interesting to integrate such ideas in the epigenomic context.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"9b36423fa1c43d71d1ee654e143c7fbf\"><img alt=\"Fig2 Lukauskas BMCBioinformatics2016 17-Supp16.gif\" src=\"https:\/\/www.limswiki.org\/images\/9\/92\/Fig2_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> DGW alignment of two H3K4me3 profiles. <b>a<\/b> Distance density matrix for two peaks (Peak 188 on the x-axis and Peak 280 on the y-axis). Colour coding corresponds to local Euclidean distances from small (green) to large (red). The optimal path is shown in blue. <b>b<\/b> Mapping between the two profiles. <b>c<\/b> Dynamically aligned profiles and total distance <i>D<\/i> between the two peaks.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>DGW readily extends to multi-dimensional data if more than one epigenomic track is analysed; in this case <b>a<\/b> and <b>b<\/b> become sequences of vectors, e.g., (<b>a<\/b><sub>1<\/sub>,\u2026,<b>a<\/b><sub><i>N<\/i><\/sub>), where each vector <b>a<\/b><sub><i>i<\/i><\/sub> contains the coverage of every mark at position (bin) <i>i<\/i>. 
In this way, the different epigenomic marks are given equal weight; however, other weighting schemes can easily be implemented.\n<\/p><p>In addition to the optimal path between two sequences, we also report their total distance under the optimal warping, which is subsequently used to cluster the peaks. Note that when squared Euclidean distance is used as the local distance measure, differences in both peak shape and enrichment level contribute to the overall DTW distance. If this is not desired, the peaks can optionally be normalized by their respective peak heights, and the cosine distance can be used as the local distance. To account for potential strand specificity of epigenomic marks, we compute two distances for every pairwise peak comparison: one with the two sequences unchanged, and one with one of them reversed. The smaller of the two distances is then returned as the distance between the patterns.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Clustering\">Clustering<\/span><\/h3>\n<p>After computing all pairwise distances between peaks, we next aim to cluster the peaks into groups that share similar shapes. Implementing k-means clustering within a DTW framework, however, would require defining an average over arbitrarily warped profiles, which is not straightforward. Instead, we take advantage of the pre-computed pairwise distances between peaks and perform agglomerative hierarchical clustering, using complete linkage to avoid chaining.<sup id=\"rdp-ebb-cite_ref-HiranoEmp05_16-0\" class=\"reference\"><a href=\"#cite_note-HiranoEmp05-16\" rel=\"external_link\">[16]<\/a><\/sup> The resulting dendrogram contains <i>N<\/i><sub>peaks<\/sub> \u2212 1 internal nodes (merges), each of which corresponds to a possible clustering of the data. As in any hierarchical clustering method, the number of clusters is left to the user's choice. This is both a strength and a weakness of the methodology. 
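The clustering step can be sketched as plain-Python agglomerative clustering with complete linkage on a precomputed distance matrix (for example, pairwise DTW distances). This is an illustrative, unoptimized sketch with hypothetical names (complete_linkage, cut), not DGW's code; in practice a library routine such as SciPy's scipy.cluster.hierarchy.linkage with method='complete' would normally be used. Each of the n - 1 merges is one internal node of the dendrogram, and cutting the tree into k clusters amounts to replaying only the first n - k merges.

```python
def complete_linkage(dist):
    """Agglomerative clustering with complete linkage.

    dist: symmetric n x n matrix of pairwise distances (e.g. DTW distances).
    Returns the n - 1 merges as (cluster_a, cluster_b, height); clusters 0..n-1
    are the single peaks, and merge number s creates cluster n + s.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # active cluster id -> member peaks
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x, ca in enumerate(ids):
            for cb in ids[x + 1:]:
                # Complete linkage: cluster distance = largest pairwise distance,
                # which discourages chains of loosely connected peaks
                h = max(dist[i][j] for i in clusters[ca] for j in clusters[cb])
                if best is None or h < best[2]:
                    best = (ca, cb, h)
        ca, cb, h = best
        clusters[n + len(merges)] = clusters.pop(ca) + clusters.pop(cb)
        merges.append((ca, cb, h))
    return merges


def cut(merges, n, k):
    """Cut the dendrogram into k clusters by replaying only the first n - k merges."""
    members = {i: [i] for i in range(n)}
    for step, (ca, cb, _) in enumerate(merges[: n - k]):
        members[n + step] = members.pop(ca) + members.pop(cb)
    return sorted(sorted(group) for group in members.values())
```

For five toy "peaks" at positions 0, 1, 10, 11 and 20 with absolute differences as distances, cutting the resulting tree into three clusters gives [[0, 1], [2, 3], [4]].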
Principled methods for choosing a cutoff exist,<sup id=\"rdp-ebb-cite_ref-HellerBayesian05_17-0\" class=\"reference\"><a href=\"#cite_note-HellerBayesian05-17\" rel=\"external_link\">[17]<\/a><\/sup> and implementing them in DGW is a future direction for improvement. DGW computes a prototype for each node, i.e., a sequence representative of all sequences attached to the node (the leaves of the subtree rooted at that node). Prototype computation is a non-trivial problem in DTW; here we use the scaled prioritised shape averaging algorithm of Niennattrakul and Ratanamahatana.<sup id=\"rdp-ebb-cite_ref-NiennattrakulShape09_18-0\" class=\"reference\"><a href=\"#cite_note-NiennattrakulShape09-18\" rel=\"external_link\">[18]<\/a><\/sup>\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Pre-processing_pipeline_and_implementation\">Pre-processing pipeline and implementation<\/span><\/h3>\n<p>Here we briefly describe the DGW software package; a more thorough description, including installation instructions and examples, is given in the vignette at the DGW homepage.<sup id=\"rdp-ebb-cite_ref-LukauskasDGW_19-0\" class=\"reference\"><a href=\"#cite_note-LukauskasDGW-19\" rel=\"external_link\">[19]<\/a><\/sup> DGW consists of two modules: a worker module, which performs the computationally intensive tasks, and an explorer module, which allows visual exploration of the results. DGW-worker takes as input a set of genomic regions (a BED file, e.g., as returned by a peak finder) and a set of data files (BAM files) for different epigenomic marks. Single-end reads are extended to the estimated fragment length. To alleviate the computational burden and to reduce spatial noise, the coverage within peak regions is binned into non-overlapping 50 bp windows. The bin size is an adjustable parameter that should reflect the scale at which local changes are expected in the data. 
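Read extension and binning can be sketched as follows. This is a hypothetical illustration, not DGW's code: it assumes 0-based, half-open coordinates, takes the estimated fragment length as given, and defines the value of a bin as the number of extended fragments overlapping it; whether DGW counts overlapping fragments or sums per-base coverage is an assumption here. The function names extend_read and bin_coverage are likewise hypothetical.

```python
def extend_read(start, end, strand, fragment_length):
    """Extend a single-end read (0-based, half-open) in its 5' to 3' direction."""
    if strand == "+":
        return start, start + fragment_length
    return max(0, end - fragment_length), end


def bin_coverage(region_start, region_end, fragments, bin_size=50):
    """Count fragments overlapping each non-overlapping bin of [region_start, region_end)."""
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    counts = [0] * n_bins
    for f_start, f_end in fragments:
        # Clip the fragment to the region; skip fragments that do not overlap it
        s, e = max(f_start, region_start), min(f_end, region_end)
        if s >= e:
            continue
        first = (s - region_start) // bin_size
        last = (e - 1 - region_start) // bin_size
        for b in range(first, last + 1):
            counts[b] += 1
    return counts
```

For a 200 bp region starting at position 1,000 and three extended 100 bp fragments at (1000, 1100), (1100, 1200) and (1040, 1140), bin_coverage returns [2, 2, 2, 1], i.e., one value per 50 bp bin.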
For each peak, we thus construct a sequence <b>a<\/b>=(<i>a<\/i><sub>1<\/sub>,\u2026,<i>a<sub>N<\/sub><\/i>), whose <i>i<\/i>-th element <i>a<sub>i<\/sub><\/i> is the coverage within bin <i>i<\/i>. \n<\/p><p>At this stage, we do not normalize against the input sample but use raw read counts. A practical reason for this is that most input samples still have relatively low coverage. As there is no enrichment for binding sites, input samples cover the whole genome, so input library sizes would need to be significantly larger than those of their IP sample counterparts, which in practice is rarely the case. A simple correction based on enrichment over input is counterproductive in most applications, as it adds noise to the signal. However, input samples can be included as an additional dimension in multidimensional clustering, offering a convenient way to incorporate the information conveyed by a sufficiently sequenced input sample, if one is available. The DGW-worker then computes the warping distances, the hierarchical clustering dendrogram, and the prototype sequences associated with each node; these computationally intensive tasks are automatically distributed across multiple cores, if available. A typical run of DGW-worker on the ChIP-seq data set takes 420 minutes of CPU time distributed across six cores, i.e., a wall-clock time of just over one hour.\n<\/p><p>Once these computations are completed, the lightweight explorer module can be launched. This opens a window displaying a heat map of the peaks and the clustering dendrogram. The dendrogram can be cut at any desired level. Cluster memberships are exported as a series of BED files (one per cluster) to enable subsequent analyses. Individual clusters can be analysed further, and additional functionality is provided at this level, e.g., histograms of the positions of specific regions of interest pre- and post-warping (Fig. 
6), and warpings of individual peaks onto prototypes can be obtained.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Results_and_discussion\">Results and discussion<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Simulation_study\">Simulation study<\/span><\/h3>\n<p>To check correctness, we constructed a simple simulation study that mimics a real biological data set as closely as possible. We considered the initial 2 kb of five genes from the UCSC known genes data set, and extracted H3K4me3 data for these five regions from the ENCODE human leukaemia cell line K562 (Fig. 3). The first three of these regions showed bimodal peaks; the remaining two exhibited a single peak. We ensured that the first splicing site of each of these five genes fell within the 2 kb region considered. We generated modified versions of the five seed regions using the following procedure (Fig. 3): multiplicative Gaussian noise with variance <i>v<\/i> was applied to the read counts in each bin of a seed region. Further, each bin was removed or duplicated with probability <i>p<\/i>, producing local shrinkage or stretching of the peak. Duplicated bins could themselves be duplicated again, resulting in local stretches of varying length. Additionally, the orientation of the simulated peak was reversed with probability <i>fp<\/i> to simulate antisense transcription. For each set of parameters (<i>v<\/i>, <i>p<\/i> and <i>fp<\/i>) we produced 99 simulated peaks from each seed, which together with the five seeds yields a data set of 500 peaks (Fig. 
3).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"bb827b589449ed45845c291cf1c8c648\"><img alt=\"Fig3 Lukauskas BMCBioinformatics2016 17-Supp16.gif\" src=\"https:\/\/www.limswiki.org\/images\/2\/2c\/Fig3_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> Generation of simulated data sets. Shown in blue are five seed regions, i.e., original ENCODE H3K4me3 read counts at the start of five known genes. For each seed region we show 10 simulated modifications, created by multiplying the counts in each bin by Gaussian noise (<i>v<\/i>: 0.1), by introducing insertions and deletions (<i>p<\/i>: 0.1) and by flipping the orientation of the peak (<i>fp<\/i>: 0.1). Individual panels (<b>a\u2013e<\/b>) represent the different seed regions.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The clustering results are shown in Fig. 4, both for standard hierarchical clustering and for DTW-based clustering. Figures 4a and b show the resulting dendrograms for the simulation experiment with parameters (<i>v<\/i>: 0.25, <i>p<\/i>: 0.25, <i>fp<\/i>: 0.1). In contrast to standard hierarchical clustering, DGW identifies five clusters with approximately 100 members each, corresponding well to the five initial seed patterns. We repeated the data simulation and clustering over a grid of parameter values in order to investigate increasingly strong modifications of the peak patterns. 
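The peak-modification procedure of the simulation study can be sketched as below. Several details are assumptions not stated in the text: the noise factor is drawn from a normal distribution with mean 1 and variance v and clipped at zero; a bin is removed with probability p and duplicated with probability p (so the sketch requires p at most 0.5); each additional copy is duplicated again with probability p; and each seed is kept alongside its 99 modified copies, so that five seeds give 500 labelled peaks. The names simulate_peak and simulate_dataset are hypothetical.

```python
import math
import random


def simulate_peak(seed, v, p, fp, rng):
    """Return one modified copy of a seed peak (a list of per-bin read counts)."""
    if not 0 <= p <= 0.5:
        raise ValueError("this sketch needs 0 <= p <= 0.5")
    out = []
    for count in seed:
        # Multiplicative Gaussian noise: factor ~ N(1, v), clipped at 0 (assumption)
        noisy = count * max(0.0, rng.gauss(1.0, math.sqrt(v)))
        u = rng.random()
        if u < p:
            continue                      # bin removed: local shrinkage
        copies = 1
        if u < 2 * p:
            copies += 1                   # bin duplicated: local stretch
            while rng.random() < p:
                copies += 1               # duplicated bins may be duplicated again
        out.extend([noisy] * copies)
    if rng.random() < fp:
        out.reverse()                     # flipped orientation (antisense)
    return out


def simulate_dataset(seeds, v, p, fp, n_per_seed=99, seed_value=0):
    """Each seed plus n_per_seed modified copies, labelled with the seed index."""
    rng = random.Random(seed_value)
    peaks, labels = [], []
    for label, seed in enumerate(seeds):
        peaks.append(list(seed))          # keep the seed itself (assumption)
        labels.append(label)
        for _ in range(n_per_seed):
            peaks.append(simulate_peak(seed, v, p, fp, rng))
            labels.append(label)
    return peaks, labels
```

The labels returned alongside the peaks are the ground truth against which a clustering can later be scored.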
We quantitatively assess the accuracy of the clustering using the Matthews Correlation Coefficient (MCC), generalized to multi-class classification problems.<sup id=\"rdp-ebb-cite_ref-MatthewsComp75_20-0\" class=\"reference\"><a href=\"#cite_note-MatthewsComp75-20\" rel=\"external_link\">[20]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-JurmanAComp12_21-0\" class=\"reference\"><a href=\"#cite_note-JurmanAComp12-21\" rel=\"external_link\">[21]<\/a><\/sup> The results are presented in Fig. 4 and Table 1. The MCC ranges from \u22121 to 1, where the extreme values represent completely incorrect and completely correct classifications, respectively, and 0 corresponds to a random classification. Standard hierarchical clustering correctly groups the simulated peaks according to the pattern they were derived from only if the added noise and modifications are small (<i>v<\/i>&lt;0.15, <i>p<\/i>&lt;0.15). With DGW, optimal clustering is maintained for considerably larger local modifications of the patterns (Fig. 4 and Table 1).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"e2a33d3ac5bacdd7a204c15b29e80699\"><img alt=\"Fig4 Lukauskas BMCBioinformatics2016 17-Supp16.gif\" src=\"https:\/\/www.limswiki.org\/images\/8\/88\/Fig4_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 4.<\/b> Simulation results for parameter set (<i>v<\/i>: 0.25, <i>p<\/i>: 0.25, <i>fp<\/i>: 0.1). <b>a<\/b> Left panel shows the dendrogram of clustered peaks using hierarchical clustering only. 
Peaks assigned to each of the five clusters are shown in yellow, pink, blue, red and green. The x-axis represents the pairwise distances <i>d<\/i>. The right panel shows the clustered peaks. Colour coding corresponds to normalized read counts. The x-axis represents original (unwarped) bins from the start of the peaks. <b>c<\/b> Matthews Correlation Coefficient for hierarchical clustering based on a set of simulations with varying parameters (<i>v<\/i>, <i>p<\/i>) and <i>fp<\/i> fixed to 0.1. <b>b<\/b> and <b>d<\/b> as <b>a<\/b> and <b>c<\/b> but for DGW alignment followed by hierarchical clustering.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"10\"><b>Table 1.<\/b> Matthews Correlation Coefficient values for the classification of the synthetic peaks generated with the indicated values of <i>p<\/i> and <i>v<\/i>, and <i>fp<\/i>=0.1\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"10\">No DTW\n<\/td><\/tr>\n<tr>\n<th style=\"padding-left:10px; padding-right:10px;\"><i>p<\/i>\\<i>v<\/i>\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.00\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.05\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.10\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.15\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.20\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.25\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.30\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.35\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.40\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\"><b>0.00<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.785\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.699\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.412\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.05<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.773\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.486\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.10<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.785\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.795\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.707\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.372\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.15<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.998\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.723\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.505\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.671\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.20<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.682\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.696\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.572\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.501\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.25<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.600\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.748\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.710\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.618\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.452\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.30<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.945\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.995\n<\/td>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\">0.998\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.907\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.682\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.995\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.772\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.767\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.434\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.35<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.957\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.927\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.975\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.613\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.973\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.604\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.701\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.259\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.413\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.40<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.649\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.619\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.990\n<\/td>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\">0.704\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.599\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.746\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.252\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.617\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.294\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"10\">DTW\n<\/td><\/tr>\n<tr>\n<th style=\"padding-left:10px; padding-right:10px;\"><i>p<\/i>\\<i>v<\/i>\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.00\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.05\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.10\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.15\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.20\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.25\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.30\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.35\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">0.40\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.00<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.995\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.780\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.773\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.05<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.995\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.772\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.778\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.10<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.788\n<\/td>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.783\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.766\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.15<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.998\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.975\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.774\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.978\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.20<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.998\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.774\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.752\n<\/td>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.760\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.25<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.998\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.988\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.973\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.948\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.30<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.990\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.988\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.998\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.901\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.421\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.764\n<\/td><\/tr>\n<tr>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.35<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.971\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.586\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.978\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.990\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.297\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.318\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.710\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"><b>0.40<\/b>\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.000\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.954\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.964\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.998\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.988\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.665\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.973\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.934\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">0.401\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"DGW_automatically_aligns_genomic_landmarks\">DGW 
automatically aligns genomic landmarks<\/span><\/h3>\n<p>To assess the biological significance of DGW alignment and clustering, we considered two histone marks (H3K4me3 and H3K9ac) from the ENCODE data sets. These marks were chosen because they have been shown to accumulate at transcription start sites as well as at first splice sites (FSSs).<sup id=\"rdp-ebb-cite_ref-BiebersteinFirst12_5-3\" class=\"reference\"><a href=\"#cite_note-BiebersteinFirst12-5\" rel=\"external_link\">[5]<\/a><\/sup> Because first exon length is highly variable, this provides strong motivation for the local rescaling applied by DGW. For this experiment, enriched regions identified by the MACS2 peak caller were used for clustering, so no anchoring was provided. Using a bin size of 50, we restricted the analysed set to peaks longer than five bins and shorter than 1000 bins. We also filtered out peaks with fewer than 10 counts. We used squared Euclidean distance as the local distance measure between the scaled reads and constrained the DTW with a Sakoe-Chiba band of width 12.\n<\/p><p>Fig. 5 shows the dendrogram and heat-maps for these data. Note the high variability in peak length, which makes it virtually impossible to distinguish any patterns visually. Choosing the level at which to cut the dendrogram is difficult. Empirically, cutting the dendrogram near the leaves gives better visualisations, as larger clusters force the algorithm to warp together potentially very different peaks. With this in mind, we chose a cut that resulted in 45 clusters. Fig. 6a and b show the original and warped heat-maps for the two epigenomic marks within one particular cluster. TSS and FSS positions are shown with red and orange dots, respectively. The heat-map of the warped data shows a well-defined bimodal pattern of H3K4me3, with TSSs aligned in the valley between the two sub-peaks. This is in good agreement with the known pattern of these marks around gene starts. 
These genomic landmarks, or points of interest (POIs), are approximately aligned even though no prior knowledge of their positions was used in the clustering. This is corroborated by the histograms of TSS and FSS positions in the raw and aligned data (Fig. 6c and d). After rescaling the raw data to a common length and computing the entropy of the histograms shown in Fig. 6, we observe that warping decreases the entropy of the TSS and FSS location distributions in the selected cluster by 12.91% and 7.72%, respectively. Averaged across all clusters, this effect is less pronounced but still significant: a mean decrease of 1.72% for TSS (95% bootstrap confidence interval 0.83% to 2.81%) and 2.65% for FSS (1.79% to 3.63%), quantitatively demonstrating the ability of DGW to align these genomic landmarks.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig5_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"816a042579f33d3e62752ef6a4686ac5\"><img alt=\"Fig5 Lukauskas BMCBioinformatics2016 17-Supp16.gif\" src=\"https:\/\/www.limswiki.org\/images\/4\/4d\/Fig5_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 5.<\/b> ENCODE data: DGW clustering of the H3K4me3 and H3K9ac marks in the K562 cell line. Shown are the dendrogram and heat-maps. 
TSSs are shown as red dots in the heat-maps.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig6_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"3aa8ee4d8115c2f164bbb36504c3e53e\"><img alt=\"Fig6 Lukauskas BMCBioinformatics2016 17-Supp16.gif\" src=\"https:\/\/www.limswiki.org\/images\/b\/b4\/Fig6_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 6.<\/b> ENCODE data, a sample DGW cluster. <b>a<\/b> Heat-maps of the raw and <b>b<\/b> aligned data. Red dots indicate transcription start sites, orange dots first splice sites. <b>c<\/b> Histograms of the positions of TSSs in raw (left) and aligned (right) data. <b>d<\/b> Histograms of the positions of first splice sites in raw (left) and aligned (right) data.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"DGW_clusters_are_enriched_for_co-factor_binding_sites\">DGW clusters are enriched for co-factor binding sites<\/span><\/h3>\n<p>To probe the biological significance of the DGW clusters further, we asked whether cluster membership could be explained in part by shared binding co-factors. To test this hypothesis, we considered ChIP-seq data sets for 34 transcription factors (TFs) assayed by ENCODE in the K562 cell line (see the \"Availability of data and materials\" section for lists of TFs and download sources). 
Several TFs have been mechanistically associated with histone-modifying enzymes, and TF binding has recently been reported to be very strongly predictive of histone modifications.<sup id=\"rdp-ebb-cite_ref-BenvenisteTransc14_6-1\" class=\"reference\"><a href=\"#cite_note-BenvenisteTransc14-6\" rel=\"external_link\">[6]<\/a><\/sup> We extracted peak information from these data sets and then examined how the binding sites of each TF were distributed across clusters. Under the null hypothesis that clustering and TF binding are unrelated, the number of TF peaks falling into the genomic regions of a cluster would be expected to be simply proportional to the size of those regions, i.e., a uniform distribution.\n<\/p><p>Fig. 7 shows normalised cumulative occurrences of TF binding sites across clusters. For each TF, clusters are ranked by their relative overlap with that TF. Each bar corresponds to the cumulative level of normalised overlap between the TF and the considered cluster plus all clusters to its left. The null hypothesis of a uniform distribution corresponds to the red line. Conversely, if all binding sites for a given TF fell within a single cluster, all bars would have length 0 except the rightmost one, which would have length 1. A large area between the red line and the cumulative plot therefore indicates a strongly non-uniform distribution. The occurrence distributions of some TFs, such as TR4, ATF3 and NFE2, are remarkably non-uniform, showing that some clusters are highly enriched for a specific set of TFs. 
Although these tests do not yield an immediately interpretable biological result, they strongly suggest that the enriched regions clustered by DGW are biologically significant.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig7_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"b80940c6426279b85b40a896cef643d0\"><img alt=\"Fig7 Lukauskas BMCBioinformatics2016 17-Supp16.gif\" src=\"https:\/\/www.limswiki.org\/images\/3\/3c\/Fig7_Lukauskas_BMCBioinformatics2016_17-Supp16.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 7.<\/b> Cumulative levels of normalized overlap between each TF and the determined clusters. Each sub-plot corresponds to one TF. For each TF, clusters are ranked by their relative overlap with this TF. Each bar corresponds to the cumulative level of normalized overlap between the TF and the considered cluster plus all clusters to the left of it. The null hypothesis of uniform distribution corresponds to the red lines. 
The area between the red line and the cumulative plots is indicated below the TF name.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Conclusion\">Conclusion<\/span><\/h2>\n<p>Data exploration and visualisation tools have played a central role in <a href=\"https:\/\/www.limswiki.org\/index.php\/Bioinformatics\" title=\"Bioinformatics\" target=\"_blank\" class=\"wiki-link\" data-key=\"8f506695fdbb26e3f314da308f8c053b\">bioinformatics<\/a> and have contributed in no small part to the success of high-throughput methods in the last decade.<sup id=\"rdp-ebb-cite_ref-EisenCluster98_22-0\" class=\"reference\"><a href=\"#cite_note-EisenCluster98-22\" rel=\"external_link\">[22]<\/a><\/sup> Extending these methodologies to complex next-generation sequencing data sets poses computational and methodological challenges, yet the potential for hypothesis generation is considerable. ChIP-seq data sets, in particular, yield high-dimensional, structured marks associated with genomic regions. The reproducibility of the spatial structure in the ChIP-seq signal has already inspired the development of shape-based statistical tests for ChIP-seq.<sup id=\"rdp-ebb-cite_ref-SchweikertMMDiff13_4-1\" class=\"reference\"><a href=\"#cite_note-SchweikertMMDiff13-4\" rel=\"external_link\">[4]<\/a><\/sup> In this paper, we addressed the natural question of whether spatial structures in ChIP-seq data can also be used to group genes with similar epigenomic marks. We have proposed a novel method, DGW, which addresses this question using ideas from signal processing and speech recognition. Our results show that DGW can be a practical and user-friendly tool for exploratory data analysis of high-throughput epigenomic data sets. 
DGW\u2019s ability to recover, in an unsupervised manner, the previously reported accumulation of H3K4me3 and H3K9ac at transcription start sites and first splice sites<sup id=\"rdp-ebb-cite_ref-BiebersteinFirst12_5-4\" class=\"reference\"><a href=\"#cite_note-BiebersteinFirst12-5\" rel=\"external_link\">[5]<\/a><\/sup>, and to associate clusters with groups of transcription factors, also demonstrates its potential for biological hypothesis generation. We hope that DGW may become a valuable addition to the growing toolkit for epigenome bioinformatics.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Declarations\">Declarations<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h3>\n<p>The authors would like to thank five anonymous reviewers for their useful suggestions and remarks, which have helped to improve the paper.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Declarations_2\">Declarations<\/span><\/h4>\n<p>This article has been published as part of BMC Bioinformatics Volume 17 Supplement 16, 2016: Proceedings of the Tenth International Workshop on Machine Learning in Systems Biology (MLSB 2016). The full contents of the supplement are available online at <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/supplements\/volume-17-supplement-16\" target=\"_blank\">http:\/\/bmcbioinformatics.biomedcentral.com\/articles\/supplements\/volume-17-supplement-16<\/a>.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h4>\n<p>G.B.S. acknowledges funding from the E.U. under the Marie Curie actions. G.S. 
acknowledges funding from the ERC under grant MLC366999, which includes funding for the publication of this article.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Availability_of_data_and_materials\">Availability of data and materials<\/span><\/h4>\n<p>All data used in this study are available from the ENCODE project repository<sup id=\"rdp-ebb-cite_ref-EPCAnInt12_13-2\" class=\"reference\"><a href=\"#cite_note-EPCAnInt12-13\" rel=\"external_link\">[13]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-EPCData_23-0\" class=\"reference\"><a href=\"#cite_note-EPCData-23\" rel=\"external_link\">[23]<\/a><\/sup> (IDs: wgEncodeBroadHistoneK562H3k4me3StdAlnRep1.bam, wgEncodeBroadHistoneK562H3k4me3StdAlnRep2.bam, wgEncodeBroadHistoneK562H3k9acStdAlnRep1.bam, wgEncodeBroadHistoneK562H3k9acStdAlnRep2.bam, wgEncodeBroadHistoneK562ControlStdAlnRep1.bam). DGW is available as an open-source Python package on GitHub (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/lukauskas.github.com\/dgw\/\" target=\"_blank\">https:\/\/lukauskas.github.com\/dgw\/<\/a>).<sup id=\"rdp-ebb-cite_ref-LukauskasDGW_19-1\" class=\"reference\"><a href=\"#cite_note-LukauskasDGW-19\" rel=\"external_link\">[19]<\/a><\/sup> A manual for the package is available from the same URL.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Authors.E2.80.99_contributions\">Authors\u2019 contributions<\/span><\/h4>\n<p>SL, GBS and GS designed the research. SL implemented the method and SL, RV and GBS carried out the experiments. All authors wrote the paper. 
All authors read and approved the final manuscript.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h4>\n<p>The authors declare that they have no competing interests.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Consent_for_publication\">Consent for publication<\/span><\/h4>\n<p>Not applicable.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Ethics_approval_and_consent_to_participate\">Ethics approval and consent to participate<\/span><\/h4>\n<p>Not applicable.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-FureyChip12-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FureyChip12_1-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Furey, T.S. (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3591838\" target=\"_blank\">\"ChIP-seq and beyond: New and improved methodologies to detect and characterize protein-DNA interactions\"<\/a>. <i>Nature Reviews Genetics<\/i> <b>13<\/b> (12): 840-52. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnrg3306\" target=\"_blank\">10.1038\/nrg3306<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3591838\/\" target=\"_blank\">PMC3591838<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23090257\" target=\"_blank\">23090257<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3591838\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3591838<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Chip-seq+and+beyond%3A+New+and+improved+methodologies+to+detect+and+characterize+protein-DNA+interactions&rft.jtitle=Nature+Reviews+Genetics&rft.aulast=Furey%2C+T.S.&rft.au=Furey%2C+T.S.&rft.date=2012&rft.volume=13&rft.issue=12&rft.pages=840-52&rft_id=info:doi\/10.1038%2Fnrg3306&rft_id=info:pmc\/PMC3591838&rft_id=info:pmid\/23090257&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3591838&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BarskiHigh07-2\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BarskiHigh07_2-0\" rel=\"external_link\">2.0<\/a><\/sup> <sup><a href=\"#cite_ref-BarskiHigh07_2-1\" rel=\"external_link\">2.1<\/a><\/sup> <sup><a href=\"#cite_ref-BarskiHigh07_2-2\" rel=\"external_link\">2.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Barski, A.; Cuddapah, S.; Cui, K. et al. (2007). \"High-resolution profiling of histone methylations in the human genome\". <i>Cell<\/i> <b>129<\/b> (4): 823-37. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.cell.2007.05.009\" target=\"_blank\">10.1016\/j.cell.2007.05.009<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17512414\" target=\"_blank\">17512414<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=High-resolution+profiling+of+histone+methylations+in+the+human+genome&rft.jtitle=Cell&rft.aulast=Barski%2C+A.%3B+Cuddapah%2C+S.%3B+Cui%2C+K.+et+al.&rft.au=Barski%2C+A.%3B+Cuddapah%2C+S.%3B+Cui%2C+K.+et+al.&rft.date=2007&rft.volume=129&rft.issue=4&rft.pages=823-37&rft_id=info:doi\/10.1016%2Fj.cell.2007.05.009&rft_id=info:pmid\/17512414&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KundajeUbiq12-3\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-KundajeUbiq12_3-0\" rel=\"external_link\">3.0<\/a><\/sup> <sup><a href=\"#cite_ref-KundajeUbiq12_3-1\" rel=\"external_link\">3.1<\/a><\/sup> <sup><a href=\"#cite_ref-KundajeUbiq12_3-2\" rel=\"external_link\">3.2<\/a><\/sup> <sup><a href=\"#cite_ref-KundajeUbiq12_3-3\" rel=\"external_link\">3.3<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Kundaje, A.; Kyriazopoulou-Panagiotopoulou, S.; Libbrecht, M. et al. (2012). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3431490\" target=\"_blank\">\"Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements\"<\/a>. <i>Genome Research<\/i> <b>22<\/b> (9): 1735-47. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1101%2Fgr.136366.111\" target=\"_blank\">10.1101\/gr.136366.111<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3431490\/\" target=\"_blank\">PMC3431490<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22955985\" target=\"_blank\">22955985<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3431490\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3431490<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ubiquitous+heterogeneity+and+asymmetry+of+the+chromatin+environment+at+regulatory+elements&rft.jtitle=Genome+Research&rft.aulast=Kundaje%2C+A.%3B+Kyriazopoulou-Panagiotopoulou%2C+S.%3B+Libbrecht%2C+M.+et+al.&rft.au=Kundaje%2C+A.%3B+Kyriazopoulou-Panagiotopoulou%2C+S.%3B+Libbrecht%2C+M.+et+al.&rft.date=2012&rft.volume=22&rft.issue=9&rft.pages=1735-47&rft_id=info:doi\/10.1101%2Fgr.136366.111&rft_id=info:pmc\/PMC3431490&rft_id=info:pmid\/22955985&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3431490&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SchweikertMMDiff13-4\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-SchweikertMMDiff13_4-0\" rel=\"external_link\">4.0<\/a><\/sup> <sup><a href=\"#cite_ref-SchweikertMMDiff13_4-1\" rel=\"external_link\">4.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Schweikert, G.;, Cseke, B.; Clouaire, T. et al. (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4008153\" target=\"_blank\">\"MMDiff: Quantitative testing for shape changes in ChIP-Seq data sets\"<\/a>. <i>BMC Genomics<\/i> <b>14<\/b>: 826. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2F1471-2164-14-826\" target=\"_blank\">10.1186\/1471-2164-14-826<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4008153\/\" target=\"_blank\">PMC4008153<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24267901\" target=\"_blank\">24267901<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4008153\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4008153<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=MMDiff%3A+Quantitative+testing+for+shape+changes+in+ChIP-Seq+data+sets&rft.jtitle=BMC+Genomics&rft.aulast=Schweikert%2C+G.%3B%2C+Cseke%2C+B.%3B+Clouaire%2C+T.+et+al.&rft.au=Schweikert%2C+G.%3B%2C+Cseke%2C+B.%3B+Clouaire%2C+T.+et+al.&rft.date=2013&rft.volume=14&rft.pages=826&rft_id=info:doi\/10.1186%2F1471-2164-14-826&rft_id=info:pmc\/PMC4008153&rft_id=info:pmid\/24267901&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4008153&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li 
id=\"cite_note-BiebersteinFirst12-5\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BiebersteinFirst12_5-0\" rel=\"external_link\">5.0<\/a><\/sup> <sup><a href=\"#cite_ref-BiebersteinFirst12_5-1\" rel=\"external_link\">5.1<\/a><\/sup> <sup><a href=\"#cite_ref-BiebersteinFirst12_5-2\" rel=\"external_link\">5.2<\/a><\/sup> <sup><a href=\"#cite_ref-BiebersteinFirst12_5-3\" rel=\"external_link\">5.3<\/a><\/sup> <sup><a href=\"#cite_ref-BiebersteinFirst12_5-4\" rel=\"external_link\">5.4<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bieberstein, N.I.; Carrillo Oesterreich, F.; Straube, K. et al. (2012). \"First exon length controls active chromatin signatures and transcription\". <i>Cell Reports<\/i> <b>2<\/b> (1): 62\u20138. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.celrep.2012.05.019\" target=\"_blank\">10.1016\/j.celrep.2012.05.019<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22840397\" target=\"_blank\">22840397<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=First+exon+length+controls+active+chromatin+signatures+and+transcription&rft.jtitle=Cell+Reports&rft.aulast=Bieberstein%2C+N.I.%3B+Carrillo+Oesterreich%2C+F.%3B+Straube%2C+K.+et+al.&rft.au=Bieberstein%2C+N.I.%3B+Carrillo+Oesterreich%2C+F.%3B+Straube%2C+K.+et+al.&rft.date=2012&rft.volume=2&rft.issue=1&rft.pages=62%E2%80%938&rft_id=info:doi\/10.1016%2Fj.celrep.2012.05.019&rft_id=info:pmid\/22840397&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BenvenisteTransc14-6\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BenvenisteTransc14_6-0\" rel=\"external_link\">6.0<\/a><\/sup> <sup><a href=\"#cite_ref-BenvenisteTransc14_6-1\" rel=\"external_link\">6.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Benveniste, D.; Sonntag, H.J.; Sanguinetti, G. et al. (2014). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4169916\" target=\"_blank\">\"Transcription factor binding predicts histone modifications in human cell lines\"<\/a>. <i>Proceedings of the National Academy of Sciences of the United States of America<\/i> <b>111<\/b> (37): 13367-72. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1073%2Fpnas.1412081111\" target=\"_blank\">10.1073\/pnas.1412081111<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4169916\/\" target=\"_blank\">PMC4169916<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25187560\" target=\"_blank\">25187560<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4169916\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4169916<\/a><\/span>.<\/span><span class=\"Z3988\" 
title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Transcription+factor+binding+predicts+histone+modifications+in+human+cell+lines&rft.jtitle=Proceedings+of+the+National+Academy+of+Sciences+of+the+United+States+of+America&rft.aulast=Benveniste%2C+D.%3B+Sonntag%2C+H.J.%3B+Sanguinetti%2C+G.+et+al.&rft.au=Benveniste%2C+D.%3B+Sonntag%2C+H.J.%3B+Sanguinetti%2C+G.+et+al.&rft.date=2014&rft.volume=111&rft.issue=37&rft.pages=13367-72&rft_id=info:doi\/10.1073%2Fpnas.1412081111&rft_id=info:pmc\/PMC4169916&rft_id=info:pmid\/25187560&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4169916&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FilionSyst10-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FilionSyst10_7-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Filion, G.J.; van Bemmel, J.G.; Braunschweig, U. et al. (2010). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3119929\" target=\"_blank\">\"Systematic protein location mapping reveals five principal chromatin types in Drosophila cells\"<\/a>. <i>Cell<\/i> <b>143<\/b> (2): 212\u201324. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.cell.2010.09.009\" target=\"_blank\">10.1016\/j.cell.2010.09.009<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3119929\/\" target=\"_blank\">PMC3119929<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20888037\" target=\"_blank\">20888037<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3119929\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3119929<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Systematic+protein+location+mapping+reveals+five+principal+chromatin+types+in+Drosophila+cells&rft.aulast=Filion%2C+G.J.%3B+van+Bemmel%2C+J.G.%3B+Braunschweig%2C+U.+et+al.&rft.au=Filion%2C+G.J.%3B+van+Bemmel%2C+J.G.%3B+Braunschweig%2C+U.+et+al.&rft.date=2010&rft.volume=143&rft.issue=2&rft.pages=pp.+212%E2%80%9324&rft_id=info:doi\/10.1016%2Fj.cell.2010.09.009&rft_id=info:pmc\/PMC3119929&rft_id=info:pmid\/20888037&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3119929&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KnijnenburgMulti14-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KnijnenburgMulti14_8-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Knijnenburg, T.A.; Ramsey, S.A.; Berman, B.P. et al. (2014). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4040162\" target=\"_blank\">\"Multiscale representation of genomic signals\"<\/a>. <i>Nature Methods<\/i> <b>11<\/b> (6): 689-94. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnmeth.2924\" target=\"_blank\">10.1038\/nmeth.2924<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4040162\/\" target=\"_blank\">PMC4040162<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24727652\" target=\"_blank\">24727652<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4040162\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4040162<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Multiscale+representation+of+genomic+signals&rft.aulast=Knijnenburg%2C+T.A.%3B+Ramsey%2C+S.A.%3B+Berman+B.P.+et+al.&rft.au=Knijnenburg%2C+T.A.%3B+Ramsey%2C+S.A.%3B+Berman+B.P.+et+al.&rft.date=2014&rft.volume=11&rft.issue=6&rft.pages=pp.+689-94&rft_id=info:doi\/10.1038%2Fnmeth.2924&rft_id=info:pmc\/PMC4040162&rft_id=info:pmid\/24727652&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4040162&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-TaslimComp09-9\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-TaslimComp09_9-0\" rel=\"external_link\">9.0<\/a><\/sup> <sup><a href=\"#cite_ref-TaslimComp09_9-1\" rel=\"external_link\">9.1<\/a><\/sup> <sup><a href=\"#cite_ref-TaslimComp09_9-2\" rel=\"external_link\">9.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Taslim, C.; Wu, J.; Yan, P. et al. (2009). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2800347\" target=\"_blank\">\"Comparative study on ChIP-seq data: normalization and binding pattern characterization\"<\/a>. <i>Bioinformatics<\/i> <b>25<\/b> (18): 2334-40. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fbioinformatics%2Fbtp384\" target=\"_blank\">10.1093\/bioinformatics\/btp384<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2800347\/\" target=\"_blank\">PMC2800347<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/19561022\" target=\"_blank\">19561022<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2800347\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2800347<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Comparative+study+on+ChIP-seq+data%3A+normalization+and+binding+pattern+characterization&rft.aulast=Taslim%2C+C.%3B+Wu%2C+J.%3B+Yan%2C+P.+et+al.&rft.au=Taslim%2C+C.%3B+Wu%2C+J.%3B+Yan%2C+P.+et+al.&rft.date=2009&rft.volume=25&rft.issue=18&rft.pages=pp.+2334-40&rft_id=info:doi\/10.1093%2Fbioinformatics%2Fbtp384&rft_id=info:pmc\/PMC2800347&rft_id=info:pmid\/19561022&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC2800347&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li 
id=\"cite_note-ZhangModel08-10\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-ZhangModel08_10-0\" rel=\"external_link\">10.0<\/a><\/sup> <sup><a href=\"#cite_ref-ZhangModel08_10-1\" rel=\"external_link\">10.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Zhang, Y.; Liu, T.; Meyer, C.A. et al. (2008). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2592715\" target=\"_blank\">\"Model-based analysis of ChIP-Seq (MACS)\"<\/a>. <i>Genome Biology<\/i> <b>9<\/b> (9): R137. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2Fgb-2008-9-9-r137\" target=\"_blank\">10.1186\/gb-2008-9-9-r137<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2592715\/\" target=\"_blank\">PMC2592715<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/18798982\" target=\"_blank\">18798982<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2592715\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2592715<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Model-based+analysis+of+ChIP-Seq+%28MACS%29&rft.jtitle=Genome+Biology&rft.aulast=Zhang%2C+Y.%3B+Liu%2C+T.%3B+Meyer%2C+C.A.+et+al.&rft.au=Zhang%2C+Y.%3B+Liu%2C+T.%3B+Meyer%2C+C.A.+et+al.&rft.date=2008&rft.volume=9&rft.issue=9&rft.pages=R137&rft_id=info:doi\/10.1186%2Fgb-2008-9-9-r137&rft_id=info:pmc\/PMC2592715&rft_id=info:pmid\/18798982&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC2592715&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SakoeDynamic78-11\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-SakoeDynamic78_11-0\" rel=\"external_link\">11.0<\/a><\/sup> <sup><a href=\"#cite_ref-SakoeDynamic78_11-1\" rel=\"external_link\">11.1<\/a><\/sup> <sup><a href=\"#cite_ref-SakoeDynamic78_11-2\" rel=\"external_link\">11.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sakoe, H.; Chiba, S. (1978). \"Dynamic programming algorithm optimization for spoken word recognition\". <i>IEEE Transactions on Acoustics, Speech, and Signal Processing<\/i> <b>26<\/b> (1): 62\u20138. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FTASSP.1978.1163055\" target=\"_blank\">10.1109\/TASSP.1978.1163055<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Dynamic+programming+algorithm+optimization+for+spoken+word+recognition&rft.jtitle=IEEE+Transactions+on+Acoustics%2C+Speech%2C+and+Signal+Processing&rft.aulast=Sakoe%2C+H.%3B+Chiba%2C+S.&rft.au=Sakoe%2C+H.%3B+Chiba%2C+S.&rft.date=1978&rft.volume=26&rft.issue=1&rft.pages=62%E2%80%938&rft_id=info:doi\/10.1109%2FTASSP.1978.1163055&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-M.C3.BCllerInfo07-12\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-M.C3.BCllerInfo07_12-0\" rel=\"external_link\">12.0<\/a><\/sup> <sup><a href=\"#cite_ref-M.C3.BCllerInfo07_12-1\" rel=\"external_link\">12.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">M\u00fcller, M. (2007). <i>Information Retrieval for Music and Motion<\/i>. Springer-Verlag Berlin Heidelberg. pp. 318. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-540-74048-3\" target=\"_blank\">10.1007\/978-3-540-74048-3<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9783540740476.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Information+Retrieval+for+Music+and+Motion&rft.aulast=M%C3%BCller%2C+M.&rft.au=M%C3%BCller%2C+M.&rft.date=2007&rft.pages=pp.%26nbsp%3B318&rft.pub=Springer-Verlag+Berlin+Heidelberg&rft_id=info:doi\/10.1007%2F978-3-540-74048-3&rft.isbn=9783540740476&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EPCAnInt12-13\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-EPCAnInt12_13-0\" rel=\"external_link\">13.0<\/a><\/sup> <sup><a href=\"#cite_ref-EPCAnInt12_13-1\" rel=\"external_link\">13.1<\/a><\/sup> <sup><a href=\"#cite_ref-EPCAnInt12_13-2\" rel=\"external_link\">13.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">ENCODE Project Consortium (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3439153\" target=\"_blank\">\"An integrated encyclopedia of DNA elements in the human genome\"<\/a>. <i>Nature<\/i> <b>489<\/b> (7414): 57-74. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnature11247\" target=\"_blank\">10.1038\/nature11247<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3439153\/\" target=\"_blank\">PMC3439153<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22955616\" target=\"_blank\">22955616<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3439153\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3439153<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+integrated+encyclopedia+of+DNA+elements+in+the+human+genome&rft.jtitle=Nature&rft.aulast=ENCODE+Project+Consortium&rft.au=ENCODE+Project+Consortium&rft.date=2012&rft.volume=489&rft.issue=7414&rft.pages=57-74&rft_id=info:doi\/10.1038%2Fnature11247&rft_id=info:pmc\/PMC3439153&rft_id=info:pmid\/22955616&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3439153&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GiorginoComp09-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GiorginoComp09_14-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Giorgino, T. (2009). \"Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package\". <i>Journal of Statistical Software<\/i> <b>31<\/b> (7). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.18637%2Fjss.v031.i07\" target=\"_blank\">10.18637\/jss.v031.i07<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Computing+and+Visualizing+Dynamic+Time+Warping+Alignments+in+R%3A+The+dtw+Package&rft.jtitle=Journal+of+Statistical+Software&rft.aulast=Giorgino%2C+T.&rft.au=Giorgino%2C+T.&rft.date=2009&rft.volume=31&rft.issue=7&rft_id=info:doi\/10.18637%2Fjss.v031.i07&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BegumAccel15-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BegumAccel15_15-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Begum, N.; Ulanova, L.; Wang, J.; Keogh, E. (2015). \"Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy\". <i>Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining<\/i> <b>2015<\/b>: 49\u201358. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1145%2F2783258.2783286\" target=\"_blank\">10.1145\/2783258.2783286<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Accelerating+Dynamic+Time+Warping+Clustering+with+a+Novel+Admissible+Pruning+Strategy&rft.jtitle=Proceedings+of+the+21th+ACM+SIGKDD+International+Conference+on+Knowledge+Discovery+and+Data+Mining&rft.aulast=Begum%2C+N.%3B+Ulanova%2C+L.%3B+Wang%2C+J.%3B+Keogh%2C+E.&rft.au=Begum%2C+N.%3B+Ulanova%2C+L.%3B+Wang%2C+J.%3B+Keogh%2C+E.&rft.date=2015&rft.volume=2015&rft.pages=49%E2%80%9358&rft_id=info:doi\/10.1145%2F2783258.2783286&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HiranoEmp05-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HiranoEmp05_16-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Hirano, S.; Tsumoto, S. (2005). \"Empirical Comparison of Clustering Methods for Long Time-Series Databases\". In Tsumoto, S.; Yamaguchi, T.; Numao, M.; Motoda, H. (eds.). <i>Active Mining<\/i>. Springer Berlin Heidelberg. pp. 268\u2013286. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1007%2F11423270_15\" target=\"_blank\">10.1007\/11423270_15<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9783540319337.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Empirical+Comparison+of+Clustering+Methods+for+Long+Time-Series+Databases&rft.atitle=Active+Mining&rft.aulast=Hirano%2C+S.%3B+Tsumoto%2C+S.&rft.au=Hirano%2C+S.%3B+Tsumoto%2C+S.&rft.date=2005&rft.pages=pp.%26nbsp%3B268%E2%80%93286&rft.pub=Springer+Berlin+Heidelberg&rft_id=info:doi\/10.1007%2F11423270_15&rft.isbn=9783540319337&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HellerBayesian05-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HellerBayesian05_17-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Heller, K.A.; Ghahramani, Z. (2005). \"Bayesian hierarchical clustering\". <i>Proceedings of the 22nd International Conference on Machine Learning<\/i> <b>2005<\/b>: 297\u2013304. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1145%2F1102351.1102389\" target=\"_blank\">10.1145\/1102351.1102389<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Bayesian+hierarchical+clustering&rft.jtitle=Proceedings+of+the+22nd+International+Conference+on+Machine+Learning&rft.aulast=Heller%2C+K.A.%3B+Ghahramani%2C+Z.&rft.au=Heller%2C+K.A.%3B+Ghahramani%2C+Z.&rft.date=2005&rft.volume=2005&rft.pages=297%E2%80%93304&rft_id=info:doi\/10.1145%2F1102351.1102389&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NiennattrakulShape09-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NiennattrakulShape09_18-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Niennattrakul, V.; Ratanamahatana, C.A. (2009). \"Shape averaging under time warping\". <i>Proceedings of the 6th International Conference on Electrical Engineering\/Electronics, Computer, Telecommunications and Information Technology<\/i> <b>2009<\/b>: 626\u2013629. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FECTICON.2009.5137128\" target=\"_blank\">10.1109\/ECTICON.2009.5137128<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Shape+averaging+under+time+warping&rft.jtitle=Proceedings+of+the+6th+International+Conference+on+Electrical+Engineering%2FElectronics%2C+Computer%2C+Telecommunications+and+Information+Technology&rft.aulast=Niennattrakul%2C+V.%3B+Ratanamahatana%2C+C.A.&rft.au=Niennattrakul%2C+V.%3B+Ratanamahatana%2C+C.A.&rft.date=2009&rft.volume=2009&rft.pages=626%E2%80%93629&rft_id=info:doi\/10.1109%2FECTICON.2009.5137128&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LukauskasDGW-19\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-LukauskasDGW_19-0\" rel=\"external_link\">19.0<\/a><\/sup> <sup><a href=\"#cite_ref-LukauskasDGW_19-1\" rel=\"external_link\">19.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Lukauskas, S. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/lukauskas.co.uk\/dgw\/\" target=\"_blank\">\"Dynamic Genome Warping (DGW)\"<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/lukauskas.co.uk\/dgw\/\" target=\"_blank\">http:\/\/lukauskas.co.uk\/dgw\/<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Dynamic+Genome+Warping+%28DGW%29&rft.atitle=&rft.aulast=Lukauskas%2C+S.&rft.au=Lukauskas%2C+S.&rft_id=http%3A%2F%2Flukauskas.co.uk%2Fdgw%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MatthewsComp75-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MatthewsComp75_20-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Matthews, B.W. (1975). \"Comparison of the predicted and observed secondary structure of T4 phage lysozyme\". <i>Biochimica et Biophysica Acta<\/i> <b>405<\/b> (2): 442\u201351. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/1180967\" target=\"_blank\">1180967<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Comparison+of+the+predicted+and+observed+secondary+structure+of+T4+phage+lysozyme&rft.jtitle=Biochimica+et+Biophysica+Acta&rft.aulast=Matthews%2C+B.W.&rft.au=Matthews%2C+B.W.&rft.date=1975&rft.volume=405&rft.issue=2&rft.pages=442%E2%80%9351&rft_id=info:pmid\/1180967&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-JurmanAComp12-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-JurmanAComp12_21-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Jurman, G.; Riccadonna, S.; Furlanello, C. (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3414515\" target=\"_blank\">\"A comparison of MCC and CEN error measures in multi-class prediction\"<\/a>. <i>PLoS One<\/i> <b>7<\/b> (8): e41882. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pone.0041882\" target=\"_blank\">10.1371\/journal.pone.0041882<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3414515\/\" target=\"_blank\">PMC3414515<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22905111\" target=\"_blank\">22905111<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3414515\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3414515<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+comparison+of+MCC+and+CEN+error+measures+in+multi-class+prediction&rft.jtitle=PLoS+One&rft.aulast=Jurman%2C+G.%3B+Riccadonna%2C+S.%3B+Furlanello%2C+C.&rft.au=Jurman%2C+G.%3B+Riccadonna%2C+S.%3B+Furlanello%2C+C.&rft.date=2012&rft.volume=7&rft.issue=8&rft.pages=e41882&rft_id=info:doi\/10.1371%2Fjournal.pone.0041882&rft_id=info:pmc\/PMC3414515&rft_id=info:pmid\/22905111&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3414515&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EisenCluster98-22\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-EisenCluster98_22-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Eisen, M.B.; Spellman, P.T.; Brown, P.O.; Botstein, D. (1998). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC24541\" target=\"_blank\">\"Cluster analysis and display of genome-wide expression patterns\"<\/a>. <i>Proceedings of the National Academy of Sciences of the United States of America<\/i> <b>95<\/b> (25): 14863-8. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC24541\/\" target=\"_blank\">PMC24541<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/9843981\" target=\"_blank\">9843981<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC24541\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC24541<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Cluster+analysis+and+display+of+genome-wide+expression+patterns&rft.jtitle=Proceedings+of+the+National+Academy+of+Sciences+of+the+United+States+of+America&rft.aulast=Eisen%2C+M.B.%3B+Spellman%2C+P.T.%3B+Brown%2C+P.O.%3B+Botstein%2C+D.&rft.au=Eisen%2C+M.B.%3B+Spellman%2C+P.T.%3B+Brown%2C+P.O.%3B+Botstein%2C+D.&rft.date=1998&rft.volume=95&rft.issue=25&rft.pages=14863-8&rft_id=info:pmc\/PMC24541&rft_id=info:pmid\/9843981&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC24541&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span 
style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EPCData-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-EPCData_23-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">ENCODE Project Consortium (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/hgdownload.cse.ucsc.edu\/goldenPath\/hg19\/encodeDCC\/wgEncodeBroadHistone\/\" target=\"_blank\">\"wgEncodeBroadHistone\"<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/hgdownload.cse.ucsc.edu\/goldenPath\/hg19\/encodeDCC\/wgEncodeBroadHistone\/\" target=\"_blank\">http:\/\/hgdownload.cse.ucsc.edu\/goldenPath\/hg19\/encodeDCC\/wgEncodeBroadHistone\/<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=wgEncodeBroadHistone&rft.atitle=&rft.aulast=ENCODE+Project+Consortium&rft.au=ENCODE+Project+Consortium&rft.date=2012&rft_id=http%3A%2F%2Fhgdownload.cse.ucsc.edu%2FgoldenPath%2Fhg19%2FencodeDCC%2FwgEncodeBroadHistone%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. 
In some cases, the authors directly referenced a citation number; the author and year of the citation were inserted along with the citation for completeness.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks\">https:\/\/www.limswiki.org\/index.php\/Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","5321dee46dc24114d97002f69139f201_images":["https:\/\/www.limswiki.org\/images\/4\/4b\/Fig1_Lukauskas_BMCBioinformatics2016_17-Supp16.gif","https:\/\/www.limswiki.org\/images\/9\/92\/Fig2_Lukauskas_BMCBioinformatics2016_17-Supp16.gif","https:\/\/www.limswiki.org\/images\/2\/2c\/Fig3_Lukauskas_BMCBioinformatics2016_17-Supp16.gif","https:\/\/www.limswiki.org\/images\/8\/88\/Fig4_Lukauskas_BMCBioinformatics2016_17-Supp16.gif","https:\/\/www.limswiki.org\/images\/4\/4d\/Fig5_Lukauskas_BMCBioinformatics2016_17-Supp16.gif","https:\/\/www.limswiki.org\/images\/b\/b4\/Fig6_Lukauskas_BMCBioinformatics2016_17-Supp16.gif","https:\/\/www.limswiki.org\/images\/3\/3c\/Fig7_Lukauskas_BMCBioinformatics2016_17-Supp16.gif"],"5321dee46dc24114d97002f69139f201_timestamp":1544814660,"489049f69ab6d4b2f19ec2a155d44c4e_type":"article","489049f69ab6d4b2f19ec2a155d44c4e_title":"Ten simple rules for developing usable software in computational biology (List et al. 
2017)","489049f69ab6d4b2f19ec2a155d44c4e_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology","489049f69ab6d4b2f19ec2a155d44c4e_plaintext":"\n\nJournal:Ten simple rules for developing usable software in computational biology\n\nFull article title: Ten simple rules for developing usable software in computational biology\nJournal: PLOS Computational Biology\nAuthor(s): List, Markus; Ebert, Peter; Albrecht, Felipe\nAuthor affiliation(s): Max Planck Institute for Informatics, Saarland Informatics Campus\nPrimary contact: Email: pebert at mpi-inf dot mpg dot de\nEditors: Markel, Scott\nYear published: 2017\nVolume and issue: 13(1)\nPage(s): e1005265\nDOI: 10.1371\/journal.pcbi.1005265\nISSN: 1553-7358\nDistribution license: Creative Commons Attribution 4.0 International\nWebsite: http:\/\/journals.plos.org\/ploscompbiol\/article?id=10.1371\/journal.pcbi.1005265\nDownload: http:\/\/journals.plos.org\/ploscompbiol\/article\/file?id=10.1371\/journal.pcbi.1005265&type=printable (PDF)\n\nContents\n\n1 Introduction \n2 Rule 1: Identify the missing pieces \n3 Rule 2: Collect feedback from prospective users \n4 Rule 3: Be ready for data growth \n5 Rule 4: Use standard data formats for input and output \n6 Rule 5: Expose only mandatory parameters \n7 Rule 6: Expect users to make mistakes \n8 Rule 7: Provide logging information \n9 Rule 8: Get users started quickly \n10 Rule 9: Offer tutorial material \n11 Rule 10: Consider the future of your tool \n12 Conclusions \n13 Further reading \n14 Acknowledgments \n\n14.1 Funding \n\n\n15 Competing interests \n16 References \n17 Notes \n\n\n\nIntroduction \nThe rise of high-throughput technologies in molecular biology has led to a massive amount of publicly available 
data. While computational method development has been a cornerstone of biomedical research for decades, the rapid technological progress in the wet laboratory makes it difficult for software development to keep pace. Wet lab scientists rely heavily on computational methods, especially since more research is now performed in silico. However, suitable tools do not always exist, and not everyone has the skills to write complex software. Computational biologists are required to close this gap, but they often lack formal training in software engineering. To alleviate this, several related challenges have been previously addressed in the Ten Simple Rules series, including reproducibility[1], effectiveness[2], and open-source development of software.[3][4]\nHere, we want to shed light on issues concerning software usability. Usability is commonly defined as \"a measure of interface quality that refers to the effectiveness, efficiency, and satisfaction with which users can perform tasks with a tool.\"[5] Considering the subjective nature of this topic, a broad consensus may be hard to achieve. Nevertheless, good usability is imperative for achieving wide acceptance of a software tool in the community. In many cases, academic software starts out as a prototype that solves one specific task and is not geared for a larger user group. As soon as the developer realizes that the complexity of the problems solved by the software could make it widely applicable, the software will grow to meet the new demands. At least by this point, if not sooner, usability should become a priority. Unfortunately, efforts in scientific software development are constrained by limited funding, time, and rapid turnover of group members. As a result, scientific software is often poorly documented, non-intuitive, non-robust with regard to input data and parameters, and hard to install. 
For many use cases, there is a plethora of tools that appear very similar and make it difficult for the user to select the one that best fits their needs. Not surprisingly, a substantial fraction of these tools are probably abandonware; i.e., these are no longer actively developed or supported in spite of their potential value to the scientific community.\nTo our knowledge, software development as part of scientific research is usually carried out by individuals or small teams with no more than two or three members. Hence, the responsibility of designing, implementing, testing, and documenting the code rests on few shoulders. Additionally, there is pressure to produce publishable results or, at least, to contribute analysis work to ongoing projects. Consequently, academic software is typically released as a prototype. We acknowledge that such a tool cannot adhere to and should not be judged by the standards that we take for granted for production grade software. However, widespread use of a tool is typically in the interest of a researcher. To this end, we propose 10 simple rules that, in our experience, have a considerable impact on improving usability of scientific software.\n\nRule 1: Identify the missing pieces \nUnless you are a pioneer, and few of us are, the problem you are working on is likely addressed by existing tools. As a professional, you are aware of this software but may consider it cumbersome, non-functional, or otherwise unacceptable for your demands. Make sure that your judgment is shared by a substantial fraction of the prospective users before you start developing a new tool. Usable software should offer the features needed and behave as expected by the community. Moreover, a new tool needs to provide substantial novelty over existing solutions. For this purpose, list the requirements on the software and create a comparison table to set the new tool against existing solutions. 
This allows you to carve out the selling points of your tool in a systematic fashion.\n\nRule 2: Collect feedback from prospective users \nSoftware can be regarded as providing the interface between wet lab science and data analysis. A lack of communication between both sides will lead to misunderstandings that need to be rectified by substantially changing the code base in a late phase of the project. Avoid this pitfall by exposing potential users to a prototype. Discussions on data formats or on the design of the user interface will reveal unforeseen challenges and help to determine if a tool is sufficiently intuitive.[6] To plan your progress, keep a record of suggested improvements and existing issues.\n\nRule 3: Be ready for data growth \nFirst estimate the expected data growth in your field and then design your software accordingly. To this end, consider parallelization and make sure your tool can be integrated seamlessly into workflow management systems (e.g., Galaxy[7] and Taverna[8]), pipeline frameworks (e.g., Ruffus[9] and Snakemake[10]), or a cluster framework (e.g., Hadoop, http:\/\/hadoop.apache.org\/). Moreover, make sure that the user interface can scale to growing data volumes. For example, visualizations should remain comprehensible for larger datasets, e.g., by displaying only parts of the data or by aggregating results.\n\nRule 4: Use standard data formats for input and output \nAs an expert in your research domain, you know the established data standards and related programming libraries for reading and writing commonly used data formats. Make sure that your tool\u2019s output follows standard specifications to the letter, but be as lenient as possible when users provide non-standard input. Tools that follow this rule are more likely to become successful. 
If you are working in an emerging field with no prevalent model for data exchange, provide data in a structured text file (e.g., tab-separated tables, XML\/XSD, or JSON) and aim for self-documenting output by including header lines and data type descriptions. In this case, document how users can derive suitable input data for your tool.\n\nRule 5: Expose only mandatory parameters \nExposing all (possible) parameters to a user can be confusing and carries the risk of nonsensical parameter settings. Consequently, users will rely on default parameters whenever possible. The same applies to benchmark studies comparing your tool against state-of-the-art competitors. This has three important implications: (i) expose only a small set of parameters by default whose effects on results can be easily understood by any user, (ii) offer advanced parameters only in an expert section and describe them thoroughly in the documentation, and (iii) choose default values conservatively (and, if possible, justify them) so that the tool can operate in a wide range of scenarios and within a reasonable run time.\n\nRule 6: Expect users to make mistakes \nYou should never assume that your tool is self-explanatory, that requirements concerning the input data are obvious, or that the user will immediately grasp all details of the problem at hand. Ideally, your tool supports the user in using it appropriately, e.g., by checking that data remain inside required ranges or that identifiers are unique, and provides descriptive error messages in case of unexpected values. If performance penalties due to such checks are a real concern (which should be tested), make the checks optional but enable them by default. Finally, allow users to stop ongoing operations in case they realize they made a mistake.\n\nRule 7: Provide logging information \nTwo types of logs improve usability and also support the user in making their research more reproducible. 
Configuration logs keep track of basic information, such as the time stamp of the analysis, the version of your tool and of third-party libraries, as well as the parameter settings and input data. Archiving this information is particularly important in long-running research projects in order to trace irregularities in the results at any later point in time.[1] Technical logs, on the other hand, contain progress messages that help users to pinpoint errors in the execution flow and allow clear communication of these issues to the developer. As much as possible, avoid exposing potentially sensitive user information in the logs.\n\nRule 8: Get users started quickly \nComplex setup routines introduce dependency[11] or configuration debt[12]; i.e., the user has to spend substantial time installing software and learning about the execution parameters of a tool. These raise the bar for unhindered exploration of software features. Such issues can be solved by implementing a web application (if feasible with respect to resource demands), by providing a standalone executable, or by providing a system-specific software package. Alternatively, issues of a program\u2019s dependence on third-party libraries can be avoided by encapsulating your tool in a virtual machine image or, e.g., a Docker container (https:\/\/docker.com). Finally, it is imperative to provide demo data that enable users to immediately interact with the software. A successful test run proves to the user that your software works as expected and will be essential if you want your tool to be published.\n\nRule 9: Offer tutorial material \nResearchers can seldom afford the time to thoroughly read complex user manuals. They will thus appreciate a number of clearly written code examples, illustrations, or video screen casts to get started. Most importantly, documented use cases enable users to quickly assess if your tool is suited for the problem at hand and allow fast learning by doing. 
Keep in mind that these materials have to be updated together with your tool.\n\nRule 10: Consider the future of your tool \nFor long-term availability of your software, use suitable repositories such as GitHub (https:\/\/github.com) or Bitbucket (https:\/\/bitbucket.com) throughout the development process. Explicitly state under which software license you release your code for third parties (see https:\/\/opensource.org\/licenses). Without such a license, using your software might be prohibitive for many organizations or companies. More importantly, keeping your code in a public repository will also allow you to engage with the users through issue tracking (e.g., bugs, suggestions). After releasing your tool, expect support requests and take them seriously. See them as an opportunity to continuously improve the usability of your tool.\n\nConclusions \nIn the above ten simple rules, we highlight that software should not only be scientifically sound but also be perceived as usable for widespread and effective application. To these ends, developers should also be the first to apply their tool, to reveal usability issues as early as possible. However, effort is required from both users and developers to further improve a tool. Even engaging with only a few users (Rule 2) is likely to have a large impact on usability, since, as Jakob Nielsen put it, \"Zero users give zero insights.\"[13]\n\nFurther reading \nUsability is an important topic in software design, and we would like to provide a few starting points for further reading:\n\n Baxter, S.M.; Day, S.W.; Fetrow, J.S. et al. (2006). \"Scientific software development is not an oxymoron\". PLOS Computational Biology 2 (9): e87. doi:10.1371\/journal.pcbi.0020087. PMC PMC1560404. PMID 16965174. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1560404 .   \n Borchardt, J.-C. (30 June 2011). \"Usability in Free Software\". jancborchardt.net. 
http:\/\/jancborchardt.net\/usability-in-free-software .   \n Macaulay, C.; Sloan, D.; Jiang, X. et al. (2009). \"Usability and user-centered design in scientific software development\". IEEE Software 26 (1): 96\u2013102. doi:10.1109\/MS.2009.27.   \n Nichols, D.; Twidale, M. (2003). \"The usability of open source software\". First Monday 8 (1). doi:10.5210\/fm.v8i1.1018.   \n Seffah, A.; Metzker, E. (2004). \"The obstacles and myths of usability and software engineering\". Communications of the ACM 47 (12): 71\u201376. doi:10.1145\/1035134.1035136.   \n Sloan, D.; Macaulay, C.; Forbes, P. et al. (2009). \"User research in a scientific software development project\". Proceedings of the 23rd British HCI Group Annual Conference on People and Computers: Celebrating People and Technology 2009: 423\u2013429. http:\/\/dl.acm.org\/citation.cfm?id=1671066 .   \nAcknowledgments \nWe would like to thank Thomas Lengauer, Nico Pfeifer, and Fabian M\u00fcller for their critical reading of the manuscript and insightful comments.\n\nFunding \nFA and PE acknowledge the support of the German Federal Ministry of Education and Research grant no. 01KU1216A (DEEP project). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.\n\nCompeting interests \nThe authors have declared that no competing interests exist.\n\nReferences \n\n\n\u2191 1.0 1.1 Sandve, G.K.; Nekrutenko, A.; Taylor, J.; Hovig, E. \"Ten simple rules for reproducible computational research\". PLOS Computational Biology 9 (10): e1003285. doi:10.1371\/journal.pcbi.1003285. PMC PMC3812051. PMID 24204232. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3812051 .   \n\n\u2191 Osborne, J.M.; Bernabeu, M.O.; Bruna, M. et al. \"Ten simple rules for effective computational research\". PLOS Computational Biology 10 (3): e1003506. doi:10.1371\/journal.pcbi.1003506. PMC PMC3967918. PMID 24675742. 
http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3967918 .   \n\n\u2191 Prli\u0107, A.; Procter, J.B. \"Ten simple rules for the open development of scientific software\". PLOS Computational Biology 8 (12): e1002802. doi:10.1371\/journal.pcbi.1002802. PMC PMC3516539. PMID 23236269. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539 .   \n\n\u2191 Perez-Riverol, Y.; Gatto, L.; Wang, R. et al. \"Ten simple rules for taking advantage of Git and GitHub\". PLOS Computational Biology 12 (7): e1004947. doi:10.1371\/journal.pcbi.1004947. PMC PMC4945047. PMID 27415786. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4945047 .   \n\n\u2191 Dillon, A. (2001). \"Human Acceptance of Information Technology\". In Karwowski, W. Encyclopedia of Human Factors and Ergonomics. Taylor & Francis. pp. 673\u2013675. ISBN 9780748408474.   \n\n\u2191 Thielsch, M.T.; Engel, R.; Hirschfeld, G. (2015). \"Expected usability is not a valid indicator of experienced usability\". PeerJ Computer Science 1: e19. doi:10.7717\/peerj-cs.19.   \n\n\u2191 Giardine, B.; Riemer, C.; Hardison, R.C. et al. (2005). \"Galaxy: A platform for interactive large-scale genome analysis\". Genome Research 15 (10): 1451\u20131455. doi:10.1101\/gr.4086505. PMC PMC1240089. PMID 16169926. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1240089 .   \n\n\u2191 Wolstencroft, K.; Haines, R.; Fellows, D. et al. (2013). \"The Taverna workflow suite: Designing and executing workflows of Web Services on the desktop, web or in the cloud\". Nucleic Acids Research 41 (W1): W557-W561. doi:10.1093\/nar\/gkt328. PMC PMC3692062. PMID 23640334. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3692062 .   \n\n\u2191 Goodstadt, L. (2010). \"Ruffus: A lightweight Python library for computational pipelines\". Bioinformatics 26 (21): 2778-9. doi:10.1093\/bioinformatics\/btq524. 
PMID 20847218.   \n\n\u2191 K\u00f6ster, J.; Rahmann, S. (2012). \"Snakemake: A scalable bioinformatics workflow engine\". Bioinformatics 28 (19): 2520-2. doi:10.1093\/bioinformatics\/bts480. PMID 22908215.   \n\n\u2191 Morgenthaler, J.D.; Gridnev, M.; Sauciuc, R. et al. (2012). \"Searching for build debt: Experiences managing technical debt at Google\". Proceedings of the Third International Workshop on Managing Technical Debt 2012: 1\u20136. ISBN 9781467317498.   \n\n\u2191 Sculley, D.; Holt, G.; Golovin, D. et al. (2014). \"Machine learning: The high interest credit card of technical debt\". Proceedings of SE4ML: Software Engineering for Machine Learning 2014: 1\u20139.   \n\n\u2191 Nielsen, J. (19 March 2000). \"Why You Only Need to Test with 5 Users\". Nielsen Norman Group. https:\/\/www.nngroup.com\/articles\/why-you-only-need-to-test-with-5-users\/ . Retrieved 30 September 2016 .   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. In the original conclusion, the authors provided suggested reading, but the original only included them as citations to an associated opening sentence. 
These have been moved and expanded out into a new section after the conclusion called \"Further reading\" and organized alphabetically by author (but now no longer appear as citations to the opening sentence).\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\">https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology<\/a>\nCategories: LIMSwiki journal articles (added in 2017); LIMSwiki journal articles (all); LIMSwiki journal articles on bioinformatics; LIMSwiki journal articles on software\n\nThis page was last modified on 7 March 2017, at 21:20.\nThis page has been accessed 641 times.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","489049f69ab6d4b2f19ec2a155d44c4e_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Ten_simple_rules_for_developing_usable_software_in_computational_biology skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Ten simple rules for developing usable software in computational biology<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>The rise of high-throughput technologies in molecular biology has led to a massive amount of publicly available data. 
While computational method development has been a cornerstone of biomedical research for decades, the rapid technological progress in the wet <a href=\"https:\/\/www.limswiki.org\/index.php\/Laboratory\" title=\"Laboratory\" target=\"_blank\" class=\"wiki-link\" data-key=\"c57fc5aac9e4abf31dccae81df664c33\">laboratory<\/a> makes it difficult for software development to keep pace. Wet lab scientists rely heavily on computational methods, especially since more research is now performed <i>in silico<\/i>. However, suitable tools do not always exist, and not everyone has the skills to write complex software. Computational biologists are required to close this gap, but they often lack formal training in software engineering. To alleviate this, several related challenges have been previously addressed in the <i>Ten Simple Rules<\/i> series, including reproducibility<sup id=\"rdp-ebb-cite_ref-SandveTen13_1-0\" class=\"reference\"><a href=\"#cite_note-SandveTen13-1\" rel=\"external_link\">[1]<\/a><\/sup>, effectiveness<sup id=\"rdp-ebb-cite_ref-OsborneTen14_2-0\" class=\"reference\"><a href=\"#cite_note-OsborneTen14-2\" rel=\"external_link\">[2]<\/a><\/sup>, and open-source development of software.<sup id=\"rdp-ebb-cite_ref-Prli.C4.87Ten12_3-0\" class=\"reference\"><a href=\"#cite_note-Prli.C4.87Ten12-3\" rel=\"external_link\">[3]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-Perez-RiverolTen16_4-0\" class=\"reference\"><a href=\"#cite_note-Perez-RiverolTen16-4\" rel=\"external_link\">[4]<\/a><\/sup>\n<\/p><p>Here, we want to shed light on issues concerning software usability. 
Usability is commonly defined as \"a measure of interface quality that refers to the effectiveness, efficiency, and satisfaction with which users can perform tasks with a tool.\"<sup id=\"rdp-ebb-cite_ref-DillonHuman01_5-0\" class=\"reference\"><a href=\"#cite_note-DillonHuman01-5\" rel=\"external_link\">[5]<\/a><\/sup> Considering the subjective nature of this topic, a broad consensus may be hard to achieve. Nevertheless, good usability is imperative for achieving wide acceptance of a software tool in the community. In many cases, academic software starts out as a prototype that solves one specific task and is not geared for a larger user group. As soon as the developer realizes that the complexity of the problems solved by the software could make it widely applicable, the software will grow to meet the new demands. At least by this point, if not sooner, usability should become a priority. Unfortunately, efforts in scientific software development are constrained by limited funding, time, and rapid turnover of group members. As a result, scientific software is often poorly documented, non-intuitive, non-robust with regard to input data and parameters, and hard to install. For many use cases, there is a plethora of tools that appear very similar and make it difficult for the user to select the one that best fits their needs. Not surprisingly, a substantial fraction of these tools are probably abandonware; i.e., these are no longer actively developed or supported in spite of their potential value to the scientific community.\n<\/p><p>To our knowledge, software development as part of scientific research is usually carried out by individuals or small teams with no more than two or three members. Hence, the responsibility of designing, implementing, testing, and documenting the code rests on few shoulders. Additionally, there is pressure to produce publishable results or, at least, to contribute analysis work to ongoing projects. 
Consequently, academic software is typically released as a prototype. We acknowledge that such a tool cannot adhere to and should not be judged by the standards that we take for granted for production grade software. However, widespread use of a tool is typically in the interest of a researcher. To this end, we propose 10 simple rules that, in our experience, have a considerable impact on improving usability of scientific software.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_1:_Identify_the_missing_pieces\">Rule 1: Identify the missing pieces<\/span><\/h2>\n<p>Unless you are a pioneer, and few of us are, the problem you are working on is likely addressed by existing tools. As a professional, you are aware of this software but may consider it cumbersome, non-functional, or otherwise unacceptable for your demands. Make sure that your judgment is shared by a substantial fraction of the prospective users before you start developing a new tool. Usable software should offer the features needed and behave as expected by the community. Moreover, a new tool needs to provide substantial novelty over existing solutions. For this purpose, list the requirements on the software and create a comparison table to set the new tool against existing solutions. This allows you to carve out the selling points of your tool in a systematic fashion.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_2:_Collect_feedback_from_prospective_users\">Rule 2: Collect feedback from prospective users<\/span><\/h2>\n<p>Software can be regarded as providing the interface between wet lab science and data analysis. A lack of communication between both sides will lead to misunderstandings that need to be rectified by substantially changing the code base in a late phase of the project. Avoid this pitfall by exposing potential users to a prototype. 
Discussions on data formats or on the design of the user interface will reveal unforeseen challenges and help to determine if a tool is sufficiently intuitive.<sup id=\"rdp-ebb-cite_ref-ThielschExpected15_6-0\" class=\"reference\"><a href=\"#cite_note-ThielschExpected15-6\" rel=\"external_link\">[6]<\/a><\/sup> To plan your progress, keep a record of suggested improvements and existing issues.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_3:_Be_ready_for_data_growth\">Rule 3: Be ready for data growth<\/span><\/h2>\n<p>First estimate the expected data growth in your field and then design your software accordingly. To this end, consider parallelization and make sure your tool can be integrated seamlessly into workflow management systems (e.g., Galaxy<sup id=\"rdp-ebb-cite_ref-GiardineGalaxy05_7-0\" class=\"reference\"><a href=\"#cite_note-GiardineGalaxy05-7\" rel=\"external_link\">[7]<\/a><\/sup> and Taverna<sup id=\"rdp-ebb-cite_ref-WolstencroftTheTav13_8-0\" class=\"reference\"><a href=\"#cite_note-WolstencroftTheTav13-8\" rel=\"external_link\">[8]<\/a><\/sup>), pipeline frameworks (e.g., Ruffus<sup id=\"rdp-ebb-cite_ref-GoodstadtRuffus10_9-0\" class=\"reference\"><a href=\"#cite_note-GoodstadtRuffus10-9\" rel=\"external_link\">[9]<\/a><\/sup> and Snakemake<sup id=\"rdp-ebb-cite_ref-K.C3.B6sterSnake12_10-0\" class=\"reference\"><a href=\"#cite_note-K.C3.B6sterSnake12-10\" rel=\"external_link\">[10]<\/a><\/sup>), or a cluster framework (e.g., Hadoop, <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/hadoop.apache.org\/\" target=\"_blank\">http:\/\/hadoop.apache.org\/<\/a>). Moreover, make sure that the user interface can scale to growing data volumes. 
Visualizations, for instance, should remain comprehensible for larger datasets, which can be achieved by displaying only parts of the data or by aggregating results.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_4:_Use_standard_data_formats_for_input_and_output\">Rule 4: Use standard data formats for input and output<\/span><\/h2>\n<p>As an expert in your research domain, you know the established data standards and the related programming libraries for reading and writing commonly used data formats. Make sure that your tool\u2019s output follows standard specifications to the letter, but be as lenient as possible when users provide non-standard input. Tools that follow this rule are more likely to succeed. If you are working in an emerging field with no prevalent model for data exchange, provide data in a structured text file (e.g., tab-separated tables, XML\/XSD, or JSON) and aim for self-documenting output by including header lines and data type descriptions. In this case, also document how users can derive suitable input data for your tool.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_5:_Expose_only_mandatory_parameters\">Rule 5: Expose only mandatory parameters<\/span><\/h2>\n<p>Exposing every possible parameter to users can be confusing and carries the risk of nonsensical parameter settings. Consequently, users will rely on default parameters whenever possible. The same applies to benchmark studies comparing your tool against state-of-the-art competitors. 
This has three important implications: (i) by default, expose only a small set of parameters whose effects on the results any user can easily understand, (ii) offer advanced parameters only in an expert section and describe them thoroughly in the documentation, and (iii) choose default parameter values conservatively (and, if possible, justify them) so that the tool operates in a wide range of scenarios and within a reasonable run time.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_6:_Expect_users_to_make_mistakes\">Rule 6: Expect users to make mistakes<\/span><\/h2>\n<p>Never assume that your tool is self-explanatory, that the requirements on the input data are obvious, or that the user will immediately grasp all details of the problem at hand. Ideally, your tool helps users apply it appropriately, e.g., by checking that data lie within required ranges or that identifiers are unique, and provides descriptive error messages when it encounters unexpected values. If the performance penalty of such checks is a real concern (which should be tested), make the checks optional but enabled by default. Finally, allow users to stop ongoing operations if they realize they have made a mistake.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_7:_Provide_logging_information\">Rule 7: Provide logging information<\/span><\/h2>\n<p>Two types of logs improve usability and also help users make their research more reproducible. Configuration logs record basic information, such as the time stamp of the analysis, the versions of your tool and of third-party libraries, and the parameter settings and input data. 
Archiving this information is particularly important in long-running research projects, as it allows irregularities in the results to be traced at any later point in time.<sup id=\"rdp-ebb-cite_ref-SandveTen13_1-1\" class=\"reference\"><a href=\"#cite_note-SandveTen13-1\" rel=\"external_link\">[1]<\/a><\/sup> Technical logs, on the other hand, contain progress messages that help users pinpoint errors in the execution flow and communicate these issues clearly to the developer. As far as possible, avoid exposing potentially sensitive user information in the logs.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_8:_Get_users_started_quickly\">Rule 8: Get users started quickly<\/span><\/h2>\n<p>Complex setup routines introduce dependency<sup id=\"rdp-ebb-cite_ref-MorgenthalerSearching12_11-0\" class=\"reference\"><a href=\"#cite_note-MorgenthalerSearching12-11\" rel=\"external_link\">[11]<\/a><\/sup> or configuration debt<sup id=\"rdp-ebb-cite_ref-SculleyMachine14_12-0\" class=\"reference\"><a href=\"#cite_note-SculleyMachine14-12\" rel=\"external_link\">[12]<\/a><\/sup>; i.e., the user has to spend substantial time installing software and learning about a tool\u2019s execution parameters. This raises the barrier to freely exploring the software\u2019s features. Such issues can be solved by implementing a web application (if resource demands permit), by providing a standalone executable, or by providing a system-specific software package. Alternatively, problems caused by a program\u2019s dependence on third-party libraries can be avoided by encapsulating your tool in a virtual machine image or a software container such as Docker (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/docker.com\" target=\"_blank\">https:\/\/docker.com<\/a>). Finally, it is imperative to provide demo data that enable users to interact with the software immediately. 
A successful test run shows users that your software works as expected; it will also be essential if you want your tool to be published.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_9:_Offer_tutorial_material\">Rule 9: Offer tutorial material<\/span><\/h2>\n<p>Researchers can seldom afford the time to read complex user manuals thoroughly. They will thus appreciate clearly written code examples, illustrations, or video screencasts to get started. Most importantly, documented use cases enable users to quickly assess whether your tool is suited to the problem at hand and allow fast learning by doing. Keep in mind that these materials have to be updated together with your tool.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_10:_Consider_the_future_of_your_tool\">Rule 10: Consider the future of your tool<\/span><\/h2>\n<p>To ensure the long-term availability of your software, use suitable repositories such as GitHub (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/github.com\" target=\"_blank\">https:\/\/github.com<\/a>) or Bitbucket (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/bitbucket.com\" target=\"_blank\">https:\/\/bitbucket.com<\/a>) throughout the development process. State explicitly under which software license you release your code to third parties (see <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/opensource.org\/licenses\" target=\"_blank\">https:\/\/opensource.org\/licenses<\/a>). Without such a license, many organizations and companies may be unable to use your software. More importantly, a public repository also allows you to engage with users through issue tracking (e.g., bug reports and suggestions). After releasing your tool, expect support requests and take them seriously. 
Treat them as opportunities to continuously improve the usability of your tool.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusions\">Conclusions<\/span><\/h2>\n<p>With the ten simple rules above, we emphasize that software should not only be scientifically sound but also be perceived as usable, so that it can be applied widely and effectively. To this end, developers should be the first users of their own tool, so that usability issues are revealed as early as possible. However, further improving a tool requires effort from both users and developers. Even engaging with only a few users (Rule 2) is likely to have a large impact on usability, since, as Jakob Nielsen put it, \"Zero users give zero insights.\"<sup id=\"rdp-ebb-cite_ref-NielsenWhy00_13-0\" class=\"reference\"><a href=\"#cite_note-NielsenWhy00-13\" rel=\"external_link\">[13]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Further_reading\">Further reading<\/span><\/h2>\n<p>Usability is an important topic in software design, and we provide a few starting points for further reading:\n<\/p>\n<ul><li> <span class=\"citation Journal\">Baxter, S.M.; Day, S.W.; Fetrow, J.S. et al. (2006). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1560404\" target=\"_blank\">\"Scientific software development is not an oxymoron\"<\/a>. <i>PLOS Computational Biology<\/i> <b>2<\/b> (9): e87. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.0020087\" target=\"_blank\">10.1371\/journal.pcbi.0020087<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC1560404\/\" target=\"_blank\">PMC1560404<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/16965174\" target=\"_blank\">16965174<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1560404\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1560404<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Scientific+software+development+is+not+an+oxymoron&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Baxter%2C+S.M.%3B+Day%2C+S.W.%3B+Fetrow%2C+J.S.+et+al.&rft.au=Baxter%2C+S.M.%3B+Day%2C+S.W.%3B+Fetrow%2C+J.S.+et+al.&rft.date=2006&rft.volume=2&rft.issue=9&rft.pages=e87&rft_id=info:doi\/10.1371%2Fjournal.pcbi.0020087&rft_id=info:pmc\/PMC1560404&rft_id=info:pmid\/16965174&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC1560404&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/li><\/ul>\n<ul><li> <span class=\"citation web\">Borchardt, J.-C. (30 June 2011). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/jancborchardt.net\/usability-in-free-software\" target=\"_blank\">\"Usability in Free Software\"<\/a>. <i>jancborchardt.net<\/i><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/jancborchardt.net\/usability-in-free-software\" target=\"_blank\">http:\/\/jancborchardt.net\/usability-in-free-software<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Usability+in+Free+Software&rft.atitle=jancborchardt.net&rft.aulast=Borchardt%2C+J.-C.&rft.au=Borchardt%2C+J.-C.&rft.date=30+June+2011&rft_id=http%3A%2F%2Fjancborchardt.net%2Fusability-in-free-software&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/li><\/ul>\n<ul><li> <span class=\"citation Journal\">Macaulay, C.; Sloan, D.; Jiang, X. et al. (2009). \"Usability and user-centered design in scientific software development\". <i>IEEE Software<\/i> <b>26<\/b> (1): 96\u2013102. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FMS.2009.27\" target=\"_blank\">10.1109\/MS.2009.27<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Usability+and+user-centered+design+in+scientific+software+development&rft.jtitle=IEEE+Software&rft.aulast=Macaulay%2C+C.%3B+Sloan%2C+D.%3B+Jiang%2C+X.+et+al.&rft.au=Macaulay%2C+C.%3B+Sloan%2C+D.%3B+Jiang%2C+X.+et+al.&rft.date=2009&rft.volume=26&rft.issue=1&rft.pages=96%E2%80%93102&rft_id=info:doi\/10.1109%2FMS.2009.27&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/li><\/ul>\n<ul><li> <span class=\"citation Journal\">Nichols, D.; Twidale, M. (2003). \"The usability of open source software\". 
<i>First Monday<\/i> <b>8<\/b> (1). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.5210%2Ffm.v8i1.1018\" target=\"_blank\">10.5210\/fm.v8i1.1018<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+usability+of+open+source+software&rft.jtitle=First+Monday&rft.aulast=Nichols%2C+D.%3B+Twidale%2C+M.&rft.au=Nichols%2C+D.%3B+Twidale%2C+M.&rft.date=2003&rft.volume=8&rft.issue=1&rft_id=info:doi\/10.5210%2Ffm.v8i1.1018&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/li><\/ul>\n<ul><li> <span class=\"citation Journal\">Seffah, A.; Metzker, E. (2004). \"The obstacles and myths of usability and software engineering\". <i>Communications of the ACM<\/i> <b>47<\/b> (12): 71\u201376. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1145%2F1035134.1035136\" target=\"_blank\">10.1145\/1035134.1035136<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+obstacles+and+myths+of+usability+and+software+engineering&rft.jtitle=Communications+of+the+ACM&rft.aulast=Seffah%2C+A.%3B+Metzker%2C+E.&rft.au=Seffah%2C+A.%3B+Metzker%2C+E.&rft.date=2004&rft.volume=47&rft.issue=12&rft.pages=71%E2%80%9376&rft_id=info:doi\/10.1145%2F1035134.1035136&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/li><\/ul>\n<ul><li> <span class=\"citation Journal\">Sloan, D.; Macaulay, C.; Forbes, P. et al. (2009). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dl.acm.org\/citation.cfm?id=1671066\" target=\"_blank\">\"User research in a scientific software development project\"<\/a>. <i>Proceedings of the 23rd British HCI Group Annual Conference on People and Computers: Celebrating People and Technology<\/i> <b>2009<\/b>: 423\u2013429<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/dl.acm.org\/citation.cfm?id=1671066\" target=\"_blank\">http:\/\/dl.acm.org\/citation.cfm?id=1671066<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=User+research+in+a+scientific+software+development+project&rft.jtitle=Proceedings+of+the+23rd+British+HCI+Group+Annual+Conference+on+People+and+Computers%3A+Celebrating+People+and+Technology&rft.aulast=Sloan%2C+D.%3B+Macaulay%2C+C.%3B+Forbes%2C+P.+et+al.&rft.au=Sloan%2C+D.%3B+Macaulay%2C+C.%3B+Forbes%2C+P.+et+al.&rft.date=2009&rft.volume=2009&rft.pages=423%E2%80%93429&rft_id=http%3A%2F%2Fdl.acm.org%2Fcitation.cfm%3Fid%3D1671066&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/li><\/ul>\n<h2><span class=\"mw-headline\" id=\"Acknowledgments\">Acknowledgments<\/span><\/h2>\n<p>We would like to thank Thomas Lengauer, Nico Pfeifer, and Fabian M\u00fcller for their critical reading of the manuscript and insightful comments.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h3>\n<p>FA and PE acknowledge the support of the German Federal Ministry of Education and Research grant no. 01KU1216A (DEEP project). 
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h2>\n<p>The authors have declared that no competing interests exist.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-SandveTen13-1\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-SandveTen13_1-0\" rel=\"external_link\">1.0<\/a><\/sup> <sup><a href=\"#cite_ref-SandveTen13_1-1\" rel=\"external_link\">1.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sandve, G.K.; Nekrutenko, A.; Taylor, J.; Hovig, E. (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3812051\" target=\"_blank\">\"Ten simple rules for reproducible computational research\"<\/a>. <i>PLOS Computational Biology<\/i> <b>9<\/b> (10): e1003285. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.1003285\" target=\"_blank\">10.1371\/journal.pcbi.1003285<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3812051\/\" target=\"_blank\">PMC3812051<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24204232\" target=\"_blank\">24204232<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3812051\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3812051<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ten+simple+rules+for+reproducible+computational+research&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Sandve%2C+G.K.%3B+Nekrutenko%2C+A.%3B+Taylor%2C+J.%3B+Hovig%2C+E.&rft.au=Sandve%2C+G.K.%3B+Nekrutenko%2C+A.%3B+Taylor%2C+J.%3B+Hovig%2C+E.&rft.volume=9&rft.issue=10&rft.pages=e1003285&rft_id=info:doi\/10.1371%2Fjournal.pcbi.1003285&rft_id=info:pmc\/PMC3812051&rft_id=info:pmid\/24204232&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3812051&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-OsborneTen14-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-OsborneTen14_2-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Osborne, J.M.; Bernabeu, M.O.; Bruna, M. et al. (2014). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3967918\" target=\"_blank\">\"Ten simple rules for effective computational research\"<\/a>. <i>PLOS Computational Biology<\/i> <b>10<\/b> (3): e1003506. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.1003506\" target=\"_blank\">10.1371\/journal.pcbi.1003506<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3967918\/\" target=\"_blank\">PMC3967918<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24675742\" target=\"_blank\">24675742<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3967918\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3967918<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ten+simple+rules+for+effective+computational+research&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Osborne%2C+J.M.%3B+Bernabeu%2C+M.O.%3B+Bruna%2C+M.+et+al.&rft.au=Osborne%2C+J.M.%3B+Bernabeu%2C+M.O.%3B+Bruna%2C+M.+et+al.&rft.volume=10&rft.issue=3&rft.pages=e1003506&rft_id=info:doi\/10.1371%2Fjournal.pcbi.1003506&rft_id=info:pmc\/PMC3967918&rft_id=info:pmid\/24675742&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3967918&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li 
id=\"cite_note-Prli.C4.87Ten12-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Prli.C4.87Ten12_3-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Prli\u0107, A.; Procter, J.B. (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539\" target=\"_blank\">\"Ten simple rules for the open development of scientific software\"<\/a>. <i>PLOS Computational Biology<\/i> <b>8<\/b> (12): e1002802. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.1002802\" target=\"_blank\">10.1371\/journal.pcbi.1002802<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3516539\/\" target=\"_blank\">PMC3516539<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23236269\" target=\"_blank\">23236269<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ten+simple+rules+for+the+open+development+of+scientific+software&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Prli%C4%87%2C+A.%3B+Procter%2C+J.B.&rft.au=Prli%C4%87%2C+A.%3B+Procter%2C+J.B.&rft.volume=8&rft.issue=12&rft.pages=e1002802&rft_id=info:doi\/10.1371%2Fjournal.pcbi.1002802&rft_id=info:pmc\/PMC3516539&rft_id=info:pmid\/23236269&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3516539&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Perez-RiverolTen16-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Perez-RiverolTen16_4-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Perez-Riverol, Y.; Gatto, L.; Wang, R. et al. (2016). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4945047\" target=\"_blank\">\"Ten simple rules for taking advantage of Git and GitHub\"<\/a>. <i>PLOS Computational Biology<\/i> <b>12<\/b> (7): e1004947. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.1004947\" target=\"_blank\">10.1371\/journal.pcbi.1004947<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4945047\/\" target=\"_blank\">PMC4945047<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/27415786\" target=\"_blank\">27415786<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4945047\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4945047<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ten+simple+rules+for+taking+advantage+of+Git+and+GitHub&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Perez-Riverol%2C+Y.%3B+Gatto%2C+L.%3B+Wang%2C+R.+et+al.&rft.au=Perez-Riverol%2C+Y.%3B+Gatto%2C+L.%3B+Wang%2C+R.+et+al.&rft.volume=12&rft.issue=7&rft.pages=e1004947&rft_id=info:doi\/10.1371%2Fjournal.pcbi.1004947&rft_id=info:pmc\/PMC4945047&rft_id=info:pmid\/27415786&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4945047&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DillonHuman01-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DillonHuman01_5-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Dillon, A. (2001). \"Human Acceptance of Information Technology\". In Karwowski, W. (ed.). 
<i>Encyclopedia of Human Factors and Ergonomics<\/i>. Taylor & Francis. pp. 673\u2013675. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780748408474.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Human+Acceptance+of+Information+Technology&rft.atitle=Encyclopedia+of+Human+Factors+and+Ergonomics&rft.aulast=Dillon%2C+A.&rft.au=Dillon%2C+A.&rft.date=2001&rft.pages=pp.%26nbsp%3B673%E2%80%93675&rft.pub=Taylor+%26+Francis&rft.isbn=9780748408474&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ThielschExpected15-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ThielschExpected15_6-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Thielsch, M.T.; Engel, R.; Hirschfeld, G. (2015). \"Expected usability is not a valid indicator of experienced usability\". <i>PeerJ Computer Science<\/i> <b>1<\/b>: e19. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.7717%2Fpeerj-cs.19\" target=\"_blank\">10.7717\/peerj-cs.19<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Expected+usability+is+not+a+valid+indicator+of+experienced+usability&rft.jtitle=PeerJ+Computer+Science&rft.aulast=Thielsch%2C+M.T.%3B+Engel%2C+R.%3B+Hirschfeld%2C+G.&rft.au=Thielsch%2C+M.T.%3B+Engel%2C+R.%3B+Hirschfeld%2C+G.&rft.date=2015&rft.volume=1&rft.pages=e19&rft_id=info:doi\/10.7717%2Fpeerj-cs.19&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GiardineGalaxy05-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GiardineGalaxy05_7-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Giardine, B.; Riemer, C.; Hardison, R.C. et al. (2005). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1240089\" target=\"_blank\">\"Galaxy: A platform for interactive large-scale genome analysis\"<\/a>. <i>Genome Research<\/i> <b>15<\/b> (10): 1451\u20131455. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1101%2Fgr.4086505\" target=\"_blank\">10.1101\/gr.4086505<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC1240089\/\" target=\"_blank\">PMC1240089<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/16169926\" target=\"_blank\">16169926<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1240089\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1240089<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Galaxy%3A+A+platform+for+interactive+large-scale+genome+analysis&rft.jtitle=Genome+Research&rft.aulast=Giardine%2C+B.%3B+Riemer%2C+C.%3B+Hardison%2C+R.C.+et+al.&rft.au=Giardine%2C+B.%3B+Riemer%2C+C.%3B+Hardison%2C+R.C.+et+al.&rft.date=2005&rft.volume=15&rft.issue=10&rft.pages=1451%E2%80%931455&rft_id=info:doi\/10.1101%2Fgr.4086505&rft_id=info:pmc\/PMC1240089&rft_id=info:pmid\/16169926&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC1240089&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WolstencroftTheTav13-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WolstencroftTheTav13_8-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wolstencroft, K.; Haines, R.; Fellows, D. et al. (2013). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3692062\" target=\"_blank\">\"The Taverna workflow suite: Designing and executing workflows of Web Services on the desktop, web or in the cloud\"<\/a>. <i>Nucleic Acids Research<\/i> <b>41<\/b> (W1): W557-W561. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fnar%2Fgkt328\" target=\"_blank\">10.1093\/nar\/gkt328<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3692062\/\" target=\"_blank\">PMC3692062<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23640334\" target=\"_blank\">23640334<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3692062\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3692062<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Taverna+workflow+suite%3A+Designing+and+executing+workflows+of+Web+Services+on+the+desktop%2C+web+or+in+the+cloud&rft.jtitle=Nucleic+Acids+Research&rft.aulast=Wolstencroft%2C+K.%3B+Haines%2C+R.%3B+Fellows%2C+D.+et+al.&rft.au=Wolstencroft%2C+K.%3B+Haines%2C+R.%3B+Fellows%2C+D.+et+al.&rft.date=2013&rft.volume=41&rft.issue=W1&rft.pages=W557-W561&rft_id=info:doi\/10.1093%2Fnar%2Fgkt328&rft_id=info:pmc\/PMC3692062&rft_id=info:pmid\/23640334&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3692062&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GoodstadtRuffus10-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GoodstadtRuffus10_9-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Goodstadt, L. (2010). \"Ruffus: A lightweight Python library for computational pipelines\". <i>Bioinformatics<\/i> <b>26<\/b> (21): 2778-9. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fbioinformatics%2Fbtq524\" target=\"_blank\">10.1093\/bioinformatics\/btq524<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20847218\" target=\"_blank\">20847218<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ruffus%3A+A+lightweight+Python+library+for+computational+pipelines&rft.jtitle=Bioinformatics&rft.aulast=Goodstadt%2C+L.&rft.au=Goodstadt%2C+L.&rft.date=2010&rft.volume=26&rft.issue=21&rft.pages=2778-9&rft_id=info:doi\/10.1093%2Fbioinformatics%2Fbtq524&rft_id=info:pmid\/20847218&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-K.C3.B6sterSnake12-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-K.C3.B6sterSnake12_10-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">K\u00f6ster, J.; Rahmann, S. (2012). \"Snakemake: A scalable bioinformatics workflow engine\". <i>Bioinformatics<\/i> <b>28<\/b> (19): 2520-2. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fbioinformatics%2Fbts480\" target=\"_blank\">10.1093\/bioinformatics\/bts480<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22908215\" target=\"_blank\">22908215<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Snakemake%3A+A+scalable+bioinformatics+workflow+engine&rft.jtitle=Bioinformatics&rft.aulast=K%C3%B6ster%2C+J.%3B+Rahmann%2C+S.&rft.au=K%C3%B6ster%2C+J.%3B+Rahmann%2C+S.&rft.date=2012&rft.volume=28&rft.issue=19&rft.pages=2520-2&rft_id=info:doi\/10.1093%2Fbioinformatics%2Fbts480&rft_id=info:pmid\/22908215&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MorgenthalerSearching12-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MorgenthalerSearching12_11-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Morgenthaler, J.D.; Gridnev, M.; Sauciuc, R. et al. (2012). \"Searching for build debt: Experiences managing technical debt at Google\". <i>Proceedings of the Third International Workshop on Managing Technical Debt<\/i> <b>2012<\/b>: 1\u20136. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781467317498.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Searching+for+build+debt%3A+Experiences+managing+technical+debt+at+Google&rft.jtitle=Proceedings+of+the+Third+International+Workshop+on+Managing+Technical+Debt&rft.aulast=Morgenthaler%2C+J.D.%3B+Gridnev%2C+M.%3B+Sauciuc%2C+R.+et+al.&rft.au=Morgenthaler%2C+J.D.%3B+Gridnev%2C+M.%3B+Sauciuc%2C+R.+et+al.&rft.date=2012&rft.volume=2012&rft.pages=1%E2%80%936&rft.isbn=9781467317498&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SculleyMachine14-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SculleyMachine14_12-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sculley, D.; Holt, G.; Golovin, D. et al. (2014). \"Machine learning: The high interest credit card of technical debt\". 
<i>Proceedings of SE4ML: Software Engineering for Machine Learning 2014<\/i> <b>2014<\/b>: 1\u20139.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Machine+learning%3A+The+high+interest+credit+card+of+technical+debt&rft.jtitle=Proceedings+of+SE4ML%3A+Software+Engineering+for+Machine+Learning+2014&rft.aulast=Sculley%2C+D.%3B+Holt%2C+G.%3B+Golovin%2C+D.+et+al.&rft.au=Sculley%2C+D.%3B+Holt%2C+G.%3B+Golovin%2C+D.+et+al.&rft.date=2014&rft.volume=2014&rft.pages=1%E2%80%939&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NielsenWhy00-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NielsenWhy00_13-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Nielsen, J. (19 March 2000). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.nngroup.com\/articles\/why-you-only-need-to-test-with-5-users\/\" target=\"_blank\">\"Why You Only Need to Test with 5 Users\"<\/a>. Nielsen Norman Group<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.nngroup.com\/articles\/why-you-only-need-to-test-with-5-users\/\" target=\"_blank\">https:\/\/www.nngroup.com\/articles\/why-you-only-need-to-test-with-5-users\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 30 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Why+You+Only+Need+to+Test+with+5+Users&rft.atitle=&rft.aulast=Nielsen%2C+J.&rft.au=Nielsen%2C+J.&rft.date=19+March+2000&rft.pub=Nielsen+Norman+Group&rft_id=https%3A%2F%2Fwww.nngroup.com%2Farticles%2Fwhy-you-only-need-to-test-with-5-users%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. In the original conclusion, the authors provided suggested reading, but the original only included them as citations to an associated opening sentence. 
These have been moved and expanded out into a new section after the conclusion called \"Further reading\" and organized alphabetically by author (but now no longer appear as citations to the opening sentence).\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology\">https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","489049f69ab6d4b2f19ec2a155d44c4e_images":[],"489049f69ab6d4b2f19ec2a155d44c4e_timestamp":1544814660,"e28b66162eedd9f2c9137d1be8322cec_type":"article","e28b66162eedd9f2c9137d1be8322cec_title":"SCIFIO: An extensible framework to support scientific image formats (Hiner et al. 2017)","e28b66162eedd9f2c9137d1be8322cec_url":"https:\/\/www.limswiki.org\/index.php\/Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats","e28b66162eedd9f2c9137d1be8322cec_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:SCIFIO: An extensible framework to support scientific image formats\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nSCIFIO: An extensible framework to support scientific image formats\nJournal\n \nBMC Bioinformatics\nAuthor(s)\n \nHiner, Mark C.; Rueden, Curtis T.; Eliceiri, Kevin W.\nAuthor affiliation(s)\n \nUniversity of Wisconsin at Madison, Morgridge Institute for Research\nPrimary contact\n \nEmail: eliceiri at wisc dot edu\nYear published\n \n2016\nVolume and issue\n \n17\nPage(s)\n \n521\nDOI\n \n10.1186\/s12859-016-1383-0\nISSN\n \n1471-2105\nDistribution license\n \nCreative Commons Attribution 4.0 International\nWebsite\n \nhttp:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-016-1383-0\nDownload\n \nhttp:\/\/bmcbioinformatics.biomedcentral.com\/track\/pdf\/10.1186\/s12859-016-1383-0 (PDF)\n\nContents\n\n1 Abstract \n2 Background \n3 Implementation \n4 Results and discussion \n5 Conclusions \n6 Abbreviations \n7 Declarations \n\n7.1 Acknowledgements \n\n7.1.1 Funding \n7.1.2 Availability of data and materials \n7.1.3 Authors\u2019 contributions \n7.1.4 Competing interests \n7.1.5 Consent for publication \n7.1.6 Ethics approval and consent to participate \n7.1.7 Maven artifacts \n\n\n\n\n8 References \n9 Notes \n\n\n\nAbstract \nBackground: No 
gold standard exists in the world of scientific image acquisition; a proliferation of instruments each with its own proprietary data format has made out-of-the-box sharing of that data nearly impossible. In the field of light microscopy, the Bio-Formats library was designed to translate such proprietary data formats to a common, open-source schema, enabling sharing and reproduction of scientific results. While Bio-Formats has proved successful for microscopy images, the greater scientific community was lacking a domain-independent framework for format translation.\nResults: SCIFIO (SCientific Image Format Input and Output) is presented as a freely available, open-source library unifying the mechanisms of reading and writing image data. The core of SCIFIO is its modular definition of formats, the design of which clearly outlines the components of image I\/O to encourage extensibility, facilitated by the dynamic discovery of the SciJava plugin framework. SCIFIO is structured to support coexistence of multiple domain-specific open exchange formats, such as Bio-Formats\u2019 OME-TIFF, within a unified environment.\nConclusions: SCIFIO is a freely available software library developed to standardize the process of reading and writing scientific image formats.\nKeywords: SCIFIO, image analysis, open-source, Bio-Formats, ImageJ\n\nBackground \nImage formats are defined by the logical layout of metadata and pixel information across one or more data sources. Proprietary file formats (PFFs) are created when an imaging instrument, such as a microscope, records such data in a structure that is not publicly described. PFFs are especially problematic in scientific domains, as each company or even instrument brings the potential for a new file format that may require licensed software to decode, or whose structure may change without notice or recourse. 
The scientific method requires that data can be analyzed by others to verify and reproduce results; when such data are stored in a proprietary format, they cannot, by definition, be freely shared and inspected.\nIn response to the proliferation of PFFs in the life sciences, the Open Microscopy Environment (OME) consortium developed the Bio-Formats library to standardize the reading of microscopy data.[1] Bio-Formats provides an application programming interface (API) for reading and writing images, backed by a comprehensive collection of extensions to decode format-specific information and translate it into an open specification called the OME data model.[2] A translated image can then be written as OME-TIFF, an \u201copen-exchange format\u201d that combines the universal readability of the TIFF standard with an XML schema representing the OME data model (OME-XML). These OME-TIFF images can be freely shared, with pixel data accessible via standard libraries such as libtiff[3], and the complete metadata parseable by any standards-compliant XML reader. In this way, the Bio-Formats project greatly mitigates the PFF problem in microscopy.\nBio-Formats has become an essential tool for scientists worldwide; however, its metadata model specifically targets 5-dimensional images in microscopy and related life sciences disciplines. PFFs from other scientific domains \u2014 e.g., medical imaging, astronomy, industrial x-rays, materials science and geoscience \u2014 each have their own unique considerations with respect to the dimensionality and metadata of their images; as such, it would be infeasible for a single \u201cone-size-fits-all\u201d metadata model to fully address the needs of scientific imaging as a whole. 
With this conclusion in mind, we have developed the SCIFIO (SCientific Image Format Input and Output) library, generalizing the success of Bio-Formats to create a domain-independent image I\/O framework enabling seamless and extensible translation between image metadata models. The goal of SCIFIO is to provide the architecture that will equally facilitate: 1) the conversion of additional formats into supported open-exchange formats such as OME-TIFF and 2) the integration of additional scientific open-exchange formats such as Digital Imaging and Communications in Medicine (DICOM)[4], Flexible Image Transport System (FITS)[5] and NetCDF[6] into a common image I\/O framework.\n\nImplementation \nSCIFIO is implemented as a plugin suite for the SciJava plugin framework. Its core is released under the permissive BSD license to maximize freedom of inclusion in both open- and closed-source applications. The SciJava framework collects Plugins in an application Context, where they are typically accessed via Services. As such, SCIFIO defines a collection of Plugins and Services facilitating image I\/O. Developers will typically start with the SCIFIO class itself: a Gateway to the SciJava Context providing convenient access methods for functional components of the SCIFIO framework.\nThe SciJava framework sorts Plugins by \u201ctype,\u201d representing the role of a given Plugin. Extensibility and flexibility are achieved by providing a public Service API that organizes and delegates to the available Plugins of each type. Thus, SCIFIO development is primarily concerned with adding new Plugin implementations to achieve a desired result. The following sections describe the key Plugin types in SCIFIO and the behavior they control.\nFirst and foremost is the Format. A Format is a collection of interface-driven components (Fig. 1) defining the steps for decoding an image source to its metadata and pixel values. 
In SCIFIO, the ImageJ Common data model is used to describe pixels; this data model is built on ImgLib2[7] because of its type and algorithmic flexibility, ensuring images opened with SCIFIO are universally recognized within the ImageJ ecosystem.[8] A Format must always include a Metadata component defining its unique fields and structures, such as acquisition instrument details, dimensional axis types, or detector emission wavelengths. Each Metadata implementation must also be able to express itself as a standard, format-independent ImageMetadata object, establishing a common baseline for use within the framework.\n\nFigure 1. Components of a Format plugin and their role in image I\/O\n\nThe Checker component contains the logic for matching a given Format with a potential image source, while the Parser component performs the actual creation of Metadata from that source. The Reader and Writer components use Metadata to read or write pixel data, respectively. Given the goal of freely shareable image data, Writers are optional components and should not be implemented for proprietary formats.\nA second essential Plugin type is the Translator, which encodes logic for conversion from one Metadata type to another. Translators enable the standardization of proprietary formats to common Metadata structures such as OME, and hence play a key role in converting images between Formats. Translators are typically created to accompany Writers, ensuring Format-specific metadata is properly populated. Additionally, the Translator framework enables the integration of new open-exchange formats via Translator-only libraries, converting supported Metadata types to the new standard. An example of this model can be seen in the SCIFIO-OME-XML component (Fig. 2).\n\nFigure 2. SCIFIO-OME-XML Translator suite, for converting metadata to the OME-TIFF open-exchange format\n\nWhile Formats and Translators add new behavior to the base framework, SCIFIO also has Plugin types to control existing behavior. For example, Filter plugins provide a Format-agnostic mechanism for modifying Reader behavior. Filters create an ordered chain of delegation, each operating on the data of its parent, and can be individually toggled \u2018on\u2019 or \u2018off\u2019 on a per-Reader basis. Filter stacking is illustrated in Fig. 3 with a ChannelFiller, which converts \u201cindexed color\u201d pixels to RGB values, and a FileStitcher, which unifies multiple files on disk into one dataset.\n\nFigure 3. Behavior of ChannelFiller and FileStitcher Filter plugins\n\nFor all SciJava Plugins, a numeric priority value attached to each class creates an implicit relative ordering for operations \u2014 e.g., order of Checker querying, Translator querying, or Filter application. Priorities are automatically considered when using the SCIFIO Services: from the FormatService polling Checker components to the TranslatorService finding the correct Translator for a given request, priorities allow the most specific solutions to be queried first, before more general options. Together, these pieces provide a robust and flexible library for reading and writing image data.\n\nResults and discussion \nAs the fundamental goal of SCIFIO is to establish an extensible framework for image support, the SciJava framework is a logical choice for implementation. SciJava provides extensible solutions to common software problems, which implicitly benefit SCIFIO. 
A core example is the extensible script language framework (http:\/\/imagej.net\/Scripting), which effectively allows SCIFIO to be used from any number of programming languages without requiring language-specific considerations in SCIFIO itself.\nImageJ[9] presents the flagship use case for SCIFIO, allowing an established community to vet and refine the library. Although users do not directly interact with the SCIFIO API, all image I\/O operations in ImageJ ultimately rely on SCIFIO. As developers contribute new Format plugins for image types relevant to their work, any application using SCIFIO can immediately benefit from the new plugin. Looking beyond ImageJ, projects like KNIME Image Processing (KNIP), built on the KNIME Analytics Platform[10], have already adopted SCIFIO for their image I\/O mechanism. This sort of code sharing leads to a form of mutualistic collaboration: a new Format plugin developed for KNIP will automatically work in ImageJ, and vice versa. Equally importantly, both ImageJ and KNIP can implicitly operate on image data produced by the other program, laying the foundation for algorithmic interoperability.\nCollaborations like this would not be possible with a narrowly focused library like Bio-Formats. KNIME is a platform for extensible workflows; thus, its handling of image data demands flexibility beyond the fixed 5D microscopy schema of OME. Additionally, Bio-Formats\u2019 mechanism of format extension requires either modification of a text-based configuration file to define format priority, which can lead to conflicts if multiple libraries provide differing versions of this file, or runtime modification by API calls, which may not be reproducible without a central mechanism controlling these calls. 
Conversely, the dynamic discovery of the SciJava plugin framework allows SCIFIO developers to provide their Formats completely independently \u2014 e.g., on an ImageJ, KNIME or Eclipse update site, while SCIFIO\u2019s backing by the ImageJ Common data model ensures that it can adapt to future requirements in imaging dimensionality and data types.\nBio-Formats readers and writers and SCIFIO Format components define similar high-level logic, but in Bio-Formats several I\/O steps are conflated in a single monolithic interface with many protected methods as potential extension points. SCIFIO encapsulates each I\/O step in its own dedicated component to minimize the effort required in format development. Whether a format is added to the Bio-Formats or the SCIFIO library, the SCIFIO-BF-Compat and SCIFIO-OME-XML components provide bidirectional compatibility between SCIFIO and Bio-Formats.\nBio-Formats has demonstrated the feasibility of standardizing a broad field of PFFs into a common open-exchange format. SCIFIO generalizes this approach: it allows extension to new domains through the integration of their Metadata standards and open-exchange formats via Translators, and it offers clear paths for contributing to existing domains by encapsulating the logic of Format components. Given the immediate benefit of the Bio-Formats integration layers, we see the SCIFIO framework as a potential unifying solution to PFFs in scientific image data.\n\nConclusions \nSCIFIO is an open-source library generalizing the successful structure of Bio-Formats to create a domain-independent framework for the reading, writing, and translation of images. 
The extensible design of SCIFIO facilitates community contribution, the establishment of domain-specific metadata standards, and integration into a unified system capable of adapting to the demands of scientific image analysis.\n\nAbbreviations \nAPI: Application programming interface\nI\/O: Input and\/or output\nKNIME: Konstanz Information Miner\nKNIP: KNIME Image Processing\nOME: Open Microscopy Environment\nPFF: Proprietary file format\nSCIFIO: SCientific Image Format Input and Output\n\nDeclarations \nAcknowledgements \nMany people have contributed to the development of SCIFIO at both the technical and leadership levels. In particular, the authors gratefully thank and acknowledge the efforts of (in alphabetical order): Ellen T. Arena, Anne Carpenter, Christian Dietz, Gabriel Einsdorf, Melissa Linkert, Josh Moore, Tobias Pietzsch, Stephan Preibisch, Stephan Saalfeld, Jason Swedlow, and Pavel Tomancak. We also thank the entire ImageJ community, especially those who contributed patch submissions, use cases, feature requests and bug reports.\n\nFunding \nResearch reported in this publication was supported by the ACI Division of Advanced Cyberinfrastructure of the National Science Foundation under award number 1148362 and by additional internal funding from the Laboratory for Optical and Computational Instrumentation.\n\nAvailability of data and materials \nProject name: SCIFIO\nProject home page: http:\/\/scif.io\/\nArchived version: 0.28.2 \nhttp:\/\/maven.imagej.net\/service\/local\/repositories\/releases\/content\/io\/scif\/scifio\/0.28.2\/scifio-0.28.2.jar\nSource code: https:\/\/github.com\/scifio\/scifio\nOperating system(s): Platform-independent\nProgramming language: Java\nOther requirements: Java 1.8 or higher runtime, io.scif:scifio-jai-imageio, net.imagej:imagej-common, net.imglib2:imglib2, org.scijava:scijava-common, org.mapdb:mapdb\nLicense: BSD\n\nAuthors\u2019 contributions \nMCH was the lead implementer of the software. 
CTR architected the underlying SciJava foundation and guided SCIFIO development. As the primary principal investigator of SCIFIO, KWE directed and advised on all aspects of the project, including development directions and priorities. All authors contributed to, read and approved the final manuscript.\n\nCompeting interests \nThe authors declare that they have no competing interests.\n\nConsent for publication \nNot applicable.\n\nEthics approval and consent to participate \nNot applicable.\n\nMaven artifacts \nSCIFIO can be added as a dependency to any project capable of consuming Maven dependencies. As SCIFIO is a project in the SciJava domain, we recommend using dependency management from the latest pom-scijava release (http:\/\/maven.imagej.net\/index.html#nexus-search;gav~org.scijava~pom-scijava). The following are example sections for adding a SCIFIO dependency to a pom.xml:\n\r\n\n\n\n\nReferences \n\n\n\u2191 Linkert, M.; Rueden, C.T.; Allan, C. et al. (2010). \"Metadata matters: Access to image data in the real world\". Journal of Cell Biology 189 (5): 777\u201382. doi:10.1083\/jcb.201004104. PMID 20513764.   \n\n\u2191 Goldberg, I.G.; Allan, C.; Burel, J.M. et al. (2005). \"The Open Microscopy Environment (OME) Data Model and XML file: Open tools for informatics and quantitative analysis in biological imaging\". Genome Biology 6 (5): R47. doi:10.1186\/gb-2005-6-5-r47. PMID 15892875.   \n\n\u2191 Warmerdam, F.; Kiselev, A.; Welles, M.; Kelly, D. \"LibTIFF - TIFF Library and Utilities\". http:\/\/www.libtiff.org\/. Retrieved 29 November 2016.   \n\n\u2191 Bidgood Jr., W.D.; Horii, S.C.; Prior, F.W.; Van Syckle, D.E. (1997). \"Understanding and using DICOM, the data interchange standard for biomedical imaging\". JAMIA 4 (3): 199\u2013212. doi:10.1136\/jamia.1997.0040199. PMID 9147339.   \n\n\u2191 Pence, W.D.; Chiappetti, L.; Page, C.G. et al. (2010). \"Definition of the Flexible Image Transport System (FITS), version 3.0\". 
Astronomy & Astrophysics 524 (December 2010): A42. doi:10.1051\/0004-6361\/201015362.   \n\n\u2191 Unidata. \"Network Common Data Form (NetCDF)\". University Corporation for Atmospheric Research. doi:10.5065\/D6H70CW6. http:\/\/www.unidata.ucar.edu\/software\/netcdf\/. Retrieved 29 November 2016.   \n\n\u2191 Pietzsch, T.; Preibisch, S.; Toman\u010d\u00e1k, P. et al. (2012). \"ImgLib2: Generic image processing in Java\". Bioinformatics 28 (22): 3009\u201311. doi:10.1093\/bioinformatics\/bts543. PMID 22962343.   \n\n\u2191 Schindelin, J.; Rueden, C.T.; Hiner, M.C. et al. (2015). \"The ImageJ ecosystem: An open platform for biomedical image analysis\". Molecular Reproduction and Development 82 (7\u20138): 518\u201329. doi:10.1002\/mrd.22489. PMID 26153368.   \n\n\u2191 Schneider, C.A.; Rasband, W.S.; Eliceiri, K.W. (2012). \"NIH Image to ImageJ: 25 years of image analysis\". Nature Methods 9 (7): 671\u20135. PMID 22930834.   \n\n\u2191 Berthold, M.R.; Cebron, N.; Dill, F.; Gabriel, T.R.; K\u00f6tter, T.; Meinl, T.; Ohl, P.; Sieb, C.; Thiel, K.; Wiswedel, B. (2008). \"Chapter 38: KNIME: The Konstanz Information Miner\". In Preisach, C.; Burkhardt, H.; Schmidt-Thieme, L.; Decker, R. (eds.). Data Analysis, Machine Learning and Applications. Springer Berlin Heidelberg. doi:10.1007\/978-3-540-78246-9_38. ISBN 9783540782391.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. 
In some cases important information was missing from the references, and that information was added.\n\n\n\n\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\">https:\/\/www.limswiki.org\/index.php\/Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2017)LIMSwiki journal articles (all)LIMSwiki journal articles on bioinformaticsLIMSwiki journal articles on software\n\t\t\t\t\t\t\t\t\t This page was last modified on 24 January 2017, at 22:10.\n\t\t\t\t\t\t\t\t\tContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","e28b66162eedd9f2c9137d1be8322cec_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_SCIFIO_An_extensible_framework_to_support_scientific_image_formats skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:SCIFIO: An extensible framework to support scientific image formats<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><b>Background<\/b>: No gold standard exists in the world of scientific image acquisition; a proliferation of instruments each with its own proprietary data format has made out-of-the-box sharing of that data nearly impossible. In the field of light microscopy, the Bio-Formats library was designed to translate such proprietary data formats to a common, open-source schema, enabling sharing and reproduction of scientific results. 
While Bio-Formats has proved successful for microscopy images, the broader scientific community lacked a domain-independent framework for format translation.\n<\/p><p><b>Results<\/b>: SCIFIO (SCientific Image Format Input and Output) is presented as a freely available, open-source library unifying the mechanisms of reading and writing image data. The core of SCIFIO is its modular definition of formats, whose design clearly outlines the components of image I\/O to encourage extensibility, facilitated by the dynamic discovery of the SciJava plugin framework. SCIFIO is structured to support the coexistence of multiple domain-specific open-exchange formats, such as Bio-Formats\u2019 OME-TIFF, within a unified environment.\n<\/p><p><b>Conclusions<\/b>: SCIFIO is a freely available software library developed to standardize the process of reading and writing scientific image formats.\n<\/p><p><b>Keywords<\/b>: SCIFIO, image analysis, open-source, Bio-Formats, ImageJ\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Background\">Background<\/span><\/h2>\n<p>Image formats are defined by the logical layout of metadata and pixel information across one or more data sources. Proprietary file formats (PFFs) are created when an <a href=\"https:\/\/www.limswiki.org\/index.php\/Imaging\" title=\"Imaging\" class=\"mw-disambig wiki-link\" target=\"_blank\" data-key=\"c99dd47b045eb67ecc822556afcbda57\">imaging<\/a> instrument, such as a <a href=\"https:\/\/www.limswiki.org\/index.php\/Microscope\" title=\"Microscope\" target=\"_blank\" class=\"wiki-link\" data-key=\"88edff09f2745648524350d3f7be8354\">microscope<\/a>, records such data in a structure that is not publicly described. PFFs are especially problematic in scientific domains, as each company, or even each instrument, can introduce a new file format that may require licensed software to decode or may change in structure without notice or recourse. 
The scientific method requires that data be available for others to analyze in order to verify and reproduce results; when that data is stored in a proprietary format, it cannot, by definition, be freely shared and inspected.\n<\/p><p>In response to the proliferation of PFFs in the life sciences, the Open Microscopy Environment (OME) consortium developed the Bio-Formats library to standardize the reading of microscopy data.<sup id=\"rdp-ebb-cite_ref-LinkertMetadata10_1-0\" class=\"reference\"><a href=\"#cite_note-LinkertMetadata10-1\" rel=\"external_link\">[1]<\/a><\/sup> Bio-Formats provides an <a href=\"https:\/\/www.limswiki.org\/index.php\/Application_programming_interface\" title=\"Application programming interface\" target=\"_blank\" class=\"wiki-link\" data-key=\"36fc319869eba4613cb0854b421b0934\">application programming interface<\/a> (API) for reading and writing images, backed by a comprehensive collection of extensions to decode format-specific <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" target=\"_blank\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> and translate it into an open specification called the OME data model.<sup id=\"rdp-ebb-cite_ref-GoldbergTheOpen05_2-0\" class=\"reference\"><a href=\"#cite_note-GoldbergTheOpen05-2\" rel=\"external_link\">[2]<\/a><\/sup> A translated image can then be written as OME-TIFF, an \u201copen-exchange format\u201d that combines the universal readability of the TIFF standard with an <a href=\"https:\/\/www.limswiki.org\/index.php\/XML\" title=\"XML\" class=\"mw-redirect wiki-link\" target=\"_blank\" data-key=\"fda82e3b4db7e4b2856b016933a1d2d1\">XML<\/a> schema representing the OME data model (OME-XML). 
These OME-TIFF images can be freely shared, with pixel data accessible via standard libraries such as libtiff<sup id=\"rdp-ebb-cite_ref-LibTIFF_3-0\" class=\"reference\"><a href=\"#cite_note-LibTIFF-3\" rel=\"external_link\">[3]<\/a><\/sup>, and the complete metadata parseable by any standards-compliant XML reader. In this way, the Bio-Formats project greatly mitigates the PFF problem in microscopy.\n<\/p><p>Bio-Formats has become an essential tool for scientists worldwide; however, its metadata model specifically targets 5-dimensional images in microscopy and related life sciences disciplines. PFFs from other scientific domains \u2014 e.g., <a href=\"https:\/\/www.limswiki.org\/index.php\/Medical_imaging\" title=\"Medical imaging\" target=\"_blank\" class=\"wiki-link\" data-key=\"dddd7e2b5706415d7af0375386e6eafa\">medical imaging<\/a>, astronomy, industrial x-rays, materials science and geoscience \u2014 each have their own unique considerations with respect to the dimensionality and metadata of their images; as such, it would be infeasible for a single \u201cone-size-fits-all\u201d metadata model to fully address the needs of scientific imaging as a whole. With this conclusion in mind, we have developed the SCIFIO (SCientific Image Format Input and Output) library, generalizing the success of Bio-Formats to create a domain-independent image I\/O framework enabling seamless and extensible translation between image metadata models. 
The goal of SCIFIO is to provide an architecture that equally facilitates 1) the conversion of additional formats into supported open-exchange formats such as OME-TIFF, and 2) the integration of additional scientific open-exchange formats such as <a href=\"https:\/\/www.limswiki.org\/index.php\/DICOM\" title=\"DICOM\" target=\"_blank\" class=\"wiki-link\" data-key=\"f0c7c747895286ff8785b6ed4dbc7ec0\">Digital Imaging and Communications in Medicine<\/a> (DICOM)<sup id=\"rdp-ebb-cite_ref-BidgoodUnder97_4-0\" class=\"reference\"><a href=\"#cite_note-BidgoodUnder97-4\" rel=\"external_link\">[4]<\/a><\/sup>, Flexible Image Transport System (FITS)<sup id=\"rdp-ebb-cite_ref-PenceDef10_5-0\" class=\"reference\"><a href=\"#cite_note-PenceDef10-5\" rel=\"external_link\">[5]<\/a><\/sup> and NetCDF<sup id=\"rdp-ebb-cite_ref-UDNetCDF16_6-0\" class=\"reference\"><a href=\"#cite_note-UDNetCDF16-6\" rel=\"external_link\">[6]<\/a><\/sup> into a common image I\/O framework.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Implementation\">Implementation<\/span><\/h2>\n<p>SCIFIO is implemented as a plugin suite for the SciJava plugin framework. Its core is released under the permissive BSD license to maximize freedom of inclusion in both open- and closed-source applications. The SciJava framework collects <i>Plugins<\/i> in an application <i>Context<\/i>, where they are typically accessed via <i>Services<\/i>. Accordingly, SCIFIO defines a collection of <i>Plugins<\/i> and <i>Services<\/i> that facilitate image I\/O. Developers typically start with the <i>SCIFIO<\/i> class itself: a <i>Gateway<\/i> to the SciJava <i>Context<\/i> that provides convenient access methods for the functional components of the SCIFIO framework.\n<\/p><p>The SciJava framework sorts <i>Plugins<\/i> by \u201ctype,\u201d representing the role of a given <i>Plugin<\/i>. Extensibility and flexibility are achieved by providing a public <i>Service<\/i> API that organizes and delegates to the available <i>Plugins<\/i> of each type. 
Thus, SCIFIO development is primarily concerned with adding new <i>Plugin<\/i> implementations to achieve a desired result. The following sections describe the key <i>Plugin<\/i> types in SCIFIO, and the behavior they control.\n<\/p><p>First and foremost is the <i>Format<\/i>. <i>Formats<\/i> are a collection of interface-driven components (Fig. 1) defining the steps for decoding an image source to its metadata and pixel values. In SCIFIO, the <a href=\"https:\/\/www.limswiki.org\/index.php\/ImageJ\" title=\"ImageJ\" target=\"_blank\" class=\"wiki-link\" data-key=\"0f8d592a8b8e6e03f5bbea6f41897b7f\">ImageJ<\/a> Common data model is used to describe pixels; this data model is built on ImgLib2<sup id=\"rdp-ebb-cite_ref-PietzschImgLib12_7-0\" class=\"reference\"><a href=\"#cite_note-PietzschImgLib12-7\" rel=\"external_link\">[7]<\/a><\/sup> due to its type and algorithmic flexibility, ensuring images opened with SCIFIO are universally recognized within the ImageJ ecosystem.<sup id=\"rdp-ebb-cite_ref-SchindelinTheImageJ15_8-0\" class=\"reference\"><a href=\"#cite_note-SchindelinTheImageJ15-8\" rel=\"external_link\">[8]<\/a><\/sup> A <i>Format<\/i> must always include a <i>Metadata<\/i> component defining its unique fields and structures, such as acquisition instrument details, dimensional axis types, or detector emission wavelengths. 
Each <i>Metadata<\/i> implementation must also be able to express itself as a standard, format-independent <i>ImageMetadata<\/i> object, establishing a common baseline for use within the framework.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Hiner_BMCBioinformatics2016_17.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"349ac1c0200b1385e4e4b0252dfb2ea6\"><img alt=\"Fig1 Hiner BMCBioinformatics2016 17.gif\" src=\"https:\/\/www.limswiki.org\/images\/6\/67\/Fig1_Hiner_BMCBioinformatics2016_17.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> Components of a <i>Format<\/i> plugin and their role in image I\/O<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The <i>Checker<\/i> component contains the logic for matching a given <i>Format<\/i> with a potential image source, while the <i>Parser<\/i> component performs the actual creation of <i>Metadata<\/i> from that source. The <i>Reader<\/i> and <i>Writer<\/i> components use <i>Metadata<\/i> to read or write pixel data, respectively. Given the goal of freely shareable image data, <i>Writers<\/i> are optional components and should not be implemented for proprietary formats.\n<\/p><p>A second essential <i>Plugin<\/i> type is the <i>Translator<\/i>, which encodes logic for conversion from one <i>Metadata<\/i> type to another. <i>Translators<\/i> enable the standardization of proprietary formats to common <i>Metadata<\/i> structures such as OME, and hence play a key role in converting images between <i>Formats<\/i>. <i>Translators<\/i> are typically created to accompany <i>Writers<\/i>, ensuring <i>Format<\/i>-specific metadata is properly populated. 
Additionally, the <i>Translator<\/i> framework enables the integration of new open-exchange formats via <i>Translator<\/i>-only libraries, converting supported <i>Metadata<\/i> types to the new standard. An example of this model can be seen in the SCIFIO-OME-XML component (Fig. 2).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Hiner_BMCBioinformatics2016_17.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"2425979d951da3327c91d7b3b157bfb1\"><img alt=\"Fig2 Hiner BMCBioinformatics2016 17.gif\" src=\"https:\/\/www.limswiki.org\/images\/2\/2d\/Fig2_Hiner_BMCBioinformatics2016_17.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> SCIFIO-OME-XML <i>Translator<\/i> suite, for converting metadata to the OME-TIFF open-exchange format<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>While <i>Formats<\/i> and <i>Translators<\/i> add new behavior to the base framework, SCIFIO also has <i>Plugin<\/i> types to control existing behavior. For example, <i>Filter<\/i> plugins provide a <i>Format<\/i>-agnostic mechanism for modifying <i>Reader<\/i> behavior. <i>Filters<\/i> create an ordered chain of delegation, each operating on the data of its parent, and can be individually toggled \u2018on\u2019 or \u2018off\u2019 on a per-<i>Reader<\/i> basis. Sample <i>Filter<\/i> stacking behavior is illustrated in a <i>ChannelFiller<\/i> for converting \u201cindexed color\u201d pixels to RGB values and a <i>FileStitcher<\/i> for unifying multiple files on disk to form one dataset (Fig. 
3).\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Hiner_BMCBioinformatics2016_17.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"f7d3a908e8b2198311e02226969c6cbc\"><img alt=\"Fig3 Hiner BMCBioinformatics2016 17.gif\" src=\"https:\/\/www.limswiki.org\/images\/1\/1c\/Fig3_Hiner_BMCBioinformatics2016_17.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> Behavior of <i>ChannelFiller<\/i> and <i>FileStitcher Filter<\/i> plugins<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>With all SciJava <i>Plugins<\/i>, a numeric priority value attached to each class creates an implicit relative ordering for operations \u2014 e.g., order of <i>Checker<\/i> querying, <i>Translator<\/i> querying, or <i>Filter<\/i> application. Priorities are automatically considered when using the SCIFIO <i>Services<\/i>: from the <i>FormatService<\/i> polling <i>Checker<\/i> components to the <i>TranslatorService<\/i> finding the correct <i>Translator<\/i> for a given request, priorities allow querying the most specific solutions first, before moving to more general options. These pieces together provide a robust and flexible library for reading and writing image data.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Results_and_discussion\">Results and discussion<\/span><\/h2>\n<p>As the fundamental goal of SCIFIO is to establish an extensible framework for image support, the SciJava framework is a logical choice for implementation. SciJava provides extensible solutions to common software problems, which implicitly benefit SCIFIO. 
A core example is the extensible scripting language framework (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/imagej.net\/Scripting\" target=\"_blank\">http:\/\/imagej.net\/Scripting<\/a>), which allows SCIFIO to be used from many programming languages without requiring language-specific considerations in SCIFIO itself.\n<\/p><p>ImageJ<sup id=\"rdp-ebb-cite_ref-SchneiderNIH12_9-0\" class=\"reference\"><a href=\"#cite_note-SchneiderNIH12-9\" rel=\"external_link\">[9]<\/a><\/sup> is the flagship use case for SCIFIO, allowing an established community to vet and refine the library. Although users do not interact with the SCIFIO API directly, all image I\/O operations in ImageJ ultimately rely on SCIFIO. As developers contribute new <i>Format<\/i> plugins for image types relevant to their work, any application using SCIFIO immediately benefits from the new plugin. Beyond ImageJ, projects such as KNIME Image Processing (KNIP), built on the <a href=\"https:\/\/www.limswiki.org\/index.php\/KNIME\" title=\"KNIME\" target=\"_blank\" class=\"wiki-link\" data-key=\"18360a3b4b22798e231d7b26365dff87\">KNIME<\/a> Analytics Platform<sup id=\"rdp-ebb-cite_ref-BertholdKNIME08_10-0\" class=\"reference\"><a href=\"#cite_note-BertholdKNIME08-10\" rel=\"external_link\">[10]<\/a><\/sup>, have already adopted SCIFIO as their image I\/O mechanism. This sort of code sharing leads to a form of mutualistic collaboration: a new <i>Format<\/i> plugin developed for KNIP automatically works in ImageJ, and vice versa. Equally important, ImageJ and KNIP can each implicitly operate on image data produced by the other program, laying the foundation for algorithmic interoperability.\n<\/p><p>Collaborations like this would not be possible with a domain-specific library like Bio-Formats. KNIME is a platform for extensible workflows, so its handling of image data demands flexibility beyond the fixed 5D microscopy schema of OME. 
Additionally, Bio-Formats\u2019 mechanism of format extension requires either modifying a text-based configuration file to define format priority, which can lead to conflicts if multiple libraries provide differing versions of this file, or runtime modification through API calls, which may not be reproducible without a central mechanism controlling these calls. In contrast, the dynamic discovery of the SciJava plugin framework allows SCIFIO developers to provide their <i>Formats<\/i> completely independently (e.g., on an ImageJ, KNIME or Eclipse update site), while SCIFIO\u2019s backing by the ImageJ Common data model ensures that it can adapt to future requirements in imaging dimensionality and data types.\n<\/p><p>Bio-Formats readers and writers and SCIFIO <i>Format<\/i> components define similar high-level logic, but Bio-Formats conflates several I\/O steps in a single monolithic interface, with many protected methods as potential extension points. SCIFIO encapsulates each I\/O step in its own dedicated component to minimize the effort required in format development. Whether a format is added to the Bio-Formats or the SCIFIO library, the SCIFIO-BF-Compat and SCIFIO-OME-XML components provide bidirectional compatibility between SCIFIO and Bio-Formats.\n<\/p><p>Bio-Formats has demonstrated the feasibility of standardizing a broad field of PFFs into a common open-exchange format. SCIFIO generalizes this approach: it allows extension to new domains by integrating their <i>Metadata<\/i> standards and open-exchange formats via <i>Translators<\/i>, and it offers clear paths for contributing to existing domains by encapsulating format logic in <i>Format<\/i> components. 
Combined with the immediate format coverage provided by the Bio-Formats integration layers, this leads us to see the SCIFIO framework as a potential unifying solution to PFFs in scientific image data.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusions\">Conclusions<\/span><\/h2>\n<p>SCIFIO is an open-source library generalizing the successful structure of Bio-Formats to create a domain-independent framework for the reading, writing, and translation of images. The extensible design of SCIFIO facilitates community contribution, the establishment of domain-specific metadata standards, and integration into a unified system capable of adapting to the demands of scientific image analysis.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Abbreviations\">Abbreviations<\/span><\/h2>\n<p><b>API<\/b>: Application programming interface\n<\/p><p><b>I\/O<\/b>: Input and\/or output\n<\/p><p><b>KNIME<\/b>: Konstanz Information Miner\n<\/p><p><b>KNIP<\/b>: KNIME Image Processing\n<\/p><p><b>OME<\/b>: Open Microscopy Environment\n<\/p><p><b>PFF<\/b>: Proprietary file format\n<\/p><p><b>SCIFIO<\/b>: SCientific Image Format Input and Output\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Declarations\">Declarations<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h3>\n<p>Many people have contributed to the development of SCIFIO on both technical and leadership levels. In particular, the authors gratefully thank and acknowledge the efforts of (in alphabetical order): Ellen T. Arena, Anne Carpenter, Christian Dietz, Gabriel Einsdorf, Melissa Linkert, Josh Moore, Tobias Pietzsch, Stephan Preibisch, Stephan Saalfeld, Jason Swedlow, and Pavel Tomancak. 
We also thank the entire ImageJ community, especially those who contributed patch submissions, use cases, feature requests and bug reports.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h4>\n<p>Research reported in this publication was supported by the Division of Advanced Cyberinfrastructure (ACI) of the National Science Foundation under award number 1148362 and by additional internal funding from the Laboratory for Optical and Computational Instrumentation.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Availability_of_data_and_materials\">Availability of data and materials<\/span><\/h4>\n<p><b>Project name<\/b>: SCIFIO\n<\/p><p><b>Project home page<\/b>: <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/scif.io\/\" target=\"_blank\">http:\/\/scif.io\/<\/a>\n<\/p><p><b>Archived version<\/b>: 0.28.2 \n<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/maven.imagej.net\/service\/local\/repositories\/releases\/content\/io\/scif\/scifio\/0.28.2\/scifio-0.28.2.jar\" target=\"_blank\">http:\/\/maven.imagej.net\/service\/local\/repositories\/releases\/content\/io\/scif\/scifio\/0.28.2\/scifio-0.28.2.jar<\/a>\n<\/p><p><b>Source code<\/b>: <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/github.com\/scifio\/scifio\" target=\"_blank\">https:\/\/github.com\/scifio\/scifio<\/a>\n<\/p><p><b>Operating system(s)<\/b>: Platform-independent\n<\/p><p><b>Programming language<\/b>: Java\n<\/p><p><b>Other requirements<\/b>: Java 1.8 or higher runtime, io.scif:scifio-jai-imageio, net.imagej:imagej-common, net.imglib2:imglib2, org.scijava:scijava-common, org.mapdb:mapdb\n<\/p><p><b>License<\/b>: BSD\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Authors.E2.80.99_contributions\">Authors\u2019 contributions<\/span><\/h4>\n<p>MCH was the lead implementer of the software. CTR architected the underlying SciJava foundation and guided SCIFIO development. 
As the primary principal investigator of SCIFIO, KWE directed and advised on all aspects of the project including development directions and priorities. All authors contributed to, read and approved the final manuscript.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h4>\n<p>The authors declare that they have no competing interests.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Consent_for_publication\">Consent for publication<\/span><\/h4>\n<p>Not applicable.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Ethics_approval_and_consent_to_participate\">Ethics approval and consent to participate<\/span><\/h4>\n<p>Not applicable.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Maven_artifacts\">Maven artifacts<\/span><\/h4>\n<p>SCIFIO can be added as a dependency to any project capable of consuming Maven dependencies. As SCIFIO is a project in the SciJava domain, we recommend using dependency management from the latest pom-scijava release (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/maven.imagej.net\/index.html#nexus-search;gav~org.scijava~pom-scijava\" target=\"_blank\">http:\/\/maven.imagej.net\/index.html#nexus-search;gav~org.scijava~pom-scijava<\/a>). 
The following are example sections for adding a SCIFIO dependency to a pom.xml:\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Hiner_BMCBioinformatics2016_17.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"0b65588e7d2ee2145e329c1a61951c5f\"><img alt=\"Fig4 Hiner BMCBioinformatics2016 17.gif\" src=\"https:\/\/www.limswiki.org\/images\/d\/d4\/Fig4_Hiner_BMCBioinformatics2016_17.gif\" width=\"379\" height=\"227\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-LinkertMetadata10-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LinkertMetadata10_1-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Linkert, M.; Rueden, C.T.; Allan, C. et al. (2010). \"Metadata matters: Access to image data in the real world\". <i>Journal of Cell Biology<\/i> <b>189<\/b> (5): 777\u201382. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1083%2Fjcb.201004104\" target=\"_blank\">10.1083\/jcb.201004104<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20513764\" target=\"_blank\">20513764<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Metadata+matters%3A+Access+to+image+data+in+the+real+world&rft.jtitle=Journal+of+Cell+Biology&rft.aulast=Linkert%2C+M.%3B+Rueden%2C+C.T.%3B+Allan%2C+C.+et+al.&rft.au=Linkert%2C+M.%3B+Rueden%2C+C.T.%3B+Allan%2C+C.+et+al.&rft.date=2010&rft.volume=189&rft.issue=5&rft.pages=777%E2%80%9382&rft_id=info:doi\/10.1083%2Fjcb.201004104&rft_id=info:pmid\/20513764&rfr_id=info:sid\/en.wikipedia.org:Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GoldbergTheOpen05-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GoldbergTheOpen05_2-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Goldberg, I.G.; Allan, C.; Burel, J.M. et al. (2005). \"The Open Microscopy Environment (OME) Data Model and XML file: Open tools for informatics and quantitative analysis in biological imaging\". <i>Genome Biology<\/i> <b>6<\/b> (5): R47. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2Fgb-2005-6-5-r47\" target=\"_blank\">10.1186\/gb-2005-6-5-r47<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/15892875\" target=\"_blank\">15892875<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Open+Microscopy+Environment+%28OME%29+Data+Model+and+XML+file%3A+Open+tools+for+informatics+and+quantitative+analysis+in+biological+imaging&rft.jtitle=Genome+Biology&rft.aulast=Goldberg%2C+I.G.%3B+Allan%2C+C.%3B+Burel%2C+J.M.+et+al.&rft.au=Goldberg%2C+I.G.%3B+Allan%2C+C.%3B+Burel%2C+J.M.+et+al.&rft.date=2005&rft.volume=6&rft.issue=5&rft.pages=R47&rft_id=info:doi\/10.1186%2Fgb-2005-6-5-r47&rft_id=info:pmid\/15892875&rfr_id=info:sid\/en.wikipedia.org:Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LibTIFF-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LibTIFF_3-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Warmerdam, F.; Kiselev, A.; Welles, M.; Kelly, D. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.libtiff.org\/\" target=\"_blank\">\"LibTIFF - TIFF Library and Utilities\"<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.libtiff.org\/\" target=\"_blank\">http:\/\/www.libtiff.org\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 29 November 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=LibTIFF+-+TIFF+Library+and+Utilities&rft.atitle=&rft.aulast=Warmerdam%2C+F.%3B+Kiseley%2C+A.%3B+Welles%2C+M.%3B+Kelly%2C+D.&rft.au=Warmerdam%2C+F.%3B+Kiseley%2C+A.%3B+Welles%2C+M.%3B+Kelly%2C+D.&rft_id=http%3A%2F%2Fwww.libtiff.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BidgoodUnder97-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BidgoodUnder97_4-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bidgood Jr., W.D.; Horii, S.C.; Prior, F.W.; Van Syckle, D.E. (1997). \"Understanding and using DICOM, the data interchange standard for biomedical imaging\". <i>JAMIA<\/i> <b>4<\/b> (3): 199\u2013212. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1136%2Fjamia.1997.0040199\" target=\"_blank\">10.1136\/jamia.1997.0040199<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/9147339\" target=\"_blank\">9147339<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Understanding+and+using+DICOM%2C+the+data+interchange+standard+for+biomedical+imaging&rft.jtitle=JAMIA&rft.aulast=Bidgood+Jr.%2C+W.D.%3B+Horii%2C+S.C.%3B+Prior%2C+F.W.%3B+Van+Syckle%2C+D.E.&rft.au=Bidgood+Jr.%2C+W.D.%3B+Horii%2C+S.C.%3B+Prior%2C+F.W.%3B+Van+Syckle%2C+D.E.&rft.date=1997&rft.volume=4&rft.issue=3&rft.pages=199%E2%80%93212&rft_id=info:doi\/10.1136%2Fjamia.1997.0040199&rft_id=info:pmid\/9147339&rfr_id=info:sid\/en.wikipedia.org:Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PenceDef10-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PenceDef10_5-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pence, W.D.; Chiappetti, L.; Page, C.G. et al. (2010). \"Definition of the Flexible Image Transport System (FITS), version 3.0\". <i>Astronomy & Astrophysics<\/i> <b>524<\/b> (December 2010): A42. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1051%2F0004-6361%2F201015362\" target=\"_blank\">10.1051\/0004-6361\/201015362<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Definition+of+the+Flexible+Image+Transport+System+%28FITS%29%2C+version+3.0&rft.jtitle=Astronomy+%26+Astrophysics&rft.aulast=Pence%2C+W.D.%3B+Chiappetti%2C+L.%3B+Page%2C+C.G.+et+al.&rft.au=Pence%2C+W.D.%3B+Chiappetti%2C+L.%3B+Page%2C+C.G.+et+al.&rft.date=2010&rft.volume=524&rft.issue=December+2010&rft.pages=A42&rft_id=info:doi\/10.1051%2F0004-6361%2F201015362&rfr_id=info:sid\/en.wikipedia.org:Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-UDNetCDF16-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-UDNetCDF16_6-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Unidata. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.unidata.ucar.edu\/software\/netcdf\/\" target=\"_blank\">\"Network Common Data Form (NetCDF)\"<\/a>. University Corporation for Atmospheric Research. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.5065%2FD6H70CW6\" target=\"_blank\">10.5065\/D6H70CW6<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.unidata.ucar.edu\/software\/netcdf\/\" target=\"_blank\">http:\/\/www.unidata.ucar.edu\/software\/netcdf\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 29 November 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Network+Common+Data+Form+%28NetCDF%29&rft.atitle=&rft.aulast=Unidata&rft.au=Unidata&rft.pub=University+Corporation+for+Atmospheric+Research&rft_id=info:doi\/10.5065%2FD6H70CW6&rft_id=http%3A%2F%2Fwww.unidata.ucar.edu%2Fsoftware%2Fnetcdf%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PietzschImgLib12-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PietzschImgLib12_7-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pietzsch, T.; Preibisch, S.; Toman\u010d\u00e1k, P. et al. (2012). \"ImgLib2: Generic image processing in Java\". <i>Bioinformatics<\/i> <b>28<\/b> (22): 3009\u201311. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fbioinformatics%2Fbts543\" target=\"_blank\">10.1093\/bioinformatics\/bts543<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22962343\" target=\"_blank\">22962343<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=ImgLib2%3A+Generic+image+processing+in+Java&rft.jtitle=Bioinformatics&rft.aulast=Pietzsch%2C+T.%3B+Preibisch%2C+S.%3B+Toman%C4%8D%C3%A1k%2C+P.+et+al.&rft.au=Pietzsch%2C+T.%3B+Preibisch%2C+S.%3B+Toman%C4%8D%C3%A1k%2C+P.+et+al.&rft.date=2012&rft.volume=28&rft.issue=22&rft.pages=3009%E2%80%9311&rft_id=info:doi\/10.1093%2Fbioinformatics%2Fbts543&rft_id=info:pmid\/22962343&rfr_id=info:sid\/en.wikipedia.org:Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SchindelinTheImageJ15-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SchindelinTheImageJ15_8-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Schindelin, J.; Rueden, C.T.; Hiner, M.C. et al. (2015). \"The ImageJ ecosystem: An open platform for biomedical image analysis\". <i>Molecular Reproduction and Development<\/i> <b>82<\/b> (7\u20138): 518\u201329. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1002%2Fmrd.22489\" target=\"_blank\">10.1002\/mrd.22489<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26153368\" target=\"_blank\">26153368<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+ImageJ+ecosystem%3A+An+open+platform+for+biomedical+image+analysis&rft.jtitle=Molecular+Reproduction+and+Development&rft.aulast=Schindelin%2C+J.%3B+Rueden%2C+C.T.%3B+Hiner%2C+M.C.+et+al.&rft.au=Schindelin%2C+J.%3B+Rueden%2C+C.T.%3B+Hiner%2C+M.C.+et+al.&rft.date=2015&rft.volume=82&rft.issue=7%E2%80%938&rft.pages=518-29&rft_id=info:doi\/10.1002%2Fmrd.22489&rft_id=info:pmid\/26153368&rfr_id=info:sid\/en.wikipedia.org:Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SchneiderNIH12-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SchneiderNIH12_9-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Schneider, C.A.; Rasband, W.S.; Eliceiri, K.W. (2012). \"NIH Image to ImageJ: 25 years of image analysis\". <i>Nature Methods<\/i> <b>9<\/b> (7): 671\u20135. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22930834\" target=\"_blank\">22930834<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=NIH+Image+to+ImageJ%3A+25+years+of+image+analysis&rft.jtitle=Nature+Methods&rft.aulast=Schneider%2C+C.A.%3B+Rasband%2C+W.S.%3B+Eliceiri%2C+K.W.&rft.au=Schneider%2C+C.A.%3B+Rasband%2C+W.S.%3B+Eliceiri%2C+K.W.&rft.date=2012&rft.volume=9&rft.issue=7&rft.pages=671%E2%80%935&rft_id=info:pmid\/22930834&rfr_id=info:sid\/en.wikipedia.org:Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BertholdKNIME08-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BertholdKNIME08_10-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Berthold, M.R.; Cebron, N.; Dill, F.; Gabriel, T.R.; K\u00f6tter, T.; Meinl, T.; Ohl, P.; Sieb, C.; Thiel, K.; Wiswedel, B. (2008). \"Chapter 38: KNIME: The Konstanz Information Miner\". In Preisach, C.; Burkhardt, H.; Schmidt-Thieme, L.; Decker, R. (eds.). <i>Data Analysis, Machine Learning and Applications<\/i>. Springer Berlin Heidelberg. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-540-78246-9_38\" target=\"_blank\">10.1007\/978-3-540-78246-9_38<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9783540782391.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Chapter+38%3A+KNIME%3A+The+Konstanz+Information+Miner&rft.atitle=Data+Analysis%2C+Machine+Learning+and+Applications&rft.aulast=Berthold%2C+M.R.%3B+Cebron%2C+N.%3B+Dill%2C+F.%3B+Gabriel%2C+T.R.%3B+K%C3%B6tter%2C+T.%3B+Meinl%2C+T.%3B+Ohl%2C+P.%3B+Sieb%2C+C.%3B+Thiel%2C+K.%3B+Wiswedel%2C+B.&rft.au=Berthold%2C+M.R.%3B+Cebron%2C+N.%3B+Dill%2C+F.%3B+Gabriel%2C+T.R.%3B+K%C3%B6tter%2C+T.%3B+Meinl%2C+T.%3B+Ohl%2C+P.%3B+Sieb%2C+C.%3B+Thiel%2C+K.%3B+Wiswedel%2C+B.&rft.date=2008&rft.pub=Springer+Berlin+Heidelberg&rft_id=info:doi\/10.1007%2F978-3-540-78246-9_38&rft.isbn=9783540782391&rfr_id=info:sid\/en.wikipedia.org:Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. 
In some cases important information was missing from the references, and that information was added.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats\">https:\/\/www.limswiki.org\/index.php\/Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","e28b66162eedd9f2c9137d1be8322cec_images":["https:\/\/www.limswiki.org\/images\/6\/67\/Fig1_Hiner_BMCBioinformatics2016_17.gif","https:\/\/www.limswiki.org\/images\/2\/2d\/Fig2_Hiner_BMCBioinformatics2016_17.gif","https:\/\/www.limswiki.org\/images\/1\/1c\/Fig3_Hiner_BMCBioinformatics2016_17.gif","https:\/\/www.limswiki.org\/images\/d\/d4\/Fig4_Hiner_BMCBioinformatics2016_17.gif"],"e28b66162eedd9f2c9137d1be8322cec_timestamp":1544814659,"ffcad3b9d842250ab55f35eb0cee8237_type":"article","ffcad3b9d842250ab55f35eb0cee8237_title":"PCM-SABRE: A platform for benchmarking and comparing outcome prediction methods in precision cancer medicine (Eyal-Altman et al. 2017)","ffcad3b9d842250ab55f35eb0cee8237_url":"https:\/\/www.limswiki.org\/index.php\/Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine","ffcad3b9d842250ab55f35eb0cee8237_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:PCM-SABRE: A platform for benchmarking and comparing outcome prediction methods in precision cancer medicine\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nPCM-SABRE: A platform for benchmarking and comparing outcome prediction methods in precision cancer medicineJournal\n \nBMC BioinformaticsAuthor(s)\n \nEyal-Altman, Noah; Last, Mark; Rubin, EitanAuthor affiliation(s)\n \nBen-Gurion University of the NegevPrimary contact\n \nEmail: eyalnoa at post dot bgu dot ac dot ilYear published\n \n2017Volume and issue\n \n18Page(s)\n \n40DOI\n \n10.1186\/s12859-016-1435-5ISSN\n \n1471-2105Distribution license\n \nCreative Commons Attribution 4.0 InternationalWebsite\n \nhttp:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-016-1435-5Download\n 
\nhttp:\/\/bmcbioinformatics.biomedcentral.com\/track\/pdf\/10.1186\/s12859-016-1435-5 (PDF)\n\nContents\n\n1 Abstract \n2 Background \n3 Implementation \n4 Results \n\n4.1 Using PCM-SABRE for replicating a previous work that utilizes machine learning to induce outcome prediction models \n4.2 Using PCM-SABRE for optimizing and improving breast cancer outcome prediction \n\n\n5 Discussion \n6 Conclusions \n7 Abbreviations \n8 Declarations \n\n8.1 Acknowledgements \n\n8.1.1 Funding \n8.1.2 Availability of data and materials \n8.1.3 Authors\u2019 contributions \n8.1.4 Competing interests \n\n\n\n\n9 Additional files \n10 References \n11 Notes \n\n\n\nAbstract \nBackground: Numerous publications attempt to predict cancer survival outcome from gene expression data using machine-learning methods. A direct comparison of these works is challenging for the following reasons: (1) inconsistent measures used to evaluate the performance of different models, and (2) incomplete specification of critical stages in the process of knowledge discovery. There is a need for a platform that would allow researchers to replicate previous works and to test the impact of changes in the knowledge discovery process on the accuracy of the induced models.\nResults: We developed the PCM-SABRE platform, which supports the entire knowledge discovery process for cancer outcome analysis. PCM-SABRE was developed using KNIME. By using PCM-SABRE to reproduce the results of previously published works on breast cancer survival, we define a baseline for evaluating future attempts to predict cancer outcome with machine learning. We used PCM-SABRE to replicate previous work that describes predictive models of breast cancer recurrence, and tested the performance of all possible combinations of the feature selection methods and data mining algorithms that were used in either work. We reconstructed the work of Chou et al., 
observing similar trends: superior performance of the Probabilistic Neural Network (PNN) and logistic regression (LR) algorithms, and an inconclusive impact of feature pre-selection with the decision tree algorithm on subsequent analysis.\nConclusions: PCM-SABRE is a software tool that provides an intuitive environment for the rapid development of predictive models in cancer precision medicine.\nKeywords: Breast cancer, data mining, reproducible research\n\nBackground \nPredicting the outcome of cancer from gene expression data is a clinically important and computationally challenging task. For example, patients with early-stage, estrogen-receptor-positive, HER2-negative breast cancer who are considered to be at low risk of recurrence can avoid chemotherapy, while patients at high or intermediate risk are treated with aggressive (and harmful) chemotherapy.[1]\nEfforts to stratify patients by risk of recurrence in other tumor types, and the ability to stratify patients by overall chance of survival, are not as advanced. Moreover, the relative success of risk stratification for breast cancer patients has been challenged[2], with the suggestion that it in fact stratifies patients into tumor subtypes, which can be achieved with much simpler tests.\nAs a result, a large number of papers have been published, and continue to be published, in which gene expression data are analyzed in order to construct models that predict cancer survival or cancer recurrence. Many of these efforts are concentrated on breast cancer, the second most commonly diagnosed cancer among American women after skin cancer.[3] About 1 in 8 U.S. 
women (about 12 percent) will develop invasive breast cancer over the course of their lifetime, and similar rates are reported worldwide.[4] Breast cancer is an attractive domain for risk stratification because resection is estimated to be a sufficient treatment for 70 to 80 percent of patients, while the remaining patients will develop advanced metastatic lesions, which are largely incurable.[5] Aggressive chemotherapy reduces the chance of advanced metastasis for the patients in this second group, but it is harmful and unnecessary therapy for the others. Thus, great efforts have been invested in stratifying patients\u2019 risk of recurrence.[6]\nBecause of the importance of risk stratification in breast cancer, combined with its relatively high incidence, breast cancer is the tumor type for which expression profiles of newly diagnosed patients are most abundant. Several works have been published that apply machine-learning techniques to these data to predict cancer survivability.[7][8] Unfortunately, we found it quite challenging to compare these works directly, for the following reasons:\n\n Incomplete specification of critical stages in the knowledge discovery process, such as feature selection.\n Differences in the measures used to evaluate model performance. Some studies provide only the overall accuracy of the proposed classifier, some offer only the area under the curve (AUC), while others provide no statistical measures and only present Kaplan-Meier charts that visualize the survival curves of the predicted classes.\n Different inclusion\/exclusion criteria across studies, with little or no overlap between the patients considered.\nIncomplete documentation of the analytic process is a common cause of irreproducibility of published results. 
We conclude that there is a need for a platform that would allow researchers to describe their analytic work on risk stratification for cancer patients in a reproducible way that can be used for further investigation. Such a platform should allow the replication of previous works and the systematic evaluation of how alterations in one or more stages of the knowledge discovery process affect performance in cancer survival prediction. Such a tool can help to understand and compare the current state of predictions for breast cancer and, if applied to new cancer types, to prevent the \"Tower of Babel\" situation that has emerged for breast cancer.\n\nImplementation \nWe developed a platform that allows replicating, comparing, and improving knowledge discovery pipelines for cancer survival prediction, and we demonstrate its applicability to breast cancer (Fig. 1). PCM-SABRE (Precision Cancer Medicine - Survival Analysis Benchmarking, Reporting and Evaluation) was developed using KNIME (Konstanz Information Miner).[9] KNIME is a modern, flexible, and intuitive open-source data analytics platform for performing sophisticated statistical and data mining analyses to develop, among other things, predictive models. We chose KNIME because it is popular, user-friendly software that does not require programming knowledge. Its node-based workflow structure makes it easy to assess the impact of changing one knowledge discovery step (for example, the data mining algorithm) on predictive performance without changing any other step of the workflow. Another major benefit of KNIME is the ability to create new nodes; this feature is particularly useful when researchers want to integrate a method they developed into an existing KNIME workflow.\n\n Figure 1. Screenshot of PCM-SABRE\n\nWe designed the PCM-SABRE workflow according to the common steps of knowledge discovery in data. 
First, the user can use a supplied dataset or load a new one. The dataset has to be a .csv file in the form of a table in which the rows represent patients and the columns represent clinical data, gene expression data, or any other type of variable. The dependent variable can be binary or continuous (a continuous variable is transformed into a binary one) and needs to represent survival time (for example, relapse-free survival time or time to death). \nThe second Meta-node is the preprocessing step, where a binary dependent variable is created and patients with missing data or censored survival information are filtered out. We chose a default threshold of five years to split the continuous survival variable into HIGH (t\u2009<\u20095 years) and LOW (t\u2009\u2265\u20095 years) risk, but this threshold is an input parameter that can be changed, as explained later. Missing-value imputation is performed using random forest classification, which builds a model from the non-missing rows and predicts the variable value for the missing rows. The default version of PCM-SABRE allows selecting patients according to their ER status and lymph node status, but the \"Select Patients\" Meta-node is optional and can easily be modified to meet other inclusion\/exclusion criteria. The third Meta-node is the feature selection step, where users can choose between two feature selection methods (information gain or ANOVA) or add another one (from the nodes available in KNIME, using scripting, or using external tools). The fourth Meta-node is the modeling step, where we offer a choice of five well-known and relevant classifiers. The methods included in the out-of-the-box version of the workflow are described in Table 1. \n\nTable 1. 
Machine learning methods available in PCM-SABRE\n\nNo. | Meta-node | Method | KNIME node | Default parameters\n1.1 | Select Patients | Estrogen Receptor (ER) status | R script | -\n1.2 | Select Patients | Lymph Node (LN) status | R script | -\n2.1 | Feature Selection | Information Gain (InfoGain) | InformationGainCalculator (Community node \u2013 Palladian) | Top 100 ranked\n2.2 | Feature Selection | ANOVA | One-way ANOVA | Include genes with p-value\u2009<\u20091.0E-6\n3.1 | Modeling | Logistic Regression (LR) | Logistic (3.7) (Weka node) | Ridge\u2009=\u20091.0E-8\n3.2 | Modeling | Random Forest (RF) | Random Forest Learner | Split criterion\u2009=\u2009Information Gain Ratio, Number of models\u2009=\u2009350\n3.3 | Modeling | Artificial Neural Network (ANN) | PNN Learner (DDA) | Theta Minus\u2009=\u20090.2, Theta Plus\u2009=\u20090.4\n3.4 | Modeling | K-Nearest Neighbors (KNN) | IBk (3.7) (Weka node) | KNN\u2009=\u200915\n3.5 | Modeling | Support Vector Machine (SVM) | SVM Learner | Kernel\u2009=\u2009RBF, sigma\u2009=\u20090.2\n\nThanks to the design of KNIME, adding Modeling and Feature Selection methods only involves dropping additional nodes into the appropriate Meta-nodes and connecting them by drag-and-drop, using the existing methods as templates. Our experience with experimental biologists suggests that oncology researchers without programming experience can do this with little or no special training. Fig. 
2 illustrates how the user can easily and quickly add another classifier to the workflow: (1) double-click Modeling\u2009\u2192\u2009new model\u2009\u2192\u2009cross-validation; (2) delete the decision tree learner and predictor nodes; (3) choose another pair of learner and predictor nodes from the Node Repository and drag-and-drop them in place of the deleted nodes; (4) connect the Training data output of the X-Partitioner node to the Learner node input, connect the PMML output of the Learner node to the PMML input of the Predictor node, connect the Predictor node to the X-Aggregator node, and connect the Test data output of the X-Partitioner node to the Predictor node. \n\n Figure 2. Demonstration of drag-and-drop model replacement (Na\u00efve Bayes instead of decision tree)\n\nThe fifth Meta-node is the evaluation step, which calculates the performance measures of the different models (among them the accuracy and the area under the ROC curve). An important feature of PCM-SABRE is a .csv file (flow_variables.csv) that allows the user to control some default input parameters without having to change them inside the specific KNIME nodes. The controlled input parameters are: (1) the feature selection method (default\u2009=\u2009infoGain), (2) ER status (default\u2009=\u2009all patients), (3) lymph node status (default\u2009=\u2009all patients), and (4) the threshold for the binary survival variable (default\u2009=\u2009five years). Changing an input parameter or adding a new one is simple and only requires filling in cells in Excel. Additional details on how to use PCM-SABRE can be found in the user manual.\nFor each combination of a feature selection method with a classification algorithm, the PCM-SABRE output includes (1) performance measures, (2) ROC analysis, and (3) a list of ranked features.\n\nResults \nWe developed PCM-SABRE (available as Additional file 1) as a software system that allows the comparison and improvement of expression-based predictive models for cancer patients. 
We used PCM-SABRE to replicate previous work that describes predictive models of breast cancer recurrence, and we evaluated the performance of all possible combinations of the feature selection methods and data mining algorithms that were used in either work.\n\nUsing PCM-SABRE for replicating a previous work that utilizes machine learning to induce outcome prediction models \nWe first demonstrate the value of PCM-SABRE to investigators implementing new machine learning pipelines for breast cancer recurrence prediction by replicating the work of Chou et al.[10] Our analysis reconstructs the paper to the best of our ability, with the following exceptions:\n\n We use KNIME rather than the original software (Clementine 10.1), and as input data we use a more current compendium of expression data (referred to as the Gy\u00f6rffy dataset for the rest of this paper).[7] The dataset is available for download at http:\/\/kmplot.com\/analysis\/index.php?p=download. \n The Gy\u00f6rffy dataset originally contained 1809 examples (breast cancer patients) and 22,216 features (clinical features and probe expression levels). \n A binary class attribute was created indicating whether or not the cancer recurred within five years.\nTo best reproduce the original work, we made the following modifications to the default out-of-the-box KNIME pipeline:\n\n A preprocessing step was added that reproduces the preprocessing performed in the original paper. This step was conducted with an R script written for this purpose. In this step, features were transformed from probe level to gene level. After the transformation, the dataset contained 13,725 features.\n In the preprocessing Meta-node, we removed lymph-node-positive patients and patients with a follow-up time of less than five years (1219 patients remained).\n Two new feature selection methods were added to the feature selection Meta-node (Fig. 3):\na. 
The Mann\u2013Whitney U test was used to reduce the number of genes from 13,725 to 100, exactly as described by Chou et al.[10] The Mann\u2013Whitney U test, a non-parametric test also known as the Wilcoxon rank-sum test, tests for differences between two groups on a single ordinal variable without assuming a specific distribution.[11] The test statistic is based on the difference between the observed sum of ranks of the observations in each group and the rank sum expected under the null hypothesis that the ordinal variable has the same distribution in both groups. (See Chou et al. for more details.[10])\nb. A compound selection method was added, in which the results of the DT algorithm were used to determine which features would be retained for PNN and LR analysis:\n   DA (Decision tree\u2009+\u2009Probabilistic neural network): DT\u2009+\u2009PNN\u2009\u2192\u2009DA\n   DL (Decision tree\u2009+\u2009Logistic regression): DT\u2009+\u2009LR\u2009\u2192\u2009DL\n\n Figure 1. Screenshot of PCM-SABRE\n\nThe classification performance results from PCM-SABRE and from the original paper are compared in Table 2. In contrast to the original work, in which PNN performed best, PCM-SABRE reports that LR performs best among the methods evaluated in the original study. Moreover, the two analyses show different trends when the DT-based feature selection is added. It is worth noting that the accuracy estimates reported by PCM-SABRE are consistently higher than those in the original work, possibly because a different dataset was used for the analysis.\n\nTable 2. Predictive power (in terms of percent accuracy) of several feature selection methods combined with different classification models. 
AUC results are shown in parentheses.\n\nPrediction model | PCM-SABRE, InfoGain | PCM-SABRE, ANOVA | PCM-SABRE, MW U test | Chou et al.[10], MW U test\nRF | 76.52 (NA) | 77.70 (NA) | 76.10 (NA) | NA\nLR | 76.27 (73.0) | 66.55 (62.49) | 75.68 (70.95) | 64.12 (58.96)\nPNN | 76.52 (74.09) | 76.27 (75.21) | 74.58 (72.32) | 69.54 (63.88)\nKNN | 75.76 (67.78) | 75.34 (68.48) | 76.10 (70.30) | NA\nSVM | 72.64 (NA) | 72.64 (NA) | 72.64 (NA) | NA\nDT | 70.19 (60.59) | 68.07 (61.53) | 64.44 (57.34) | 63.45 (56.90)\nDL | NA | NA | 75.34 (71.71) | 68.90 (61.66)\nDA | NA | NA | 75.51 (72.23) | 65.91 (61.65)\n\nUsing PCM-SABRE for optimizing and improving breast cancer outcome prediction \nFor the task of breast cancer outcome prediction, we again used the dataset published by Gy\u00f6rffy et al. and applied the preprocessing steps described above. Table 2 summarizes the performance of all combinations of feature selection methods and classification algorithms. In terms of accuracy, LR, PNN, KNN, and DT performed better with the InfoGain feature selection method than with ANOVA; in terms of AUC, however, only LR did so, while PNN, KNN, and DT achieved a higher AUC with ANOVA. RF performed better with the ANOVA feature selection method and achieved the highest accuracy (77.70%).\n\nDiscussion \nWe developed an intuitive platform for comparing machine learning pipelines for survival prediction. To demonstrate the usefulness of our tool, we first show that, with minimal modifications, PCM-SABRE can be used to reconstruct machine learning pipelines from the literature and to explore the impact of changes in the process (such as adding sequential feature selection) on performance. We reconstructed the work of Chou et al. and, as in the original study, observed the superior performance of PNN and LR over DT; however, the impact of feature pre-selection with the DT algorithm on the subsequent algorithms was inconclusive. 
These results reinforce the need for a platform like PCM-SABRE that allows more reliable comparisons between studies and reproducible results.\nTo further explore the usefulness of PCM-SABRE, we used it to systematically explore various combinations of feature selection\/modeling algorithms. As expected, some algorithms perform better than others. However, we find that for the particular task of inducing a predictive model for breast cancer survival, information gain outperformed ANOVA as a feature selection method in terms of accuracy for four of the six algorithms tested, and achieved similar performance for the remaining two.\nThese results demonstrate the two main uses we propose for PCM-SABRE. First and foremost, future attempts to improve survival prediction can be reported using PCM-SABRE. This would ensure reproducibility of the analysis, as KNIME allows the input data to be bundled with the workflow. By publishing an executable description of the process, authors would enable users to run exactly the same pipeline and, even more importantly, to understand and evaluate the particular contribution of each step in the process by changing it and observing the impact on model quality.\nThe other use we propose for PCM-SABRE is the optimization of predictive models. Using KNIME, it is straightforward to consider the impact of changing each step in the model induction process, and within the PCM-SABRE framework the results are directly comparable. The ability to keep all other steps constant, or to evaluate different combinations, can allow non-experts to optimize their predictive models while ensuring that the resulting process can be intuitively communicated to others.\nResearchers who study breast cancer recurrence risk prediction specifically, and cancer outcome prediction in general, are increasingly using data mining and machine learning methods. 
In order to move this field forward, the community has to put greater emphasis on reproducible research. As noted above, it is currently almost impossible to compare the different \u201cgene signature\u201d papers being published. We believe that if researchers implement their data analysis process in PCM-SABRE and make their workflow available as an additional file, it will benefit the whole community and make the prediction models, and the gene lists that accompany them, more reliable. Sharing a KNIME workflow is easy: KNIME allows users to save workflows with or without the input data file, and simple compression software allows the researcher to publish the entire KNIME folder as a single file. The researcher can also add a screenshot of KNIME to a paper (perhaps instead of the \u201cusual\u201d figure that describes the data analysis process).\nPCM-SABRE could also be implemented with other intuitive pipeline development systems. RapidMiner[12] is a popular machine learning environment that can also be used for this purpose. RapidMiner is very similar to KNIME. Both are visual environments for predictive analytics; both are available for Windows, Mac, and Linux; and both offer online help forums, documentation, and tutorials. Although RapidMiner is ranked higher than KNIME in the KDnuggets list of top Analytics\/Data Science Tools 2016 (5 vs. 9)[13], KNIME has a large customer base in the life sciences sector (bioinformatics and next-generation sequencing extensions can be found here: https:\/\/tech.knime.org\/bioinformatics-and-next-generation-sequencing-extensions). In addition, we believe that KNIME is more intuitive and provides a \"softer landing\" for cancer researchers who have little programming experience and who are interested in sharing their data analysis workflow with other researchers. 
Other tools also exist, such as the WEKA workspace.[14] However, these are not sufficiently intuitive for untrained users. The features of KNIME that we think make it most attractive for this purpose are the ability to wrap critical parts of the process in metanodes, the strong branching and looping capabilities that support evaluating alternative methods in parallel, and the ability to pass parameters to the pipeline, which enhances user control without requiring detailed editing of many nodes. We thus conclude that while PCM-SABRE can be implemented with other machine-learning platforms, KNIME offers a user-friendly yet powerful solution for this purpose.\nThe approach we present here is not unique to survival prediction from expression data: in principle, PCM-SABRE can also be used for developing other predictive models. However, as other projects may emphasize other steps in machine learning (e.g., feature extraction), more work is required to adapt PCM-SABRE to other tasks.\n\nConclusions \nPCM-SABRE is a software tool that provides an intuitive environment for the rapid development of predictive models in cancer precision medicine. It allows users to easily define a data source and to consider alternative ways to conduct the main steps of the prediction process. 
The resulting pipeline can be shared with others in an intuitive yet executable way, which, if adopted by other investigators, will improve the comparability and interpretability of future works attempting to predict patient survival from gene expression data.\n\nAbbreviations \nAUC: Area under the curve\nDA: Decision tree for attribute selection and artificial neural network for classification\nDL: Decision tree for attribute selection and logistic regression for classification\nDT: Decision tree\nER: Estrogen receptor\nInfoGain: Information gain\nKNN: K-nearest neighbors\nLN: Lymph node\nLR: Logistic regression\nPNN: Probabilistic neural network\nRF: Random forest\nSVM: Support vector machine\n\nDeclarations \nAcknowledgements \nNot applicable.\n\nFunding \nThis research was partially supported by the Paul Ivanier Center for Production Management, Ben-Gurion University of the Negev, and the Israeli Science Foundation (through grant number 1188\/16).\n\nAvailability of data and materials \nProject name: PCM-SABRE\nProject home page: http:\/\/erubin85.wixsite.com\/website\/pcm-sabre\nOperating system: Windows\nProgramming language: R\nThe dataset analyzed during the current study is available at https:\/\/drive.google.com\/file\/d\/0B9pANNl-7eDdX1FpdzU4RTE2QkE\/view?usp=sharing\n\nAuthors\u2019 contributions \nNEA, ER and ML conceived of the study. NEA built the PCM-SABRE platform and performed the data analysis. ML supervised the data mining aspects. NEA and ER drafted the manuscript with help and comments from ML. All authors read and approved the final manuscript.\n\nCompeting interests \nThe authors declare that they have no competing interests.\n\nAdditional files \nAdditional file 1: PCM-SABRE Library and PCM-SABRE KNIME workflow (.rar file, 45,850 KB)\nReferences \n\n[1] Sparano, J.A.; Gray, R.J.; Makower, D.F. et al. (2015). \"Prospective Validation of a 21-Gene Expression Assay in Breast Cancer\". New England Journal of Medicine 373 (21): 2005\u201314. doi:10.1056\/NEJMoa1510764. PMID 26412349.\n\n[2] Senkus, E.; Kyriakides, S.; Ohno, S. et al. (2015). \"Primary breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up\". Annals of Oncology 26 (Suppl 5): v8\u201330. doi:10.1093\/annonc\/mdv298. PMID 26314782.\n\n[3] \"U.S. Breast Cancer Statistics\". Breastcancer.org. 2016. http:\/\/www.breastcancer.org\/symptoms\/understand_bc\/statistics. Retrieved 20 December 2016.\n\n[4] \"Breast cancer statistics\". World Cancer Research Fund International. 2016. http:\/\/www.wcrf.org\/int\/cancer-facts-figures\/data-specific-cancers\/breast-cancer-statistics. Retrieved 20 December 2016.\n\n[5] \"Statistics for Metastatic Breast Cancer\". Metastatic Breast Cancer Network. 2016. http:\/\/www.mbcn.org\/statistics-for-metastatic-breast-cancer\/. Retrieved 20 December 2016.\n\n[6] Cronin, M.; Sangli, C.; Liu, M.L. et al. (2007). \"Analytical validation of the Oncotype DX genomic diagnostic test for recurrence prognosis and therapeutic response prediction in node-negative, estrogen receptor-positive breast cancer\". Clinical Chemistry 53 (6): 1084\u201391. doi:10.1373\/clinchem.2006.076497. PMID 17463177.\n\n[7] Gy\u00f6rffy, B.; Lanczky, A.; Eklund, A.C. et al. (2010). \"An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients\". Breast Cancer Research and Treatment 123 (3): 725\u201331. doi:10.1007\/s10549-009-0674-9. PMID 20020197.\n\n[8] Naoi, Y.; Kishi, K.; Tanei, T. et al. (2011). \"Development of 95-gene classifier as a powerful predictor of recurrences in node-negative and ER-positive breast cancer patients\". Breast Cancer Research and Treatment 128 (3): 633\u201341. doi:10.1007\/s10549-010-1145-z. PMID 20803240.\n\n[9] Berthold, M.R.; Cebron, N.; Dill, F. et al. (2008). \"KNIME: The Konstanz Information Miner\". In Preisach, C.; Burkhardt, H.; Schmidt-Thieme, L.; Decker, R. (eds.). Data Analysis, Machine Learning and Applications. Springer-Verlag Berlin Heidelberg. pp. 319\u2013326. doi:10.1007\/978-3-540-78246-9. ISBN 9783540782469. http:\/\/www.inf.uni-konstanz.de\/bioml2\/publications\/Papers2007\/BCDG+07_knime_gfkl.pdf (PDF).\n\n[10] Chou, H.L.; Yao, C.T.; Su, S.L. et al. (2013). \"Gene expression profiling of breast cancer survivability by pooled cDNA microarray analysis using logistic regression, artificial neural networks and decision trees\". BMC Bioinformatics 14: 100. doi:10.1186\/1471-2105-14-100. PMC3614553. PMID 23506640. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3614553.\n\n[11] Mann, H.B.; Whitney, D.R. (1947). \"On a test of whether one of two random variables is stochastically larger than the other\". The Annals of Mathematical Statistics 18 (1): 50\u201360.\n\n[12] \"RapidMiner\". RapidMiner, Inc. https:\/\/rapidminer.com\/. Retrieved 20 December 2016.\n\n[13] \"R, Python Duel As Top Analytics, Data Science software \u2013 KDnuggets 2016 Software Poll Results\". KDnuggets. 08 June 2016. http:\/\/www.kdnuggets.com\/2016\/06\/r-python-top-analytics-data-mining-data-science-software.html. Retrieved 20 December 2016.\n\n[14] \"Weka 3: Data Mining Software in Java\". The University of Waikato. http:\/\/www.cs.waikato.ac.nz\/ml\/weka\/. Retrieved 20 December 2016.\n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Some grammar was corrected when necessary. Some tables and figures were moved slightly to match up better with their text references. 
Citations #11 and #15 in the original (the link to the Gy\u00f6rffy dataset and the link to the KNIME extensions) were removed as citations and turned into inline URLs.\n\n\n\n\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\">https:\/\/www.limswiki.org\/index.php\/Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2017); LIMSwiki journal articles (all); LIMSwiki journal articles on bioinformatics; LIMSwiki journal articles on software\n\nThis page was last modified on 27 March 2017, at 21:17.\nThis page has been accessed 1,161 times.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","ffcad3b9d842250ab55f35eb0cee8237_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_PCM-SABRE_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:PCM-SABRE: A platform for benchmarking and comparing outcome prediction methods in precision cancer medicine<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p><b>Background<\/b>: Numerous publications attempt to predict cancer survival outcome from gene expression data using machine-learning methods. 
A direct comparison of these works is challenging for the following reasons: (1) inconsistent measures used to evaluate the performance of different models, and (2) incomplete specification of critical stages in the process of knowledge discovery. There is a need for a platform that would allow researchers to replicate previous works and to test the impact of changes in the knowledge discovery process on the accuracy of the induced models.\n<\/p><p><b>Results<\/b>: We developed the PCM-SABRE platform, which supports the entire knowledge discovery process for cancer outcome analysis. PCM-SABRE was developed using <a href=\"https:\/\/www.limswiki.org\/index.php\/KNIME\" title=\"KNIME\" target=\"_blank\" class=\"wiki-link\" data-key=\"18360a3b4b22798e231d7b26365dff87\">KNIME<\/a>. By using PCM-SABRE to reproduce the results of previously published works on breast cancer survival, we define a baseline for evaluating future attempts to predict cancer outcome with machine learning. We used PCM-SABRE to replicate previous work that describes predictive models of breast cancer recurrence, and tested the performance of all possible combinations of the feature selection methods and data mining algorithms used in either of the works. We reconstructed the work of Chou <i>et al.<\/i>, observing similar trends: superior performance of the Probabilistic Neural Network (PNN) and logistic regression (LR) algorithms, and an inconclusive impact of feature pre-selection with the decision tree algorithm on subsequent analysis.\n<\/p><p><b>Conclusions<\/b>: PCM-SABRE is a software tool that provides an intuitive environment for the rapid development of predictive models in cancer precision medicine.\n<\/p><p><b>Keywords<\/b>: Breast cancer, data mining, reproducible research\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Background\">Background<\/span><\/h2>\n<p>Predicting the outcome of cancer from gene expression data is a clinically important and computationally challenging task. 
For example, early-stage, estrogen-receptor-positive, HER2-negative breast cancer patients who are considered to be at low risk of recurrence can avoid chemotherapy, while patients at high or intermediate risk are treated with aggressive (and harmful) chemotherapy.<sup id=\"rdp-ebb-cite_ref-SparanoProspective15_1-0\" class=\"reference\"><a href=\"#cite_note-SparanoProspective15-1\" rel=\"external_link\">[1]<\/a><\/sup>\n<\/p><p>Efforts to stratify patients by risk of recurrence in other tumor types, as well as the ability to stratify patients by overall chance of survival, are not as advanced. Moreover, the relative success of risk stratification for breast cancer patients has been challenged<sup id=\"rdp-ebb-cite_ref-SenkusPrimary15_2-0\" class=\"reference\"><a href=\"#cite_note-SenkusPrimary15-2\" rel=\"external_link\">[2]<\/a><\/sup>, with the argument that it in fact stratifies patients into tumor subtypes, which can be achieved with much simpler tests.\n<\/p><p>As a result, a large number of papers have been, and continue to be, published in which gene expression data are analyzed to construct models that predict cancer survival or cancer recurrence. Many of these efforts are concentrated on breast cancer, the second most commonly diagnosed cancer among American women (after skin cancer).<sup id=\"rdp-ebb-cite_ref-BCOrgUSBreast16_3-0\" class=\"reference\"><a href=\"#cite_note-BCOrgUSBreast16-3\" rel=\"external_link\">[3]<\/a><\/sup> About 1 in 8 U.S. 
women (about 12 percent) will develop invasive breast cancer over the course of their lifetime, and similar rates are reported worldwide.<sup id=\"rdp-ebb-cite_ref-WCRFBreast16_4-0\" class=\"reference\"><a href=\"#cite_note-WCRFBreast16-4\" rel=\"external_link\">[4]<\/a><\/sup> Breast cancer is an attractive domain for risk stratification because resection is estimated to be a sufficient treatment for 70 to 80 percent of patients, while the remaining patients will develop advanced metastatic lesions, which are largely incurable.<sup id=\"rdp-ebb-cite_ref-MBCNStatistics16_5-0\" class=\"reference\"><a href=\"#cite_note-MBCNStatistics16-5\" rel=\"external_link\">[5]<\/a><\/sup> Aggressive chemotherapy reduces the chance of advanced metastasis for the patients at risk, but it is harmful and unnecessary therapy for those who are not. Thus, great efforts have been invested in stratifying patients\u2019 risk of recurrence.<sup id=\"rdp-ebb-cite_ref-CroninAnalytical07_6-0\" class=\"reference\"><a href=\"#cite_note-CroninAnalytical07-6\" rel=\"external_link\">[6]<\/a><\/sup>\n<\/p><p>Because of the importance of risk stratification in breast cancer, combined with its relatively high incidence, breast cancer is the tumor type for which expression profiles of newly diagnosed patients are most abundant. 
Several works have been published that apply machine-learning techniques to these data for predicting cancer survivability.<sup id=\"rdp-ebb-cite_ref-Gy.C3.B6rffyAnOnline10_7-0\" class=\"reference\"><a href=\"#cite_note-Gy.C3.B6rffyAnOnline10-7\" rel=\"external_link\">[7]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-NaoiDevelopment11_8-0\" class=\"reference\"><a href=\"#cite_note-NaoiDevelopment11-8\" rel=\"external_link\">[8]<\/a><\/sup> Unfortunately, we found it quite challenging to directly compare these works for the following reasons:\n<\/p>\n<ol><li> Incomplete specification of critical stages in the process of knowledge discovery, such as feature selection.<\/li>\n<li> Differences in the measures used to evaluate model performance. Some provide only the overall accuracy of the proposed classifier, some offer only the area under the curve (AUC), while others provide no statistical measures and only present Kaplan-Meier charts that visualize the survival curves based on predicted classes.<\/li>\n<li> Different studies apply different inclusion\/exclusion criteria, with little or no overlap between the patients considered.<\/li><\/ol>\n<p>Incomplete documentation of the analytic process is a common cause of irreproducible published results. We conclude that there is a need for a platform that would allow researchers to describe their analytic work in the field of risk stratification for cancer patients in a reproducible way that can be used for further investigation. Such a platform should allow the replication of previous works and the systematic evaluation of how alterations in one or more stages of the knowledge discovery process affect its performance in the task of cancer survival prediction. 
Such a tool can help researchers understand and compare the current state of prediction for breast cancer and, if applied to new cancer types, prevent the \"Tower of Babel\" situation that has emerged for breast cancer.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Implementation\">Implementation<\/span><\/h2>\n<p>We developed a platform for replicating, comparing, and improving knowledge discovery pipelines for cancer survival prediction, and we demonstrate its applicability to breast cancer (Fig. 1). PCM-SABRE (Precision Cancer Medicine - Survival Analysis Benchmarking, Reporting and Evaluation) was developed using <a href=\"https:\/\/www.limswiki.org\/index.php\/KNIME\" title=\"KNIME\" target=\"_blank\" class=\"wiki-link\" data-key=\"18360a3b4b22798e231d7b26365dff87\">KNIME<\/a> (Konstanz Information Miner).<sup id=\"rdp-ebb-cite_ref-BertholdKNIME08_9-0\" class=\"reference\"><a href=\"#cite_note-BertholdKNIME08-9\" rel=\"external_link\">[9]<\/a><\/sup> KNIME is a modern, flexible, and intuitive open-source data analytics platform that supports sophisticated statistical and data mining analyses for developing, among other things, predictive models. We chose KNIME because it is popular, user-friendly software that does not require programming knowledge. Its node-based workflow structure makes it easy to assess the impact of changing one knowledge discovery step (for example, the data mining algorithm) on predictive performance without changing any other step of the workflow. 
Another major benefit of KNIME is the ability to create new nodes; this feature is particularly useful when researchers want to integrate a new method they developed into an existing KNIME workflow.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Eyal-Altman_BMCBioinformatics2017_18.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"1463e0fb1b1eff2c181768cafa439d90\"><img alt=\"Fig1 Eyal-Altman BMCBioinformatics2017 18.gif\" src=\"https:\/\/www.limswiki.org\/images\/b\/b1\/Fig1_Eyal-Altman_BMCBioinformatics2017_18.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> Screenshot of PCM-SABRE<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>We designed the PCM-SABRE workflow around the common steps of knowledge discovery in data. First, the user can either use a supplied dataset or load a new one. The dataset has to be a .csv file containing a table in which the rows represent patients and the columns represent clinical data, gene expression data, or any other type of variable. The dependent variable can be binary or continuous (a continuous variable is transformed into a binary one) and needs to represent survival time (for example, relapse-free survival time or time to death). \n<\/p><p>The second Meta-node is the preprocessing step, where a binary dependent variable is created and patients with missing data or censored survival information are filtered out. 
By default, we use a threshold of five years to split the continuous survival variable into HIGH (t\u2009<\u20095 years) or LOW (t\u2009\u2265\u20095 years) risk, but this threshold is an input parameter that can be changed, as explained below. Missing values are imputed using random forest classification, which builds a model from the rows with non-missing values and predicts the value of the variable for the rows in which it is missing. The default version of PCM-SABRE allows selecting patients according to their ER status and lymph node status, but the \"Select Patients\" Meta-node is optional and can easily be modified to meet other inclusion\/exclusion criteria. The third Meta-node is the feature selection step, where users can choose between two feature selection methods (information gain or ANOVA) or add another one (from the available nodes in KNIME, using scripting, or with external tools). The fourth Meta-node is the modeling step, where we offer a choice of five well-known and relevant classifiers. The methods included in the out-of-the-box basic version of the workflow are described in Table 1. 
\n<\/p><p><br \/>\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"5\"><b>Table 1.<\/b> Machine learning methods available in PCM-SABRE\n<\/td><\/tr>\n<tr>\n<th style=\"padding-left:10px; padding-right:10px;\" colspan=\"2\">Meta-node\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">Method\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">KNIME node\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">Default parameters\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.1\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Select Patients\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Estrogen Receptor (ER) status\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">R script\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1.2\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Select Patients\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Lymph Node (LN) status\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">R script\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">2.1\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Feature Selection\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Information Gain (InfoGain)\n<\/td>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\">InformationGainCalculator (Community node \u2013 Palladian)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Top 100 ranked\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">2.2\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Feature Selection\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">ANOVA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">One-way ANOVA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Include genes with <i>p<\/i>-value\u2009<\u20091.0E-6\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3.1\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Modeling\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Logistic Regression (LR)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Logistic (3.7) (Weka node)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Ridge\u2009=\u20091.0E-8\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3.2\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Modeling\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Random Forest (RF)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Random Forest Learner\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Split criterion\u2009=\u2009Information Gain Ratio, Number of models\u2009=\u2009350\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3.3\n<\/td>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\">Modeling\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Artificial Neural Network (ANN)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">PNN Learner (DDA)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Theta Minus\u2009=\u20090.2, Theta Plus\u2009=\u20090.4\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3.4\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Modeling\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">K-Nearest Neighbors (KNN)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">IBk (3.7) (Weka node)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">KNN\u2009=\u200915\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">3.5\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Modeling\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Support Vector Machine (SVM)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">SVM Learner\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Kernel\u2009=\u2009RBF, sigma\u2009=\u20090.2\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Thanks to the design of KNIME, adding Modeling and Feature Selection methods only involves dropping additional nodes into the appropriate Meta-nodes and connecting them by drag-and-drop, using the existing methods as templates. Our experience with experimental biologists suggests that oncology researchers without programming skills can do this with little or no special training. Fig. 
2 illustrates how the user can easily and quickly add another classifier to the workflow: (1) double-click modeling\u2009\u2192\u2009new model\u2009\u2192\u2009cross-validation; (2) delete the decision tree learner and predictor nodes; (3) choose another pair of learner and predictor nodes from the Node Repository and drag-and-drop them in place of the deleted nodes; (4) connect the Training data output of the X-Partitioner node to the Learner node input, the PMML output of the Learner node to the PMML input of the Predictor node, the Test data output of the X-Partitioner node to the Predictor node, and the Predictor node to the X-Aggregator node. \n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Eyal-Altman_BMCBioinformatics2017_18.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"390a2e774276686887a59fe459d8f523\"><img alt=\"Fig2 Eyal-Altman BMCBioinformatics2017 18.gif\" src=\"https:\/\/www.limswiki.org\/images\/2\/2f\/Fig2_Eyal-Altman_BMCBioinformatics2017_18.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> Demonstration of drag-and-drop model replacement (Na\u00efve Bayes instead of decision tree)<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The fifth Meta-node is the evaluation step, which calculates performance measures for the different models (among them the accuracy and the area under the ROC curve). An important feature of PCM-SABRE is a .csv file (flow_variables.csv) that allows the user to control some default input parameters without having to change them inside the individual KNIME nodes. 
The controlled input parameters are (1) the feature selection method (default\u2009=\u2009infoGain), (2) ER status (default\u2009=\u2009all patients), (3) lymph node status (default\u2009=\u2009all patients), and (4) the threshold for the binary survival variable (default\u2009=\u2009five years). Changing an input parameter or adding a new one is simple and only requires editing cells in Excel. Additional details on how to use PCM-SABRE can be found in the user manual.\n<\/p><p>For each combination of a feature selection method with a classification algorithm, the PCM-SABRE output includes (1) performance measures, (2) ROC analysis, and (3) a list of ranked features.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Results\">Results<\/span><\/h2>\n<p>We developed PCM-SABRE (available as Additional file 1) as a software system for comparing and improving expression-based models that predict the outcome of cancer patients. We used PCM-SABRE to replicate previous work that describes predictive models of breast cancer recurrence, and we evaluated the performance of all possible combinations of the feature selection methods and data mining algorithms used in either of the works.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Using_PCM-SABRE_for_replicating_a_previous_work_that_utilizes_machine_learning_to_induce_outcome_prediction_models\">Using PCM-SABRE for replicating a previous work that utilizes machine learning to induce outcome prediction models<\/span><\/h3>\n<p>We first demonstrate the value of PCM-SABRE to investigators implementing new machine learning pipelines for breast cancer recurrence prediction by replicating the work of Chou <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-ChouGene13_10-0\" class=\"reference\"><a href=\"#cite_note-ChouGene13-10\" rel=\"external_link\">[10]<\/a><\/sup> Our analysis reconstructs the paper to the best of our ability, with the following exceptions:\n<\/p>\n<ul><li> We use KNIME rather than the original software (Clementine 10.1), and we use as 
input data a more recent compendium of expression data (hereafter referred to as the Gy\u00f6rffy dataset).<sup id=\"rdp-ebb-cite_ref-Gy.C3.B6rffyAnOnline10_7-1\" class=\"reference\"><a href=\"#cite_note-Gy.C3.B6rffyAnOnline10-7\" rel=\"external_link\">[7]<\/a><\/sup> The dataset is available for download at <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/kmplot.com\/analysis\/index.php?p=download\" target=\"_blank\">http:\/\/kmplot.com\/analysis\/index.php?p=download<\/a>. <\/li>\n<li> The Gy\u00f6rffy dataset originally contained 1809 examples (breast cancer patients) and 22,216 features (clinical features and probe expression levels). <\/li>\n<li> A binary class attribute was created indicating whether or not the cancer recurred within five years.<\/li><\/ul>\n<p>To reproduce the original work as closely as possible, we made the following modifications to the default KNIME pipeline:\n<\/p>\n<ol><li> A preprocessing step was added that reproduces the preprocessing performed in the original paper, using a specialized <a href=\"https:\/\/www.limswiki.org\/index.php\/R_(programming_language)\" title=\"R (programming language)\" target=\"_blank\" class=\"wiki-link\" data-key=\"1b0aa598f071aca4c5b4ee08d8bb2bde\">R script<\/a> written for this purpose. In this step, features were transformed from the probe level to the gene level; after the transformation, the dataset contained 13,725 features.<\/li>\n<li> In the preprocessing Meta-node, we removed lymph node-positive patients and patients with a follow-up time of less than five years (1219 patients remained).<\/li>\n<li> Two new feature selection methods were added to the feature selection Meta-node (Fig. 3):<\/li><\/ol>\n<dl><dd><dl><dd>a. 
The Mann\u2013Whitney <i>U<\/i> test was used to reduce the number of genes from 13,725 to 100, exactly as described by Chou <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-ChouGene13_10-1\" class=\"reference\"><a href=\"#cite_note-ChouGene13-10\" rel=\"external_link\">[10]<\/a><\/sup> The Mann\u2013Whitney <i>U<\/i> test, a non-parametric test also known as the Wilcoxon rank-sum test, tests for differences between two groups on a single ordinal variable without assuming any specific distribution.<sup id=\"rdp-ebb-cite_ref-MannOnATest47_11-0\" class=\"reference\"><a href=\"#cite_note-MannOnATest47-11\" rel=\"external_link\">[11]<\/a><\/sup> The <i>U<\/i> statistic is derived from the sum of the ranks of each group's observations and is compared with the value expected under the null hypothesis that the ordinal variable has the same distribution in both groups. (See Chou <i>et al.<\/i> for more details.<sup id=\"rdp-ebb-cite_ref-ChouGene13_10-2\" class=\"reference\"><a href=\"#cite_note-ChouGene13-10\" rel=\"external_link\">[10]<\/a><\/sup>)<\/dd><\/dl><\/dd><\/dl>\n<dl><dd><dl><dd>b. 
A compound selection method was added, in which the results of the DT algorithm were used to determine which features were retained for the PNN and LR analyses.<\/dd>\n<dd>   DA (Decision tree\u2009+\u2009Probabilistic neural network): DT\u2009+\u2009PNN\u2009\u2192\u2009DA<\/dd>\n<dd>   DL (Decision tree\u2009+\u2009Logistic regression): DT\u2009+\u2009LR\u2009\u2192\u2009DL<\/dd><\/dl><\/dd><\/dl>\n<p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Eyal-Altman_BMCBioinformatics2017_18.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"ff402c774e9ee0a8c1210ddc5c5dc20c\"><img alt=\"Fig3 Eyal-Altman BMCBioinformatics2017 18.gif\" src=\"https:\/\/www.limswiki.org\/images\/3\/3e\/Fig3_Eyal-Altman_BMCBioinformatics2017_18.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> Screenshot of PCM-SABRE<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Table 2 compares the classification performance obtained with PCM-SABRE with that reported in the original paper. In contrast to the original work, in which PNN performed best, PCM-SABRE finds that LR has the best performance among the models evaluated by Chou <i>et al.<\/i> Moreover, in terms of accuracy, the two analyses show opposite trends when DT-based feature selection is added: in the original work, DL outperformed LR and DA underperformed PNN, but in PCM-SABRE, DL performed slightly worse than LR and DA performed better than PNN. It is worth noting that the accuracy estimated by PCM-SABRE is higher than that reported in the original work. 
This may be because a different dataset was used for the analysis.\n<\/p><p><br \/>\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"5\"><b>Table 2.<\/b> Predictive power (in terms of percent accuracy) of several feature selection methods combined with different classification models. AUC results are shown in parentheses.\n<\/td><\/tr>\n<tr>\n<th style=\"padding-left:10px; padding-right:10px;\">Prediction model\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\" colspan=\"3\">PCM-SABRE pipeline\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\" rowspan=\"2\">Chou <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-ChouGene13_10-3\" class=\"reference\"><a href=\"#cite_note-ChouGene13-10\" rel=\"external_link\">[10]<\/a><\/sup> MW <i>U<\/i> test\n<\/th><\/tr>\n<tr>\n<th style=\"padding-left:10px; padding-right:10px;\">Feature selection\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">InfoGain\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">ANOVA\n<\/th>\n<th style=\"padding-left:10px; padding-right:10px;\">MW <i>U<\/i> test\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">RF\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">76.52 (NA)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">77.70 (NA)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">76.10 (NA)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">NA\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">LR\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">76.27 (73.0)\n<\/td>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\">66.55 (62.49)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">75.68 (70.95)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">64.12 (58.96)\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">PNN\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">76.52 (74.09)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">76.27 (75.21)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">74.58 (72.32)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">69.54 (63.88)\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">KNN\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">75.76 (67.78)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">75.34 (68.48)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">76.10 (70.30)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">NA\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">SVM\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">72.64 (NA)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">72.64 (NA)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">72.64 (NA)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">NA\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">DT\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">70.19 (60.59)\n<\/td>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\">68.07 (61.53)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">64.44 (57.34)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">63.45 (56.90)\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">DL\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">NA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">NA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">75.34 (71.71)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">68.90 (61.66)\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">DA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">NA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">NA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">75.51 (72.23)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">65.91 (61.65)\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h3><span class=\"mw-headline\" id=\"Using_PCM-SABRE_for_optimizing_and_improving_breast_cancer_outcome_prediction\">Using PCM-SABRE for optimizing and improving breast cancer outcome prediction<\/span><\/h3>\n<p>For the task of breast cancer outcome prediction, we again used the dataset published by Gy\u00f6rffy <i>et al.<\/i> and applied the preprocessing steps described above. Table 2 summarizes the performance of all combinations of feature selection methods and classification algorithms. In terms of accuracy, LR, PNN, KNN, and DT performed better when combined with the InfoGain feature selection method than with ANOVA; in terms of AUC, however, only LR did. 
RF performed best when combined with the ANOVA feature selection method, achieving the highest overall accuracy (77.70%).\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Discussion\">Discussion<\/span><\/h2>\n<p>We developed an intuitive platform for comparing machine learning pipelines for survival prediction. To demonstrate the usefulness of our tool, we first show that with minimal modifications, PCM-SABRE can be used to reconstruct machine learning pipelines from the literature and to explore the impact of changes in the process (such as adding sequential feature selection) on its performance. We reconstructed the work of Chou <i>et al.<\/i> and, like them, observed that PNN and LR outperform DT; however, the effect of feature pre-selection with the DT algorithm on the subsequent algorithms was inconclusive. These results reinforce the need for a platform such as PCM-SABRE that enables more reliable comparisons between studies and reproducible results.\n<\/p><p>To further explore the usefulness of PCM-SABRE, we used it to systematically explore various combinations of feature selection and modelling algorithms. As expected, some algorithms perform better than others. In addition, we found that for the particular task of inducing a predictive model for breast cancer survival, information gain outperformed ANOVA for feature selection in terms of accuracy with four of the six algorithms tested, and achieved similar performance with the remaining two.\n<\/p><p>These results demonstrate the two main uses we propose for PCM-SABRE. First and foremost, future attempts to improve survival prediction can be reported using PCM-SABRE. This would ensure the reproducibility of the analysis, as KNIME allows the input data to be bundled with the workflow. 
By publishing an executable description of the process, authors allow other users to run exactly the same pipeline and, even more importantly, to understand and evaluate the particular contribution of each step in the process by changing it and observing the impact on model quality.\n<\/p><p>The other use we propose for PCM-SABRE is the optimization of predictive models. Using KNIME, it is straightforward to consider the impact of changing each step in the model induction process, and within the PCM-SABRE framework the results are directly comparable. The ability to keep all other steps constant or to evaluate different combinations can allow non-experts to optimize their predictive models while ensuring that the resulting process can be intuitively communicated to others.\n<\/p><p>Researchers who study breast cancer recurrence risk prediction specifically, and cancer outcome prediction in general, are increasingly using data mining and machine learning methods. To move this field forward, the community has to put greater emphasis on reproducible research. As noted above, it is currently almost impossible to compare the different \u201cgene signature\u201d papers being published. We believe that if researchers implement their data analysis process in PCM-SABRE and make their workflow available as an additional file, everyone will benefit, and the prediction models and the gene lists that accompany them will become more reliable. Sharing a KNIME workflow is easy: KNIME allows users to save workflows with or without the input data file, and standard compression software allows the researcher to publish the entire KNIME folder as a single file. 
The researcher can also add a screenshot of the KNIME workflow to a paper (perhaps in place of the \u201cusual\u201d figure that describes the data analysis process).\n<\/p><p>Clearly, PCM-SABRE can also be implemented with other intuitive pipeline development systems. RapidMiner<sup id=\"rdp-ebb-cite_ref-RapidMiner_12-0\" class=\"reference\"><a href=\"#cite_note-RapidMiner-12\" rel=\"external_link\">[12]<\/a><\/sup> is a popular machine learning environment that can also be used for this purpose, and it is very similar to KNIME: both are visual environments for predictive analytics; both are available for Windows, Mac, and Linux; and both offer online help forums, documentation, and tutorials. Although RapidMiner ranked higher than KNIME in the KDnuggets list of top Analytics\/Data Science Tools for 2016 (5th vs. 9th)<sup id=\"rdp-ebb-cite_ref-KDN_R16_13-0\" class=\"reference\"><a href=\"#cite_note-KDN_R16-13\" rel=\"external_link\">[13]<\/a><\/sup>, KNIME has a large customer base in the life sciences sector (<a href=\"https:\/\/www.limswiki.org\/index.php\/Bioinformatics\" title=\"Bioinformatics\" target=\"_blank\" class=\"wiki-link\" data-key=\"8f506695fdbb26e3f314da308f8c053b\">bioinformatics<\/a> and next-generation sequencing extensions can be found here: <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/tech.knime.org\/bioinformatics-and-next-generation-sequencing-extensions\" target=\"_blank\">https:\/\/tech.knime.org\/bioinformatics-and-next-generation-sequencing-extensions<\/a>). In addition, we believe that KNIME is more intuitive and provides a \"softer landing\" for cancer researchers who have little programming experience and who are interested in sharing their data analysis workflow with other researchers. 
Other tools also exist, such as the WEKA workbench.<sup id=\"rdp-ebb-cite_ref-Weka_14-0\" class=\"reference\"><a href=\"#cite_note-Weka-14\" rel=\"external_link\">[14]<\/a><\/sup> However, these are not sufficiently intuitive for untrained users. The features of KNIME that we think make it most attractive for this purpose are the ability to wrap critical parts of the process in metanodes; the strong branching and looping capability, which supports evaluating alternative methods in parallel; and the ability to pass parameters to the pipeline, which enhances user control without requiring detailed editing of many nodes. We thus conclude that while PCM-SABRE can be implemented with other machine-learning platforms, KNIME offers a user-friendly yet powerful solution for this purpose.\n<\/p><p>The approach we present here is not unique to survival prediction from expression data: in principle, PCM-SABRE can also be used for developing other predictive models. However, as other projects may emphasize other steps in machine learning (e.g., feature extraction), more work is required to adapt PCM-SABRE to other tasks.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusions\">Conclusions<\/span><\/h2>\n<p>PCM-SABRE is a software tool that provides an intuitive environment for the rapid development of predictive models in cancer precision medicine. It allows users to easily define a data source and to consider alternative ways of conducting the main steps of the prediction process. 
The resulting pipeline can be shared with others in an intuitive yet executable way, which, if adopted by other investigators, will improve the comparability and interpretability of future work attempting to predict patient survival from gene expression data.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Abbreviations\">Abbreviations<\/span><\/h2>\n<p><b>AUC<\/b>: Area under the curve\n<\/p><p><b>DA<\/b>: Decision tree for attribute selection and probabilistic neural network for classification\n<\/p><p><b>DL<\/b>: Decision tree for attribute selection and logistic regression for classification\n<\/p><p><b>DT<\/b>: Decision tree\n<\/p><p><b>ER<\/b>: Estrogen receptor\n<\/p><p><b>InfoGain<\/b>: Information gain\n<\/p><p><b>KNN<\/b>: K-nearest neighbors\n<\/p><p><b>LN<\/b>: Lymph node\n<\/p><p><b>LR<\/b>: Logistic regression\n<\/p><p><b>PNN<\/b>: Probabilistic neural network\n<\/p><p><b>RF<\/b>: Random forest\n<\/p><p><b>SVM<\/b>: Support vector machine\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Declarations\">Declarations<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h3>\n<p>Not applicable.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h4>\n<p>This research was partially supported by the Paul Ivanier Center for Production Management, Ben-Gurion University of the Negev, and the Israel Science Foundation (through grant number 1188\/16).\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Availability_of_data_and_materials\">Availability of data and materials<\/span><\/h4>\n<p><b>Project name<\/b>: PCM-SABRE\n<\/p><p><b>Project home page<\/b>: <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/erubin85.wixsite.com\/website\/pcm-sabre\" target=\"_blank\">http:\/\/erubin85.wixsite.com\/website\/pcm-sabre<\/a>\n<\/p><p><b>Operating system<\/b>: Windows\n<\/p><p><b>Programming language<\/b>: R\n<\/p><p>The dataset analyzed during the current study is available at <a rel=\"external_link\" 
class=\"external free\" href=\"https:\/\/drive.google.com\/file\/d\/0B9pANNl-7eDdX1FpdzU4RTE2QkE\/view?usp=sharing\" target=\"_blank\">https:\/\/drive.google.com\/file\/d\/0B9pANNl-7eDdX1FpdzU4RTE2QkE\/view?usp=sharing<\/a>\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Authors.E2.80.99_contributions\">Authors\u2019 contributions<\/span><\/h4>\n<p>NEA, ER and ML conceived of the study. NEA built the PCM-SABRE platform and performed the data analysis. ML supervised the data mining aspects. NEA and ER drafted the manuscript with help and comments from ML. All authors read and approved the final manuscript.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h4>\n<p>The authors declare that they have no competing interests.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Additional_files\">Additional files<\/span><\/h2>\n<ul><li> <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/static-content.springer.com\/esm\/art%3A10.1186%2Fs12859-016-1435-5\/MediaObjects\/12859_2016_1435_MOESM1_ESM.rar\" target=\"_blank\">Additional file 1 <\/a>: PCM-SABRE Library and PCM-SABRE KNIME workflow (.rar file, 45850 kb)<\/li><\/ul>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-SparanoProspective15-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SparanoProspective15_1-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sparano, J.A.; Gray, R.J.; Makower, D.F. et al. (2015). \"Prospective Validation of a 21-Gene Expression Assay in Breast Cancer\". <i>New England Journal of Medicine<\/i> <b>373<\/b> (21): 2005\u201314. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1056%2FNEJMoa1510764\" target=\"_blank\">10.1056\/NEJMoa1510764<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26412349\" target=\"_blank\">26412349<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Prospective+Validation+of+a+21-Gene+Expression+Assay+in+Breast+Cancer&rft.jtitle=New+England+Journal+of+Medicine&rft.aulast=Sparano%2C+J.A.%3B+Gray%2C+R.J.%3B+Makower%2C+D.F.+et+al.&rft.au=Sparano%2C+J.A.%3B+Gray%2C+R.J.%3B+Makower%2C+D.F.+et+al.&rft.date=2015&rft.volume=373&rft.issue=21&rft.pages=2005%E2%80%9314&rft_id=info:doi\/10.1056%2FNEJMoa1510764&rft_id=info:pmid\/26412349&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SenkusPrimary15-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SenkusPrimary15_2-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Senkus, E.; Kyriakides, S.; Ohno, S. et al. (2015). \"Primary breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up\". <i>Annals of Oncology<\/i> <b>26<\/b> (Suppl 5): v8-30. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fannonc%2Fmdv298\" target=\"_blank\">10.1093\/annonc\/mdv298<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26314782\" target=\"_blank\">26314782<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Primary+breast+cancer%3A+ESMO+Clinical+Practice+Guidelines+for+diagnosis%2C+treatment+and+follow-up&rft.jtitle=Annals+of+Oncology&rft.aulast=Senkus%2C+E.%3B+Kyriakides%2C+S.%3B+Ohno%2C+S.+et+al.&rft.au=Senkus%2C+E.%3B+Kyriakides%2C+S.%3B+Ohno%2C+S.+et+al.&rft.date=2015&rft.volume=26&rft.issue=Suppl+5&rft.pages=v8-30&rft_id=info:doi\/10.1093%2Fannonc%2Fmdv298&rft_id=info:pmid\/26314782&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BCOrgUSBreast16-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BCOrgUSBreast16_3-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.breastcancer.org\/symptoms\/understand_bc\/statistics\" target=\"_blank\">\"U.S. Breast Cancer Statistics\"<\/a>. Breastcancer.org. 2016<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.breastcancer.org\/symptoms\/understand_bc\/statistics\" target=\"_blank\">http:\/\/www.breastcancer.org\/symptoms\/understand_bc\/statistics<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 20 December 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=U.S.+Breast+Cancer+Statistics&rft.atitle=&rft.date=2016&rft.pub=Breastcancer.org&rft_id=http%3A%2F%2Fwww.breastcancer.org%2Fsymptoms%2Funderstand_bc%2Fstatistics&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WCRFBreast16-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WCRFBreast16_4-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.wcrf.org\/int\/cancer-facts-figures\/data-specific-cancers\/breast-cancer-statistics\" target=\"_blank\">\"Breast cancer statistics\"<\/a>. World Cancer Research Fund International. 2016<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.wcrf.org\/int\/cancer-facts-figures\/data-specific-cancers\/breast-cancer-statistics\" target=\"_blank\">http:\/\/www.wcrf.org\/int\/cancer-facts-figures\/data-specific-cancers\/breast-cancer-statistics<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 20 December 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Breast+cancer+statistics&rft.atitle=&rft.date=2016&rft.pub=World+Cancer+Research+Fund+International&rft_id=http%3A%2F%2Fwww.wcrf.org%2Fint%2Fcancer-facts-figures%2Fdata-specific-cancers%2Fbreast-cancer-statistics&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MBCNStatistics16-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MBCNStatistics16_5-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.mbcn.org\/statistics-for-metastatic-breast-cancer\/\" target=\"_blank\">\"Statistics for Metastatic Breast Cancer\"<\/a>. Metastatic Breast Cancer Network. 2016<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.mbcn.org\/statistics-for-metastatic-breast-cancer\/\" target=\"_blank\">http:\/\/www.mbcn.org\/statistics-for-metastatic-breast-cancer\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 20 December 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Statistics+for+Metastatic+Breast+Cancer&rft.atitle=&rft.date=2016&rft.pub=Metastatic+Breast+Cancer+Network&rft_id=http%3A%2F%2Fwww.mbcn.org%2Fstatistics-for-metastatic-breast-cancer%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-CroninAnalytical07-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-CroninAnalytical07_6-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Cronin, M.; Sangli, C.; Liu, M.L. et al. (2007). \"Analytical validation of the Oncotype DX genomic diagnostic test for recurrence prognosis and therapeutic response prediction in node-negative, estrogen receptor-positive breast cancer\". <i>Clinical Chemistry<\/i> <b>53<\/b> (6): 1084-91. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1373%2Fclinchem.2006.076497\" target=\"_blank\">10.1373\/clinchem.2006.076497<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17463177\" target=\"_blank\">17463177<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Analytical+validation+of+the+Oncotype+DX+genomic+diagnostic+test+for+recurrence+prognosis+and+therapeutic+response+prediction+in+node-negative%2C+estrogen+receptor-positive+breast+cancer&rft.jtitle=Clinical+Chemistry&rft.aulast=Cronin%2C+M.%3B+Sangli%2C+C.%3B+Liu%2C+M.L.+et+al.&rft.au=Cronin%2C+M.%3B+Sangli%2C+C.%3B+Liu%2C+M.L.+et+al.&rft.date=2007&rft.volume=53&rft.issue=6&rft.pages=1084-91&rft_id=info:doi\/10.1373%2Fclinchem.2006.076497&rft_id=info:pmid\/17463177&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Gy.C3.B6rffyAnOnline10-7\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-Gy.C3.B6rffyAnOnline10_7-0\" rel=\"external_link\">7.0<\/a><\/sup> <sup><a href=\"#cite_ref-Gy.C3.B6rffyAnOnline10_7-1\" rel=\"external_link\">7.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Gy\u00f6rffy, B.; Lanczky, A.; Eklund, A.C. et al. (2010). \"An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients\". <i>Breast Cancer Research and Treatment<\/i> <b>123<\/b> (3): 725-31. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1007%2Fs10549-009-0674-9\" target=\"_blank\">10.1007\/s10549-009-0674-9<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20020197\" target=\"_blank\">20020197<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+online+survival+analysis+tool+to+rapidly+assess+the+effect+of+22%2C277+genes+on+breast+cancer+prognosis+using+microarray+data+of+1%2C809+patients&rft.jtitle=Breast+Cancer+Research+and+Treatment&rft.aulast=Gy%C3%B6rffy%2C+B.%3B+Lanczky%2C+A.%3B+Eklund%2C+A.C.+et+al.&rft.au=Gy%C3%B6rffy%2C+B.%3B+Lanczky%2C+A.%3B+Eklund%2C+A.C.+et+al.&rft.date=2010&rft.volume=123&rft.issue=3&rft.pages=725-31&rft_id=info:doi\/10.1007%2Fs10549-009-0674-9&rft_id=info:pmid\/20020197&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NaoiDevelopment11-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NaoiDevelopment11_8-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Naoi, Y.; Kishi, K.; Tanei, T. et al. (2011). \"Development of 95-gene classifier as a powerful predictor of recurrences in node-negative and ER-positive breast cancer patients\". <i>Breast Cancer Research and Treatment<\/i> <b>128<\/b> (3): 633-41. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1007%2Fs10549-010-1145-z\" target=\"_blank\">10.1007\/s10549-010-1145-z<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20803240\" target=\"_blank\">20803240<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Development+of+95-gene+classifier+as+a+powerful+predictor+of+recurrences+in+node-negative+and+ER-positive+breast+cancer+patients&rft.jtitle=Breast+Cancer+Research+and+Treatment&rft.aulast=Naoi%2C+Y.%3B+Kishi%2C+K.%3B+Tanei%2C+T.+et+al.&rft.au=Naoi%2C+Y.%3B+Kishi%2C+K.%3B+Tanei%2C+T.+et+al.&rft.date=2011&rft.volume=128&rft.issue=3&rft.pages=633-41&rft_id=info:doi\/10.1007%2Fs10549-010-1145-z&rft_id=info:pmid\/20803240&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BertholdKNIME08-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BertholdKNIME08_9-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Berthold, M.R.; Cebron, N.; Dill, F. et al. (2008). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.inf.uni-konstanz.de\/bioml2\/publications\/Papers2007\/BCDG+07_knime_gfkl.pdf\" target=\"_blank\">\"KNIME: The Konstanz Information Miner\"<\/a>. In Preisach, C.; Burkhardt, H.; Schmidt-Thieme, L.; Decker, R. (PDF). <i>Data Analysis, Machine Learning and Applications<\/i>. 
Springer-Verlag Berlin Heidelberg. pp. 319\u2013326. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1007%2F978-3-540-78246-9\" target=\"_blank\">10.1007\/978-3-540-78246-9<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9783540782469<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.inf.uni-konstanz.de\/bioml2\/publications\/Papers2007\/BCDG+07_knime_gfkl.pdf\" target=\"_blank\">http:\/\/www.inf.uni-konstanz.de\/bioml2\/publications\/Papers2007\/BCDG+07_knime_gfkl.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=KNIME%3A+The+Konstanz+Information+Miner&rft.atitle=Data+Analysis%2C+Machine+Learning+and+Applications&rft.aulast=Berthold%2C+M.R.%3B+Cebron%2C+N.%3B+Dill%2C+F.+et+al.&rft.au=Berthold%2C+M.R.%3B+Cebron%2C+N.%3B+Dill%2C+F.+et+al.&rft.date=2008&rft.pages=pp.%26nbsp%3B319%E2%80%93326&rft.pub=Springer-Verlag+Berlin+Heidelberg&rft_id=info:doi\/10.1007%2F978-3-540-78246-9&rft.isbn=9783540782469&rft_id=http%3A%2F%2Fwww.inf.uni-konstanz.de%2Fbioml2%2Fpublications%2FPapers2007%2FBCDG%2B07_knime_gfkl.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ChouGene13-10\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-ChouGene13_10-0\" rel=\"external_link\">10.0<\/a><\/sup> <sup><a href=\"#cite_ref-ChouGene13_10-1\" rel=\"external_link\">10.1<\/a><\/sup> <sup><a href=\"#cite_ref-ChouGene13_10-2\" rel=\"external_link\">10.2<\/a><\/sup> 
<sup><a href=\"#cite_ref-ChouGene13_10-3\" rel=\"external_link\">10.3<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Chou, H.L.; Yao, C.T.; Su, S.L. et al. (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3614553\" target=\"_blank\">\"Gene expression profiling of breast cancer survivability by pooled cDNA microarray analysis using logistic regression, artificial neural networks and decision trees\"<\/a>. <i>BMC Bioinformatics<\/i> <b>14<\/b>: 100. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2F1471-2105-14-100\" target=\"_blank\">10.1186\/1471-2105-14-100<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3614553\/\" target=\"_blank\">PMC3614553<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23506640\" target=\"_blank\">23506640<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3614553\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3614553<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Gene+expression+profiling+of+breast+cancer+survivability+by+pooled+cDNA+microarray+analysis+using+logistic+regression%2C+artificial+neural+networks+and+decision+trees&rft.jtitle=BMC+Bioinformatics&rft.aulast=Chou%2C+H.L.%3B+Yao%2C+C.T.%3B+Su%2C+S.L.+et+al.&rft.au=Chou%2C+H.L.%3B+Yao%2C+C.T.%3B+Su%2C+S.L.+et+al.&rft.date=2013&rft.volume=14&rft.pages=100&rft_id=info:doi\/10.1186%2F1471-2105-14-100&rft_id=info:pmc\/PMC3614553&rft_id=info:pmid\/23506640&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3614553&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MannOnATest47-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MannOnATest47_11-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Mann, H.B.; Whitney, D.R. (1947). \"On a test of whether one of two random variables is stochastically larger than the other\". 
<i>The Annals of Mathematical Statistics<\/i> <b>18<\/b> (1): 50\u201360.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=On+a+test+of+whether+one+of+two+random+variables+is+stochastically+larger+than+the+other&rft.jtitle=The+Annals+of+Mathematical+Statistics&rft.aulast=Mann%2C+H.B.%3B+Whitney%2C+D.R.&rft.au=Mann%2C+H.B.%3B+Whitney%2C+D.R.&rft.date=1947&rft.volume=18&rft.issue=1&rft.pages=50%E2%80%9360&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RapidMiner-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RapidMiner_12-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/rapidminer.com\/\" target=\"_blank\">\"RapidMiner\"<\/a>. RapidMiner, Inc<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/rapidminer.com\/\" target=\"_blank\">https:\/\/rapidminer.com\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 20 December 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=RapidMiner&rft.atitle=&rft.pub=RapidMiner%2C+Inc&rft_id=https%3A%2F%2Frapidminer.com%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KDN_R16-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KDN_R16_13-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.kdnuggets.com\/2016\/06\/r-python-top-analytics-data-mining-data-science-software.html\" target=\"_blank\">\"R, Python Duel As Top Analytics, Data Science software \u2013 KDnuggets 2016 Software Poll Results\"<\/a>. <i>KDnuggets<\/i>. 08 June 2016<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.kdnuggets.com\/2016\/06\/r-python-top-analytics-data-mining-data-science-software.html\" target=\"_blank\">http:\/\/www.kdnuggets.com\/2016\/06\/r-python-top-analytics-data-mining-data-science-software.html<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 20 December 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=R%2C+Python+Duel+As+Top+Analytics%2C+Data+Science+software+%E2%80%93+KDnuggets+2016+Software+Poll+Results&rft.atitle=KDnuggets&rft.date=08+June+2016&rft_id=http%3A%2F%2Fwww.kdnuggets.com%2F2016%2F06%2Fr-python-top-analytics-data-mining-data-science-software.html&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Weka-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Weka_14-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.cs.waikato.ac.nz\/ml\/weka\/\" target=\"_blank\">\"Weka 3: Data Mining Software in Java\"<\/a>. The University of Waikato<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.cs.waikato.ac.nz\/ml\/weka\/\" target=\"_blank\">http:\/\/www.cs.waikato.ac.nz\/ml\/weka\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 20 December 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Weka+3%3A+Data+Mining+Software+in+Java&rft.atitle=&rft.pub=The+University+of+Waikato&rft_id=http%3A%2F%2Fwww.cs.waikato.ac.nz%2Fml%2Fweka%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. 
In some cases important information was missing from the references, and that information was added. Some grammar was corrected when necessary. Some tables and figures were moved slightly to better match their text references. Citations #11 and 15 in the original (the link to the Gy\u00f6rffy dataset and the link to KNIME extensions) were removed as citations and turned into inline URLs.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine\">https:\/\/www.limswiki.org\/index.php\/Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","ffcad3b9d842250ab55f35eb0cee8237_images":["https:\/\/www.limswiki.org\/images\/b\/b1\/Fig1_Eyal-Altman_BMCBioinformatics2017_18.gif","https:\/\/www.limswiki.org\/images\/2\/2f\/Fig2_Eyal-Altman_BMCBioinformatics2017_18.gif","https:\/\/www.limswiki.org\/images\/3\/3e\/Fig3_Eyal-Altman_BMCBioinformatics2017_18.gif"],"ffcad3b9d842250ab55f35eb0cee8237_timestamp":1544814659,"b1b2d2922d12d6afbd23ca5f216a0cd7_type":"article","b1b2d2922d12d6afbd23ca5f216a0cd7_title":"Ten simple rules for cultivating open science and collaborative R&D (Masum et al. 2013)","b1b2d2922d12d6afbd23ca5f216a0cd7_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D","b1b2d2922d12d6afbd23ca5f216a0cd7_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Ten simple rules for cultivating open science and collaborative R&D\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nTen simple rules for cultivating open science and collaborative R&D\nJournal\n \nPLOS Computational Biology\nAuthor(s)\n \nMasum, Hassan; Rao, Aarthi; Good, Benjamin M.; Todd, Matthew H.; Edwards, Aled M.; Chan, Leslie; Bunin, Barry A.; Su, Andrew I.; Thomas, Zakir; Bourne, Philip E.\nAuthor affiliation(s)\n \nWaterloo Institute for Complexity and Innovation, Results for Development Institute, Scripps Research Institute, University of Sydney, University of Toronto, Collaborative Drug Discovery, Scripps Research Institute, Council of Scientific and Industrial Research, University of California San Diego\nPrimary contact\n \nEmail: hassan dot masum at utoronto dot ca\nYear published\n \n2013\nVolume and issue\n \n9(9)\nPage(s)\n 
\ne1003244\nDOI\n \n10.1371\/journal.pcbi.1003244\nISSN\n \n1553-7358\nDistribution license\n \nCreative Commons Attribution 4.0 International\nWebsite\n \nhttp:\/\/journals.plos.org\/ploscompbiol\/article?id=10.1371\/journal.pcbi.1003244\nDownload\n \nhttp:\/\/journals.plos.org\/ploscompbiol\/article\/file?id=10.1371\/journal.pcbi.1003244&type=printable (PDF)\n\nContents\n\n1 Introduction \n2 Rule 1: Get the incentives right - Learn from the past \n3 Rule 2: Make your controlled collaborations win-win-win \n4 Rule 3: Understand what works \u2014 And what doesn't \n5 Rule 4: Lead as a coach, not a CEO \n6 Rule 5: Diversify your contributors \n7 Rule 6: Diversify your customers \n8 Rule 7: Don't reinvent the wheel \n9 Rule 8: Think big \n10 Rule 9: Encourage supportive policies and tools \n11 Rule 10: Grow the commons \n12 Supporting information \n13 Acknowledgements \n\n13.1 Funding \n\n\n14 Competing interests \n15 References \n16 Notes \n\n\n\nIntroduction \nHow can we address the complexity and cost of applying science to societal challenges?\nOpen science and collaborative R&D may help.[1][2][3] Open science has been described as \"a research accelerator.\"[4] Open science implies open access[5] but goes beyond it: \"Imagine a connected online web of scientific knowledge that integrates and connects data, computer code, chains of scientific reasoning, descriptions of open problems, and beyond ... tightly integrated with a scientific social web that directs scientists' attention where it is most valuable, releasing enormous collaborative potential.\"[1]\nOpen science and collaborative approaches are often described as open-source, by analogy with open-source software such as the operating system Linux which powers Google and Amazon \u2014 collaboratively created software which is free to use and adapt, and popular for internet infrastructure and scientific research.[6][7] However, this use of \"open source\" is unclear. 
Some people use \"open source\" when a project's results are free to use, others when a project's process is highly collaborative.[4]\nIt is clearer to classify open source and open science within a broader class of collaborative R&D, which can be defined as scalable collaboration (usually enabled by information technology) across organizational boundaries to solve R&D challenges.[8]\nMany approaches to open science and collaborative R&D have been tried.[1][9] The Gene Wiki has created over 10,000 Wikipedia articles, and aims to provide one for every notable human gene.[10] The crowdsourcing platform InnoCentive has reportedly facilitated solutions to roughly half of the thousands of technical problems posed on the site, including many in life sciences such as the $1 million ALS Biomarker Prize.[11] Other examples include prizes (X-Prize[12]), scientific games (FoldIt[13]), and licensing schemes inspired by open-source software (BIOS[14]).\nCollaborative R&D approaches vary in openness.[15] In some approaches, the R&D process and outputs are open to all \u2014 for example, open-science projects like the Gene Wiki described above. In other approaches which demonstrate what might be called controlled collaboration, there are strong controls on who contributes and benefits \u2014 for example, computational platforms like Collaborative Drug Discovery or InnoCentive that support both commercial and nonprofit research.[9][11]\nCollaborative approaches can unleash innovation from unforeseen sources, as with crowdsourcing health technologies.[11][12][13][16] They may help in global challenges like drug development[17], as with India's OSDD (Open Source Drug Discovery) project that recruited over 7,000 volunteers[16] and an open-source drug synthesis project that improved an existing drug without increasing its cost.[18]\nIf you want to apply open science and collaborative R&D, what principles are useful? 
We suggest 10 simple rules for cultivating open science and collaborative R&D. We also offer eight conversational interviews exploring life experiences that led to these rules (see Box 1 at end).\n\nRule 1: Get the incentives right - Learn from the past \nWhy should contributors take part in your project? Learn from incentives that have worked in mass collaborations and open-source software, such as reputation building, enjoyment, cooperatively solving interesting problems that are too hard to do alone, and jointly developing tools that benefit all developers.[6][7][19] Organizational incentives can include lowering costs, tapping external innovation, implementing novel business models such as selling complementary services, and jointly competing for public admiration or grant funding. Altruism can motivate collaboration, but frequently it is not the main reason.[9] With this in mind, align individual incentives with collective benefit.[1] Look to past and present precompetitive collaborations for ways to address intellectual property and competitive concerns.[3] Share attribution with contributors so they can advance their goals and demonstrate their capabilities.\n\nRule 2: Make your controlled collaborations win-win-win \nPerhaps completely open science seems unsuitable to you, if for example you are engaged in market-driven R&D that must recoup investments. There are ways to benefit from open science and collaborative methods while retaining appropriate controls and the opportunity to provide public benefit. You, your partners, and the public can all benefit \u2014 a win-win-win situation. 
You might use computational platforms to supercharge information sharing with selected partners, including public-benefit initiatives that match your mission.[9] You might use crowdsourcing to overcome roadblocks by opening up chosen parts of your R&D process to new innovators.[11] Or you might make public selected data or software tools, exporting them to the open-source realm to gain from goodwill or quality improvement.[3] Sharing can make both business and social sense, whether in implementing open standards, collaborating precompetitively, or reducing duplication of effort.[20] Keep an eye open for opportunities to \"do well by doing good\" by structuring initiatives for private and public benefit.[21] Collaborative approaches can benefit both public and private sectors in collaborating across competitive boundaries, connecting problems with problem solvers, and cultivating a knowledge commons.[1][9]\n\nRule 3: Understand what works \u2014 And what doesn't \nYou can save yourself frustration by not using an unsuitable collaborative method, be it a wiki without an audience or a crowdsourced research challenge without focus.[8] Consider questions like: have you learned from others who have tried the method? Do you understand when the method fails, and what is necessary for it to work? Is there a good match between the method and your goals? Are you contributing your experiences and interesting failures back to the community, thus demonstrating thought leadership? If you are interested in more effective knowledge sharing, consider low-budget opportunities such as starting an online Q&A site about open science or collaborative R&D using a platform like StackExchange. 
There are also opportunities to help evaluate what really works\u2014moving beyond anecdotal evidence to case studies and metrics.\n\nRule 4: Lead as a coach, not a CEO \nThe command-and-control style doesn't work well with contributors from diverse organizations, many of whom may be volunteers.[22] And as has been said of Linus Torvalds, the founder of the open-source operating system Linux, \"Linus doesn't scale\": leaders of mass collaborations can become bottlenecks unless they encourage distributed workflows and leadership.[7] Be flexible about management (but strict about quality). Check your ego at the door \u2014 you're playing a team game and will be stronger when others want to contribute. Participants will feel more motivated if their contribution enriches a joint resource rather than just the leader. Can you give up exclusive ownership and credit to achieve with others what you cannot achieve alone?\n\nRule 5: Diversify your contributors \nA powerful aspect of collaborative R&D is the potential diversity of the community \u2014 including students[16], patients[23], gamers[10], and researchers from lesser-known countries or institutions. You can use open science to attract diverse contributors by lowering barriers to participation, publicly tackling audacious challenges (see Rule 8), and making collaboration fun. Consider open licensing terms and joint or public ownership of selected outcomes to broaden your participant base.[14][15][21][24] Encourage all community members to find ways to contribute that suit their abilities and inclinations. Can you reach past your usual partners, and make it easy for others to get up to speed with what you're doing? Are there opportunities for \"citizen science,\" perhaps through organizing many microcontributions?[1][10]\n\nRule 6: Diversify your customers \nCan you engage the broadest possible base as beneficiaries? 
The science that you do in the open spreads its benefits widely, and that can attract unexpected accolades and collaborators.[1][4] Productively involving stakeholders can inform your research \u2014 for example, through participatory research strategies involving the people your efforts are meant to help.[25] Contributing to collaborative initiatives targeting human development challenges can motivate your team, and potentially lead to innovations that are transferable to for-profit markets. Neglected disease R&D is a case in point, which seems particularly suitable for collaborative pilot projects, given its lower profits, humanitarian appeal, and need for new methods.[26] If your work is commercially driven, consider humanitarian licensing approaches that encourage nonprofit applications by others to poorer demographics.[2][21]\n\nRule 7: Don't reinvent the wheel \nThe more you can use what already exists, the greater your effectiveness will be. Are there lab and computational resources that could be used when otherwise idle? Can you find people already working on elements of your problem, and organize their collective work? Before starting a new initiative, have you explored and considered joining existing ones? Piggybacking on active efforts eases prototyping and gathering enthusiastic initial users. Build on the cumulative stockpile of past open initiatives (see Rules 1 and 3).\n\nRule 8: Think big \nFor projects hoping to harness the power of mass collaboration, a major challenge can be attracting a large community of contributors. Many of the best mass collaborations orient around seemingly audacious goals like: \"build a free encyclopedia of all the world's knowledge\" (Wikipedia), \"develop a review article for every human gene\" (Gene Wiki), and \"build a new operating system\" (Linux). Establishing a driving, high-level purpose will help spread the idea of your project and motivate people to come have a look and see what they can do. 
Be ready to scale with success.\n\nRule 9: Encourage supportive policies and tools \nCan you cultivate open science and collaborative R&D by helping to make them part of \"standard operating procedure\"? For example, can you encourage institutional data sharing?[24] Can you build a profiling platform of collaborative initiatives, summarizing what they have achieved and what types of collaborators they are seeking? Do you have opportunities to adopt appropriate policies in your own organization or field? A case study to learn from is the spread of open access from wishful thinking to widespread fact.[5]\n\nRule 10: Grow the commons \nAs intellectual property debates illustrate, there are legitimate differences of opinion on how best to motivate innovators' investments to generate new knowledge.[21][26] But in the long run, sharing more knowledge and tools boosts both for-profit and nonprofit research.[2][3] This growing shared resource of knowledge and tools \u2014 \"the commons\" \u2014 is the product of centuries of striving. It depends on cumulative win-win-win collaborations spanning organizations, nations, and generations. Can you find ways to advance your interests while remaining part of this larger narrative?[1][5][19][27]\n\nSupporting information \n\n\n\n\n\n\nBox 1. Conversations on Open Science and Collaborative R&D\n\n\nMany commentators have considered challenges in translating open science and collaborative methods to biomedical research.[2][3][4][9][17][20][24][26][28][29] How can protecting intellectual property be balanced with freeing researchers to build on previous knowledge? If R&D results are collaboratively created and freely available, who will take responsibility for costly clinical trials and quality control? 
What will be the Linux of open-source R&D?\nTo explore such challenges and convey life experiences in biomedical open science and collaborative R&D, we offer eight conversational interviews by the first author of this article as supplementary material. The conversations were done on behalf of the Results for Development Institute and are with:\n\n Alph Bingham, cofounder of InnoCentive; doi:10.1371\/journal.pcbi.1003244.s001 (Text S1) (PDF)\n Barry Bunin, CEO of Collaborative Drug Discovery; doi:10.1371\/journal.pcbi.1003244.s002 (Text S2) (PDF)\n Leslie Chan, open access pioneer and director of Bioline International; doi:10.1371\/journal.pcbi.1003244.s003 (Text S3) (PDF)\n Aled Edwards, director of the Structural Genomics Consortium; doi:10.1371\/journal.pcbi.1003244.s004 (Text S4) (PDF)\n Benjamin Good, coleader of the Gene Wiki initiative; doi:10.1371\/journal.pcbi.1003244.s005 (Text S5) (PDF)\n Bernard Munos, pharmaceutical innovation thought leader; doi:10.1371\/journal.pcbi.1003244.s006 (Text S6) (PDF)\n Zakir Thomas, director of India's Open Source Drug Discovery (OSDD) project; doi:10.1371\/journal.pcbi.1003244.s007 (Text S7) (PDF)\n Matt Todd, open science and drug development pioneer; doi:10.1371\/journal.pcbi.1003244.s008 (Text S8) (PDF)\n\n\n\nAcknowledgements \nWe thank Jean Arkedis, Robert Hecht, and Paul Wilson for comments on early versions of this article. Our thanks also go to all the colleagues and pioneers who have shared their wisdom on making collaborative R&D work.\n\nFunding \nThis article was made possible by support to HM and AR from a grant by the Bill & Melinda Gates Foundation to the Results for Development Institute. The funders had no role in the preparation of the manuscript.\n\nCompeting interests \nThe authors have declared that no competing interests exist.\n\nReferences \n\n\n\u2191 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 Nielsen, M. (2011). Reinventing Discovery: The New Era of Networked Science. Princeton University Press. pp. 272. 
ISBN 9780691148908.   \n\n\u2191 2.0 2.1 2.2 2.3 National Research Council (2011). Uhlir, P.F., ed. Designing the Microbial Research Commons: Proceedings of an International Symposium. The National Academies Press. pp. 216. ISBN 9780309219792. https:\/\/www.nap.edu\/catalog\/13245\/designing-the-microbial-research-commons-proceedings-of-an-international-symposium .   \n\n\u2191 3.0 3.1 3.2 3.3 3.4 Institute of Medicine; Olson, S.; Berger, A.C. (2011). Establishing Precompetitive Collaborations to Stimulate Genomics-Driven Product Development: Workshop Summary. The National Academies Press. pp. 74. ISBN 9780309161824. https:\/\/www.nap.edu\/catalog\/13015\/establishing-precompetitive-collaborations-to-stimulate-genomics-driven-product-development-workshop .   \n\n\u2191 4.0 4.1 4.2 4.3 Woelfle, M.; Olliaro, P.; Todd, M.H. (2011). \"Open science is a research accelerator\". Nature Chemistry 3 (10): 745-8. doi:10.1038\/nchem.1149. PMID 21941234.   \n\n\u2191 5.0 5.1 5.2 \"PLOS Collections: Open Access Collection\". Public Library of Science. 2013. Archived from the original on 20 April 2013. http:\/\/web.archive.org\/web\/20130420203146\/http:\/\/www.ploscollections.org\/article\/browseIssue.action?issue=info:doi\/10.1371\/issue.pcol.v01.i10 . Retrieved 25 April 2013 .   \n\n\u2191 6.0 6.1 Prli\u0107, A.; Procter, J.B. (2012). \"Ten simple rules for the open development of scientific software\". PLOS Computational Biology 8 (12): e1002802. doi:10.1371\/journal.pcbi.1002802. PMC PMC3516539. PMID 23236269. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539 .   \n\n\u2191 7.0 7.1 7.2 Fogel, K. (2013). \"Producing Open Source Software: How to Run a Successful Free Software Project\". http:\/\/producingoss.com\/en\/ . Retrieved 25 April 2013 .   \n\n\u2191 8.0 8.1 \"Collaborative Health R&D Primer\". Global Health R&D Policy Assessment Center. Results for Development Institute. 2013. Archived from the original on 15 January 2013. 
http:\/\/web.archive.org\/web\/20130115194036\/http:\/\/healthresearchpolicy.org\/primer . Retrieved 25 April 2013 .   \n\n\u2191 9.0 9.1 9.2 9.3 9.4 9.5 Ekins, S.; Hupcey, M.A.Z.; Williams, A.J., ed. (2011). Collaborative Computational Technologies for Biomedical Research. John Wiley & Sons, Inc. pp. 576. ISBN 9780470638033.   \n\n\u2191 10.0 10.1 10.2 Good, B.M.; Clarke, E.L.; de Alfaro, L.; Su, A.I. (2012). \"The Gene Wiki in 2011: Community intelligence applied to human gene annotation\". Nucleic Acids Research 40 (D1): D1255-61. doi:10.1093\/nar\/gkr925. PMC PMC3245148. PMID 22075991. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3245148 .   \n\n\u2191 11.0 11.1 11.2 11.3 Bingham, A.; Spradlin, D. (2011). The Open Innovation Marketplace: Creating Value in the Challenge Driven Enterprise. FT Press. pp. 272. ISBN 9780132311830.   \n\n\u2191 12.0 12.1 Wilson, P.; Palriwala, A. (2011). \"Prizes for Global Health Technologies\". Global Health R&D Policy Assessment Center. Results for Development Institute. Archived from the original on 07 November 2012. http:\/\/web.archive.org\/web\/20121107025448\/http:\/\/healthresearchpolicy.org\/assessments\/prizes-global-health-technologies . Retrieved 25 April 2013 .   \n\n\u2191 13.0 13.1 Good, B.M.; Su, A.I. (2011). \"Games with a scientific purpose\". Genome Biology 12 (12): 135. doi:10.1186\/gb-2011-12-12-135. PMID 22204700.   \n\n\u2191 14.0 14.1 Jefferson, R. (2006). \"Science as social enterprise: The CAMBIA BiOS Initiative\". Innovations: Technology, Governance, Globalization 1 (4): 13\u201344. doi:10.1162\/itgg.2006.1.4.13.   \n\n\u2191 15.0 15.1 \"HowOpenIsIt?\". Public Library of Science. 2013. Archived from the original on 01 March 2013. http:\/\/web.archive.org\/web\/20130301193758\/http:\/\/www.plos.org\/about\/open-access\/howopenisit\/ . Retrieved 25 April 2013 .   \n\n\u2191 16.0 16.1 16.2 Vashisht, R.; Mondal, A.K.; Jain, A. et al. (2012). 
\"Crowd sourcing a new paradigm for interactome driven drug target identification in Mycobacterium tuberculosis\". PLOS One 7 (7): e39808. doi:10.1371\/journal.pone.0039808. PMC PMC3395720. PMID 22808064. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3395720 .   \n\n\u2191 17.0 17.1 Munos, B.H.; Chin, W.W. (2011). \"How to revive breakthrough innovation in the pharmaceutical industry\". Science Translational Medicine 3 (89): 89cm16. doi:10.1126\/scitranslmed.3002273. PMID 21715677.   \n\n\u2191 Woelfle, M.; Seerden, J.P.; de Gooijer, J. et al. (2011). \"Resolution of praziquantel\". PLOS Neglected Tropical Diseases 5 (9): e1260. doi:10.1371\/journal.pntd.0001260. PMC PMC3176743. PMID 21949890. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3176743 .   \n\n\u2191 19.0 19.1 Benkler, Y. (2011). The Penguin and the Leviathan: How Cooperation Triumphs over Self-Interest. Crown Business. pp. 272. ISBN 9780385525763.   \n\n\u2191 20.0 20.1 Norman, T.C.; Bountra, C.; Edwards, A.M. et al. (2011). \"Leveraging crowdsourcing to facilitate the discovery of new medicines\". Science Translational Medicine 3 (88): 88mr1. doi:10.1126\/scitranslmed.3002678. PMID 21697527.   \n\n\u2191 21.0 21.1 21.2 21.3 Krattiger, A.; Mahoney, R.T.; Nelsen, L. et al., ed. (2007). Intellectual Property Management in Health and Agricultural Innovation: A Handbook of Best Practices. 1. MIHR-USA. ISBN 9781424320264.   \n\n\u2191 Vicens, Q.; Bourne, P.E. (2007). \"Ten simple rules for a successful collaboration\". PLOS Computational Biology 3 (3): e44. doi:10.1371\/journal.pcbi.0030044. PMC PMC1847992. PMID 17397252. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1847992 .   \n\n\u2191 Wicks, P.; Vaughan, T.E.; Massagli, M.P.; Heywood, J. (2011). \"Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm\". Nature Biotechnology 29 (5): 411-4. 
doi:10.1038\/nbt.1837. PMID 21516084.   \n\n\u2191 24.0 24.1 24.2 Dyke, S.O.; Hubbard, T.J. (2011). \"Developing and implementing an institute-wide data sharing policy\". Genome Medicine 3 (9): 60. doi:10.1186\/gm276. PMC PMC3239235. PMID 21955348. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3239235 .   \n\n\u2191 Holland, J.; Chambers, R. (2013). Who Counts?: The Power of Participatory Statistics. Practical Action. pp. 220. ISBN 9781853397721.   \n\n\u2191 26.0 26.1 26.2 Masum, H.; Harris, R. (2011). \"Open Source for Neglected Diseases: Magic Bullet or Mirage?\". Global Health R&D Policy Assessment Center. Results for Development Institute. Archived from the original on 06 January 2013. http:\/\/web.archive.org\/web\/20130106064334\/http:\/\/healthresearchpolicy.org\/assessments\/open-source-neglected-diseases-magic-bullet-or-mirage . Retrieved 25 April 2013 .   \n\n\u2191 Masum, H.; Tovey, M. (2006). \"Given enough minds...: Bridging the ingenuity gap\". First Monday 11 (7). doi:10.5210\/fm.v11i7.1370.   \n\n\u2191 Marden, E. (2010). \"Open source drug development: A path to more accessible drugs and diagnostics?\". Minnesota Journal of Law, Science & Technology 11 (1): 217\u2013266. http:\/\/hdl.handle.net\/11299\/155748 .   \n\n\u2191 \u00c5rdal, C.; R\u00f8ttingen, J.A. (2012). \"Open source drug discovery in practice: A case study\". PLOS Neglected Tropical Diseases 6 (9): e1827. doi:10.1371\/journal.pntd.0001827. PMC PMC3447952. PMID 23029588. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3447952 .   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. In a few cases, the URLs from 2013 were dead; they were updated with current URLs, and, when applicable, archived URLs from the Internet Archive. 
Box 1, which in the original appeared at top, has been combined with the supporting information at the bottom.\n\n\n\n\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\">https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2017)LIMSwiki journal articles (all)LIMSwiki journal articles on informaticsLIMSwiki journal articles on research\n
\r\n\n\t\n\t\r\n\n \r\n\n\t\n\t\r\n\n\t\n\t\r\n\n\t\r\n\n\t\r\n\n\t\r\n\t\t\n\t\t\n\t\t\t\n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t This page was last modified on 22 March 2017, at 19:30.\n\t\t\t\t\t\t\t\t\tThis page has been accessed 946 times.\n\t\t\t\t\t\t\t\t\tContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\t\t\t\t\t\t\t\t\tPrivacy policy\n\t\t\t\t\t\t\t\t\tAbout LIMSWiki\n\t\t\t\t\t\t\t\t\tDisclaimers\n\t\t\t\t\t\t\t\n\t\t\n\t\t\n\t\t\n\n","b1b2d2922d12d6afbd23ca5f216a0cd7_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Ten_simple_rules_for_cultivating_open_science_and_collaborative_R_D skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Ten simple rules for cultivating open science and collaborative R&D<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>How can we address the complexity and cost of applying science to societal challenges?\n<\/p><p>Open science and collaborative R&D may help.<sup id=\"rdp-ebb-cite_ref-NielsenReinventing11_1-0\" class=\"reference\"><a href=\"#cite_note-NielsenReinventing11-1\" rel=\"external_link\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-NRCDesigning11_2-0\" class=\"reference\"><a href=\"#cite_note-NRCDesigning11-2\" rel=\"external_link\">[2]<\/a><\/sup><sup 
id=\"rdp-ebb-cite_ref-IoMEstablishing11_3-0\" class=\"reference\"><a href=\"#cite_note-IoMEstablishing11-3\" rel=\"external_link\">[3]<\/a><\/sup> Open science has been described as \"a research accelerator.\"<sup id=\"rdp-ebb-cite_ref-WoelfleOpen11_4-0\" class=\"reference\"><a href=\"#cite_note-WoelfleOpen11-4\" rel=\"external_link\">[4]<\/a><\/sup> Open science implies open access<sup id=\"rdp-ebb-cite_ref-PLOSOpenAccess_5-0\" class=\"reference\"><a href=\"#cite_note-PLOSOpenAccess-5\" rel=\"external_link\">[5]<\/a><\/sup> but goes beyond it: \"Imagine a connected online web of scientific knowledge that integrates and connects data, computer code, chains of scientific reasoning, descriptions of open problems, and beyond ... tightly integrated with a scientific social web that directs scientists' attention where it is most valuable, releasing enormous collaborative potential.\"<sup id=\"rdp-ebb-cite_ref-NielsenReinventing11_1-1\" class=\"reference\"><a href=\"#cite_note-NielsenReinventing11-1\" rel=\"external_link\">[1]<\/a><\/sup>\n<\/p><p>Open science and collaborative approaches are often described as open-source, by analogy with open-source software such as the operating system Linux which powers Google and Amazon \u2014 collaboratively created software which is free to use and adapt, and popular for internet infrastructure and scientific research.<sup id=\"rdp-ebb-cite_ref-Prli.C4.87Ten12_6-0\" class=\"reference\"><a href=\"#cite_note-Prli.C4.87Ten12-6\" rel=\"external_link\">[6]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-FogelProducing13_7-0\" class=\"reference\"><a href=\"#cite_note-FogelProducing13-7\" rel=\"external_link\">[7]<\/a><\/sup> However, this use of \"open source\" is unclear. 
Some people use \"open source\" when a project's results are free to use, others when a project's process is highly collaborative.<sup id=\"rdp-ebb-cite_ref-WoelfleOpen11_4-1\" class=\"reference\"><a href=\"#cite_note-WoelfleOpen11-4\" rel=\"external_link\">[4]<\/a><\/sup>\n<\/p><p>It is clearer to classify open source and open science within a broader class of collaborative R&D, which can be defined as scalable collaboration (usually enabled by information technology) across organizational boundaries to solve R&D challenges.<sup id=\"rdp-ebb-cite_ref-RDICollab13Arch_8-0\" class=\"reference\"><a href=\"#cite_note-RDICollab13Arch-8\" rel=\"external_link\">[8]<\/a><\/sup>\n<\/p><p>Many approaches to open science and collaborative R&D have been tried.<sup id=\"rdp-ebb-cite_ref-NielsenReinventing11_1-2\" class=\"reference\"><a href=\"#cite_note-NielsenReinventing11-1\" rel=\"external_link\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-EkinsCollab11_9-0\" class=\"reference\"><a href=\"#cite_note-EkinsCollab11-9\" rel=\"external_link\">[9]<\/a><\/sup> The Gene Wiki has created over 10,000 Wikipedia articles, and aims to provide one for every notable human gene.<sup id=\"rdp-ebb-cite_ref-GoodTheGene12_10-0\" class=\"reference\"><a href=\"#cite_note-GoodTheGene12-10\" rel=\"external_link\">[10]<\/a><\/sup> The crowdsourcing platform InnoCentive has reportedly facilitated solutions to roughly half of the thousands of technical problems posed on the site, including many in life sciences such as the $1 million ALS Biomarker Prize.<sup id=\"rdp-ebb-cite_ref-BinghamTheOpen11_11-0\" class=\"reference\"><a href=\"#cite_note-BinghamTheOpen11-11\" rel=\"external_link\">[11]<\/a><\/sup> Other examples include prizes (X-Prize<sup id=\"rdp-ebb-cite_ref-WilsonPrizes11Arch_12-0\" class=\"reference\"><a href=\"#cite_note-WilsonPrizes11Arch-12\" rel=\"external_link\">[12]<\/a><\/sup>), scientific games (FoldIt<sup id=\"rdp-ebb-cite_ref-GoodGames11_13-0\" class=\"reference\"><a 
href=\"#cite_note-GoodGames11-13\" rel=\"external_link\">[13]<\/a><\/sup>), and licensing schemes inspired by open-source software (BIOS<sup id=\"rdp-ebb-cite_ref-JeffersonScience06_14-0\" class=\"reference\"><a href=\"#cite_note-JeffersonScience06-14\" rel=\"external_link\">[14]<\/a><\/sup>).\n<\/p><p>Collaborative R&D approaches vary in openness.<sup id=\"rdp-ebb-cite_ref-PLOSHowOpen13Arch_15-0\" class=\"reference\"><a href=\"#cite_note-PLOSHowOpen13Arch-15\" rel=\"external_link\">[15]<\/a><\/sup> In some approaches, the R&D process and outputs are open to all \u2014 for example, open-science projects like the Gene Wiki described above. In other approaches which demonstrate what might be called controlled collaboration, there are strong controls on who contributes and benefits \u2014 for example, computational platforms like Collaborative Drug Discovery or InnoCentive that support both commercial and nonprofit research.<sup id=\"rdp-ebb-cite_ref-EkinsCollab11_9-1\" class=\"reference\"><a href=\"#cite_note-EkinsCollab11-9\" rel=\"external_link\">[9]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-BinghamTheOpen11_11-1\" class=\"reference\"><a href=\"#cite_note-BinghamTheOpen11-11\" rel=\"external_link\">[11]<\/a><\/sup>\n<\/p><p>Collaborative approaches can unleash innovation from unforeseen sources, as with crowdsourcing health technologies.<sup id=\"rdp-ebb-cite_ref-BinghamTheOpen11_11-2\" class=\"reference\"><a href=\"#cite_note-BinghamTheOpen11-11\" rel=\"external_link\">[11]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-WilsonPrizes11Arch_12-1\" class=\"reference\"><a href=\"#cite_note-WilsonPrizes11Arch-12\" rel=\"external_link\">[12]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-GoodGames11_13-1\" class=\"reference\"><a href=\"#cite_note-GoodGames11-13\" rel=\"external_link\">[13]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-VashishtCrowd12_16-0\" class=\"reference\"><a href=\"#cite_note-VashishtCrowd12-16\" rel=\"external_link\">[16]<\/a><\/sup> They may help in global challenges like 
drug development<sup id=\"rdp-ebb-cite_ref-MunosHowTo11_17-0\" class=\"reference\"><a href=\"#cite_note-MunosHowTo11-17\" rel=\"external_link\">[17]<\/a><\/sup>, as with India's OSDD (Open Source Drug Discovery) project that recruited over 7,000 volunteers<sup id=\"rdp-ebb-cite_ref-VashishtCrowd12_16-1\" class=\"reference\"><a href=\"#cite_note-VashishtCrowd12-16\" rel=\"external_link\">[16]<\/a><\/sup> and an open-source drug synthesis project that improved an existing drug without increasing its cost.<sup id=\"rdp-ebb-cite_ref-WoelfleResolution11_18-0\" class=\"reference\"><a href=\"#cite_note-WoelfleResolution11-18\" rel=\"external_link\">[18]<\/a><\/sup>\n<\/p><p>If you want to apply open science and collaborative R&D, what principles are useful? We suggest 10 simple rules for cultivating open science and collaborative R&D. We also offer eight conversational interviews exploring life experiences that led to these rules (see Box 1 at end).\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_1:_Get_the_incentives_right_-_Learn_from_the_past\">Rule 1: Get the incentives right - Learn from the past<\/span><\/h2>\n<p>Why should contributors take part in your project? 
Learn from incentives that have worked in mass collaborations and open-source software, such as reputation building, enjoyment, cooperatively solving interesting problems that are too hard to do alone, and jointly developing tools that benefit all developers.<sup id=\"rdp-ebb-cite_ref-Prli.C4.87Ten12_6-1\" class=\"reference\"><a href=\"#cite_note-Prli.C4.87Ten12-6\" rel=\"external_link\">[6]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-FogelProducing13_7-1\" class=\"reference\"><a href=\"#cite_note-FogelProducing13-7\" rel=\"external_link\">[7]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-BenklerThePenguin11_19-0\" class=\"reference\"><a href=\"#cite_note-BenklerThePenguin11-19\" rel=\"external_link\">[19]<\/a><\/sup> Organizational incentives can include lowering costs, tapping external innovation, implementing novel business models such as selling complementary services, and jointly competing for public admiration or grant funding. Altruism can motivate collaboration, but frequently it is not the main reason.<sup id=\"rdp-ebb-cite_ref-EkinsCollab11_9-2\" class=\"reference\"><a href=\"#cite_note-EkinsCollab11-9\" rel=\"external_link\">[9]<\/a><\/sup> With this in mind, align individual incentives with collective benefit.<sup id=\"rdp-ebb-cite_ref-NielsenReinventing11_1-3\" class=\"reference\"><a href=\"#cite_note-NielsenReinventing11-1\" rel=\"external_link\">[1]<\/a><\/sup> Look to past and present precompetitive collaborations for ways to address intellectual property and competitive concerns.<sup id=\"rdp-ebb-cite_ref-IoMEstablishing11_3-1\" class=\"reference\"><a href=\"#cite_note-IoMEstablishing11-3\" rel=\"external_link\">[3]<\/a><\/sup> Share attribution with contributors so they can advance their goals and demonstrate their capabilities.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_2:_Make_your_controlled_collaborations_win-win-win\">Rule 2: Make your controlled collaborations win-win-win<\/span><\/h2>\n<p>Perhaps completely open science seems unsuitable to you, if 
for example you are engaged in market-driven R&D that must recoup investments. There are ways to benefit from open science and collaborative methods while retaining appropriate controls and the opportunity to provide public benefit. You, your partners, and the public can all benefit \u2014 a win-win-win situation. You might use computational platforms to supercharge information sharing with selected partners, including public-benefit initiatives that match your mission.<sup id=\"rdp-ebb-cite_ref-EkinsCollab11_9-3\" class=\"reference\"><a href=\"#cite_note-EkinsCollab11-9\" rel=\"external_link\">[9]<\/a><\/sup> You might use crowdsourcing to overcome roadblocks by opening up chosen parts of your R&D process to new innovators.<sup id=\"rdp-ebb-cite_ref-BinghamTheOpen11_11-3\" class=\"reference\"><a href=\"#cite_note-BinghamTheOpen11-11\" rel=\"external_link\">[11]<\/a><\/sup> Or you might make public selected data or software tools, exporting them to the open-source realm to gain from goodwill or quality improvement.<sup id=\"rdp-ebb-cite_ref-IoMEstablishing11_3-2\" class=\"reference\"><a href=\"#cite_note-IoMEstablishing11-3\" rel=\"external_link\">[3]<\/a><\/sup> Sharing can make both business and social sense, whether in implementing open standards, collaborating precompetitively, or reducing duplication of effort.<sup id=\"rdp-ebb-cite_ref-NormanLeveraging11_20-0\" class=\"reference\"><a href=\"#cite_note-NormanLeveraging11-20\" rel=\"external_link\">[20]<\/a><\/sup> Keep an eye open for opportunities to \"do well by doing good\" by structuring initiatives for private and public benefit.<sup id=\"rdp-ebb-cite_ref-KrattigerIntellectual07_21-0\" class=\"reference\"><a href=\"#cite_note-KrattigerIntellectual07-21\" rel=\"external_link\">[21]<\/a><\/sup> Collaborative approaches can benefit both public and private sectors in collaborating across competitive boundaries, connecting problems with problem solvers, and cultivating a knowledge commons.<sup 
id=\"rdp-ebb-cite_ref-NielsenReinventing11_1-4\" class=\"reference\"><a href=\"#cite_note-NielsenReinventing11-1\" rel=\"external_link\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-EkinsCollab11_9-4\" class=\"reference\"><a href=\"#cite_note-EkinsCollab11-9\" rel=\"external_link\">[9]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_3:_Understand_what_works_.E2.80.94_And_what_doesn.27t\">Rule 3: Understand what works \u2014 And what doesn't<\/span><\/h2>\n<p>You can save yourself frustration by not using an unsuitable collaborative method, be it a wiki without an audience or a crowdsourced research challenge without focus.<sup id=\"rdp-ebb-cite_ref-RDICollab13Arch_8-1\" class=\"reference\"><a href=\"#cite_note-RDICollab13Arch-8\" rel=\"external_link\">[8]<\/a><\/sup> Consider questions like: have you learned from others who have tried the method? Do you understand when the method fails, and what is necessary for it to work? Is there a good match between the method and your goals? Are you contributing your experiences and interesting failures back to the community, thus demonstrating thought leadership? If you are interested in more effective knowledge sharing, consider low-budget opportunities such as starting an online Q&A site about open science or collaborative R&D using a platform like StackExchange. 
There are also opportunities to help evaluate what really works\u2014moving beyond anecdotal evidence to case studies and metrics.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_4:_Lead_as_a_coach.2C_not_a_CEO\">Rule 4: Lead as a coach, not a CEO<\/span><\/h2>\n<p>The command-and-control style doesn't work well with contributors from diverse organizations, many of whom may be volunteers.<sup id=\"rdp-ebb-cite_ref-VicensTen07_22-0\" class=\"reference\"><a href=\"#cite_note-VicensTen07-22\" rel=\"external_link\">[22]<\/a><\/sup> And as has been said of Linus Torvalds, the founder of the open-source operating system Linux, \"Linus doesn't scale\": leaders of mass collaborations can become bottlenecks unless they encourage distributed workflows and leadership.<sup id=\"rdp-ebb-cite_ref-FogelProducing13_7-2\" class=\"reference\"><a href=\"#cite_note-FogelProducing13-7\" rel=\"external_link\">[7]<\/a><\/sup> Be flexible about management (but strict about quality). Check your ego at the door \u2014 you're playing a team game and will be stronger when others want to contribute. Participants will feel more motivated if their contribution enriches a joint resource rather than just the leader. 
Can you give up exclusive ownership and credit to achieve with others what you cannot achieve alone?\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_5:_Diversify_your_contributors\">Rule 5: Diversify your contributors<\/span><\/h2>\n<p>A powerful aspect of collaborative R&D is the potential diversity of the community \u2014 including students<sup id=\"rdp-ebb-cite_ref-VashishtCrowd12_16-2\" class=\"reference\"><a href=\"#cite_note-VashishtCrowd12-16\" rel=\"external_link\">[16]<\/a><\/sup>, patients<sup id=\"rdp-ebb-cite_ref-WicksAccelerated11_23-0\" class=\"reference\"><a href=\"#cite_note-WicksAccelerated11-23\" rel=\"external_link\">[23]<\/a><\/sup>, gamers<sup id=\"rdp-ebb-cite_ref-GoodTheGene12_10-1\" class=\"reference\"><a href=\"#cite_note-GoodTheGene12-10\" rel=\"external_link\">[10]<\/a><\/sup>, and researchers from lesser-known countries or institutions. You can use open science to attract diverse contributors by lowering barriers to participation, publicly tackling audacious challenges (see Rule 8), and making collaboration fun. Consider open licensing terms and joint or public ownership of selected outcomes to broaden your participant base.<sup id=\"rdp-ebb-cite_ref-JeffersonScience06_14-1\" class=\"reference\"><a href=\"#cite_note-JeffersonScience06-14\" rel=\"external_link\">[14]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-PLOSHowOpen13Arch_15-1\" class=\"reference\"><a href=\"#cite_note-PLOSHowOpen13Arch-15\" rel=\"external_link\">[15]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-KrattigerIntellectual07_21-1\" class=\"reference\"><a href=\"#cite_note-KrattigerIntellectual07-21\" rel=\"external_link\">[21]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-DykeDeveloping11_24-0\" class=\"reference\"><a href=\"#cite_note-DykeDeveloping11-24\" rel=\"external_link\">[24]<\/a><\/sup> Encourage all community members to find ways to contribute that suit their abilities and inclinations. 
Can you reach past your usual partners, and make it easy for others to get up to speed with what you're doing? Are there opportunities for \"citizen science,\" perhaps through organizing many microcontributions?<sup id=\"rdp-ebb-cite_ref-NielsenReinventing11_1-5\" class=\"reference\"><a href=\"#cite_note-NielsenReinventing11-1\" rel=\"external_link\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-GoodTheGene12_10-2\" class=\"reference\"><a href=\"#cite_note-GoodTheGene12-10\" rel=\"external_link\">[10]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_6:_Diversify_your_customers\">Rule 6: Diversify your customers<\/span><\/h2>\n<p>Can you engage the broadest possible base as beneficiaries? The science that you do in the open spreads its benefits widely, and that can attract unexpected accolades and collaborators.<sup id=\"rdp-ebb-cite_ref-NielsenReinventing11_1-6\" class=\"reference\"><a href=\"#cite_note-NielsenReinventing11-1\" rel=\"external_link\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-WoelfleOpen11_4-2\" class=\"reference\"><a href=\"#cite_note-WoelfleOpen11-4\" rel=\"external_link\">[4]<\/a><\/sup> Productively involving stakeholders can inform your research \u2014 for example, through participatory research strategies involving the people your efforts are meant to help.<sup id=\"rdp-ebb-cite_ref-HollandWho13_25-0\" class=\"reference\"><a href=\"#cite_note-HollandWho13-25\" rel=\"external_link\">[25]<\/a><\/sup> Contributing to collaborative initiatives targeting human development challenges can motivate your team, and potentially lead to innovations that are transferable to for-profit markets. 
Neglected disease R&D is a case in point, which seems particularly suitable for collaborative pilot projects, given its lower profits, humanitarian appeal, and need for new methods.<sup id=\"rdp-ebb-cite_ref-MasumOpen11_26-0\" class=\"reference\"><a href=\"#cite_note-MasumOpen11-26\" rel=\"external_link\">[26]<\/a><\/sup> If your work is commercially driven, consider humanitarian licensing approaches that encourage nonprofit applications by others to poorer demographics.<sup id=\"rdp-ebb-cite_ref-NRCDesigning11_2-1\" class=\"reference\"><a href=\"#cite_note-NRCDesigning11-2\" rel=\"external_link\">[2]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-KrattigerIntellectual07_21-2\" class=\"reference\"><a href=\"#cite_note-KrattigerIntellectual07-21\" rel=\"external_link\">[21]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_7:_Don.27t_reinvent_the_wheel\">Rule 7: Don't reinvent the wheel<\/span><\/h2>\n<p>The more you can use what already exists, the greater your effectiveness will be. Are there lab and computational resources that could be used when otherwise idle? Can you find people already working on elements of your problem, and organize their collective work? Before starting a new initiative, have you explored and considered joining existing ones? Piggybacking on active efforts eases prototyping and gathering enthusiastic initial users. Build on the cumulative stockpile of past open initiatives (see Rules 1 and 3).\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_8:_Think_big\">Rule 8: Think big<\/span><\/h2>\n<p>For projects hoping to harness the power of mass collaboration, a major challenge can be attracting a large community of contributors. Many of the best mass collaborations orient around seemingly audacious goals like: \"build a free encyclopedia of all the world's knowledge\" (Wikipedia), \"develop a review article for every human gene\" (Gene Wiki), and \"build a new operating system\" (Linux). 
Establishing a driving, high-level purpose will help spread the idea of your project and motivate people to come have a look and see what they can do. Be ready to scale with success.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_9:_Encourage_supportive_policies_and_tools\">Rule 9: Encourage supportive policies and tools<\/span><\/h2>\n<p>Can you cultivate open science and collaborative R&D by helping to make them part of \"standard operating procedure\"? For example, can you encourage institutional data sharing?<sup id=\"rdp-ebb-cite_ref-DykeDeveloping11_24-1\" class=\"reference\"><a href=\"#cite_note-DykeDeveloping11-24\" rel=\"external_link\">[24]<\/a><\/sup> Can you build a profiling platform of collaborative initiatives, summarizing what they have achieved and what types of collaborators they are seeking? Do you have opportunities to adopt appropriate policies in your own organization or field? A case study to learn from is the spread of open access from wishful thinking to widespread fact.<sup id=\"rdp-ebb-cite_ref-PLOSOpenAccess_5-1\" class=\"reference\"><a href=\"#cite_note-PLOSOpenAccess-5\" rel=\"external_link\">[5]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_10:_Grow_the_commons\">Rule 10: Grow the commons<\/span><\/h2>\n<p>As intellectual property debates illustrate, there are legitimate differences of opinion on how best to motivate innovators' investments to generate new knowledge.<sup id=\"rdp-ebb-cite_ref-KrattigerIntellectual07_21-3\" class=\"reference\"><a href=\"#cite_note-KrattigerIntellectual07-21\" rel=\"external_link\">[21]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-MasumOpen11_26-1\" class=\"reference\"><a href=\"#cite_note-MasumOpen11-26\" rel=\"external_link\">[26]<\/a><\/sup> But in the long run, sharing more knowledge and tools boosts both for-profit and nonprofit research.<sup id=\"rdp-ebb-cite_ref-NRCDesigning11_2-2\" class=\"reference\"><a href=\"#cite_note-NRCDesigning11-2\" rel=\"external_link\">[2]<\/a><\/sup><sup 
id=\"rdp-ebb-cite_ref-IoMEstablishing11_3-3\" class=\"reference\"><a href=\"#cite_note-IoMEstablishing11-3\" rel=\"external_link\">[3]<\/a><\/sup> This growing shared resource of knowledge and tools \u2014 \"the commons\" \u2014 is the product of centuries of striving. It depends on cumulative win-win-win collaborations spanning organizations, nations, and generations. Can you find ways to advance your interests while remaining part of this larger narrative?<sup id=\"rdp-ebb-cite_ref-NielsenReinventing11_1-7\" class=\"reference\"><a href=\"#cite_note-NielsenReinventing11-1\" rel=\"external_link\">[1]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-PLOSOpenAccess_5-2\" class=\"reference\"><a href=\"#cite_note-PLOSOpenAccess-5\" rel=\"external_link\">[5]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-BenklerThePenguin11_19-1\" class=\"reference\"><a href=\"#cite_note-BenklerThePenguin11-19\" rel=\"external_link\">[19]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-27\" class=\"reference\"><a href=\"#cite_note-27\" rel=\"external_link\">[27]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Supporting_information\">Supporting information<\/span><\/h2>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"10\"><b>Box 1.<\/b> Conversations on Open Science and Collaborative R&D\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"10\">Many commentators have considered challenges in translating open science and collaborative methods to biomedical research.<sup id=\"rdp-ebb-cite_ref-NRCDesigning11_2-3\" class=\"reference\"><a href=\"#cite_note-NRCDesigning11-2\" rel=\"external_link\">[2]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-IoMEstablishing11_3-4\" class=\"reference\"><a href=\"#cite_note-IoMEstablishing11-3\" 
rel=\"external_link\">[3]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-WoelfleOpen11_4-3\" class=\"reference\"><a href=\"#cite_note-WoelfleOpen11-4\" rel=\"external_link\">[4]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-EkinsCollab11_9-5\" class=\"reference\"><a href=\"#cite_note-EkinsCollab11-9\" rel=\"external_link\">[9]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-MunosHowTo11_17-1\" class=\"reference\"><a href=\"#cite_note-MunosHowTo11-17\" rel=\"external_link\">[17]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-NormanLeveraging11_20-1\" class=\"reference\"><a href=\"#cite_note-NormanLeveraging11-20\" rel=\"external_link\">[20]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-DykeDeveloping11_24-2\" class=\"reference\"><a href=\"#cite_note-DykeDeveloping11-24\" rel=\"external_link\">[24]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-MasumOpen11_26-2\" class=\"reference\"><a href=\"#cite_note-MasumOpen11-26\" rel=\"external_link\">[26]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-MardenOpenSource10_28-0\" class=\"reference\"><a href=\"#cite_note-MardenOpenSource10-28\" rel=\"external_link\">[28]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-.C3.85rdalOpenSource12_29-0\" class=\"reference\"><a href=\"#cite_note-.C3.85rdalOpenSource12-29\" rel=\"external_link\">[29]<\/a><\/sup> How can protecting intellectual property be balanced with freeing researchers to build on previous knowledge? If R&D results are collaboratively created and freely available, who will take responsibility for costly clinical trials and quality control? What will be the Linux of open-source R&D?\n<p>To explore such challenges and convey life experiences in biomedical open science and collaborative R&D, we offer eight conversational interviews by the first author of this article as supplementary material. 
The conversations were done on behalf of the Results for Development Institute and are with:\n<\/p>\n<ul><li> <b>Alph Bingham<\/b>, cofounder of InnoCentive; doi:10.1371\/journal.pcbi.1003244.s001 (<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/journals.plos.org\/ploscompbiol\/article\/file?type=supplementary&id=info:doi\/10.1371\/journal.pcbi.1003244.s001\" target=\"_blank\">Text S1<\/a>) (PDF)<\/li>\n<li> <b>Barry Bunin<\/b>, CEO of Collaborative Drug Discovery; doi:10.1371\/journal.pcbi.1003244.s002 (<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/journals.plos.org\/ploscompbiol\/article\/file?type=supplementary&id=info:doi\/10.1371\/journal.pcbi.1003244.s002\" target=\"_blank\">Text S2<\/a>) (PDF)<\/li>\n<li> <b>Leslie Chan<\/b>, open access pioneer and director of Bioline International; doi:10.1371\/journal.pcbi.1003244.s003 (<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/journals.plos.org\/ploscompbiol\/article\/file?type=supplementary&id=info:doi\/10.1371\/journal.pcbi.1003244.s003\" target=\"_blank\">Text S3<\/a>) (PDF)<\/li>\n<li> <b>Aled Edwards<\/b>, director of the Structural Genomics Consortium; doi:10.1371\/journal.pcbi.1003244.s004 (<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/journals.plos.org\/ploscompbiol\/article\/file?type=supplementary&id=info:doi\/10.1371\/journal.pcbi.1003244.s004\" target=\"_blank\">Text S4<\/a>) (PDF)<\/li>\n<li> <b>Benjamin Good<\/b>, coleader of the Gene Wiki initiative; doi:10.1371\/journal.pcbi.1003244.s005 (<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/journals.plos.org\/ploscompbiol\/article\/file?type=supplementary&id=info:doi\/10.1371\/journal.pcbi.1003244.s005\" target=\"_blank\">Text S5<\/a>) (PDF)<\/li>\n<li> <b>Bernard Munos<\/b>, pharmaceutical innovation thought leader; doi:10.1371\/journal.pcbi.1003244.s006 (<a rel=\"external_link\" class=\"external text\" 
href=\"http:\/\/journals.plos.org\/ploscompbiol\/article\/file?type=supplementary&id=info:doi\/10.1371\/journal.pcbi.1003244.s006\" target=\"_blank\">Text S6<\/a>) (PDF)<\/li>\n<li> <b>Zakir Thomas<\/b>, director of India's Open Source Drug Discovery (OSDD) project; doi:10.1371\/journal.pcbi.1003244.s007 (<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/journals.plos.org\/ploscompbiol\/article\/file?type=supplementary&id=info:doi\/10.1371\/journal.pcbi.1003244.s007\" target=\"_blank\">Text S7<\/a>) (PDF)<\/li>\n<li> <b>Matt Todd<\/b>, open science and drug development pioneer; doi:10.1371\/journal.pcbi.1003244.s008 (<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/journals.plos.org\/ploscompbiol\/article\/file?type=supplementary&id=info:doi\/10.1371\/journal.pcbi.1003244.s008\" target=\"_blank\">Text S8<\/a>) (PDF)<\/li><\/ul>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>We thank Jean Arkedis, Robert Hecht, and Paul Wilson for comments on early versions of this article. Our thanks also go to all the colleagues and pioneers who have shared their wisdom on making collaborative R&D work.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h3>\n<p>This article was made possible by support to HM and AR from a grant by the Bill & Melinda Gates Foundation to the Results for Development Institute. 
The funders had no role in the preparation of the manuscript.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h2>\n<p>The authors have declared that no competing interests exist.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-NielsenReinventing11-1\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-NielsenReinventing11_1-0\" rel=\"external_link\">1.0<\/a><\/sup> <sup><a href=\"#cite_ref-NielsenReinventing11_1-1\" rel=\"external_link\">1.1<\/a><\/sup> <sup><a href=\"#cite_ref-NielsenReinventing11_1-2\" rel=\"external_link\">1.2<\/a><\/sup> <sup><a href=\"#cite_ref-NielsenReinventing11_1-3\" rel=\"external_link\">1.3<\/a><\/sup> <sup><a href=\"#cite_ref-NielsenReinventing11_1-4\" rel=\"external_link\">1.4<\/a><\/sup> <sup><a href=\"#cite_ref-NielsenReinventing11_1-5\" rel=\"external_link\">1.5<\/a><\/sup> <sup><a href=\"#cite_ref-NielsenReinventing11_1-6\" rel=\"external_link\">1.6<\/a><\/sup> <sup><a href=\"#cite_ref-NielsenReinventing11_1-7\" rel=\"external_link\">1.7<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Nielsen, M. (2011). <i>Reinventing Discovery: The New Era of Networked Science<\/i>. Princeton University Press. pp. 272. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780691148908.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Reinventing+Discovery%3A+The+New+Era+of+Networked+Science&rft.aulast=Nielsen%2C+M.&rft.au=Nielsen%2C+M.&rft.date=2011&rft.pages=pp.%26nbsp%3B272&rft.pub=Princeton+University+Press&rft.isbn=9780691148908&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NRCDesigning11-2\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-NRCDesigning11_2-0\" rel=\"external_link\">2.0<\/a><\/sup> <sup><a href=\"#cite_ref-NRCDesigning11_2-1\" rel=\"external_link\">2.1<\/a><\/sup> <sup><a href=\"#cite_ref-NRCDesigning11_2-2\" rel=\"external_link\">2.2<\/a><\/sup> <sup><a href=\"#cite_ref-NRCDesigning11_2-3\" rel=\"external_link\">2.3<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">National Research Council (2011). Uhlir, P.F., ed. <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.nap.edu\/catalog\/13245\/designing-the-microbial-research-commons-proceedings-of-an-international-symposium\" target=\"_blank\"><i>Designing the Microbial Research Commons: Proceedings of an International Symposium<\/i><\/a>. The National Academies Press. pp. 216. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780309219792<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.nap.edu\/catalog\/13245\/designing-the-microbial-research-commons-proceedings-of-an-international-symposium\" target=\"_blank\">https:\/\/www.nap.edu\/catalog\/13245\/designing-the-microbial-research-commons-proceedings-of-an-international-symposium<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Designing+the+Microbial+Research+Commons%3A+Proceedings+of+an+International+Symposium&rft.aulast=National+Research+Council&rft.au=National+Research+Council&rft.date=2011&rft.pages=pp.%26nbsp%3B216&rft.pub=The+National+Academies+Press&rft.isbn=9780309219792&rft_id=https%3A%2F%2Fwww.nap.edu%2Fcatalog%2F13245%2Fdesigning-the-microbial-research-commons-proceedings-of-an-international-symposium&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-IoMEstablishing11-3\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-IoMEstablishing11_3-0\" rel=\"external_link\">3.0<\/a><\/sup> <sup><a href=\"#cite_ref-IoMEstablishing11_3-1\" rel=\"external_link\">3.1<\/a><\/sup> <sup><a href=\"#cite_ref-IoMEstablishing11_3-2\" rel=\"external_link\">3.2<\/a><\/sup> <sup><a href=\"#cite_ref-IoMEstablishing11_3-3\" rel=\"external_link\">3.3<\/a><\/sup> <sup><a href=\"#cite_ref-IoMEstablishing11_3-4\" rel=\"external_link\">3.4<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Institute of Medicine; Olson, S.; Berger, A.C. (2011). 
<a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.nap.edu\/catalog\/13015\/establishing-precompetitive-collaborations-to-stimulate-genomics-driven-product-development-workshop\" target=\"_blank\"><i>Establishing Precompetitive Collaborations to Stimulate Genomics-Driven Product Development: Workshop Summary<\/i><\/a>. The National Academies Press. pp. 74. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780309161824<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.nap.edu\/catalog\/13015\/establishing-precompetitive-collaborations-to-stimulate-genomics-driven-product-development-workshop\" target=\"_blank\">https:\/\/www.nap.edu\/catalog\/13015\/establishing-precompetitive-collaborations-to-stimulate-genomics-driven-product-development-workshop<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Establishing+Precompetitive+Collaborations+to+Stimulate+Genomics-Driven+Product+Development%3A+Workshop+Summary&rft.aulast=Institute+of+Medicine%3B+Olson%2C+S.%3B+Berger%2C+A.C.&rft.au=Institute+of+Medicine%3B+Olson%2C+S.%3B+Berger%2C+A.C.&rft.date=2011&rft.pages=pp.%26nbsp%3B74&rft.pub=The+National+Academies+Press&rft.isbn=9780309161824&rft_id=https%3A%2F%2Fwww.nap.edu%2Fcatalog%2F13015%2Festablishing-precompetitive-collaborations-to-stimulate-genomics-driven-product-development-workshop&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WoelfleOpen11-4\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-WoelfleOpen11_4-0\" rel=\"external_link\">4.0<\/a><\/sup> <sup><a href=\"#cite_ref-WoelfleOpen11_4-1\" rel=\"external_link\">4.1<\/a><\/sup> <sup><a 
href=\"#cite_ref-WoelfleOpen11_4-2\" rel=\"external_link\">4.2<\/a><\/sup> <sup><a href=\"#cite_ref-WoelfleOpen11_4-3\" rel=\"external_link\">4.3<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Woelfle, M.; Olliaro, P.; Todd, M.H. (2011). \"Open science is a research accelerator\". <i>Nature Chemistry<\/i> <b>3<\/b> (10): 745-8. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnchem.1149\" target=\"_blank\">10.1038\/nchem.1149<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21941234\" target=\"_blank\">21941234<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Open+science+is+a+research+accelerator&rft.jtitle=Nature+Chemistry&rft.aulast=Woelfle%2C+M.%3B+Olliaro%2C+P.%3B+Todd%2C+M.H.&rft.au=Woelfle%2C+M.%3B+Olliaro%2C+P.%3B+Todd%2C+M.H.&rft.date=2011&rft.volume=3&rft.issue=10&rft.pages=745-8&rft_id=info:doi\/10.1038%2Fnchem.1149&rft_id=info:pmid\/21941234&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PLOSOpenAccess-5\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-PLOSOpenAccess_5-0\" rel=\"external_link\">5.0<\/a><\/sup> <sup><a href=\"#cite_ref-PLOSOpenAccess_5-1\" rel=\"external_link\">5.1<\/a><\/sup> <sup><a href=\"#cite_ref-PLOSOpenAccess_5-2\" rel=\"external_link\">5.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" 
href=\"http:\/\/web.archive.org\/web\/20130420203146\/http:\/\/www.ploscollections.org\/article\/browseIssue.action?issue=info:doi\/10.1371\/issue.pcol.v01.i10\" target=\"_blank\">\"PLOS Collections: Open Access Collection\"<\/a>. Public Library of Science. 2013. Archived from <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/collections.plos.org\/open-access\" target=\"_blank\">the original<\/a> on 20 April 2013<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/web.archive.org\/web\/20130420203146\/http:\/\/www.ploscollections.org\/article\/browseIssue.action?issue=info:doi\/10.1371\/issue.pcol.v01.i10\" target=\"_blank\">http:\/\/web.archive.org\/web\/20130420203146\/http:\/\/www.ploscollections.org\/article\/browseIssue.action?issue=info:doi\/10.1371\/issue.pcol.v01.i10<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 25 April 2013<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=PLOS+Collections%3A+Open+Access+Collection&rft.atitle=&rft.date=2013&rft.pub=Public+Library+of+Science&rft_id=http%3A%2F%2Fweb.archive.org%2Fweb%2F20130420203146%2Fhttp%3A%2F%2Fwww.ploscollections.org%2Farticle%2FbrowseIssue.action%3Fissue%3Dinfo%3Adoi%2F10.1371%2Fissue.pcol.v01.i10&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Prli.C4.87Ten12-6\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-Prli.C4.87Ten12_6-0\" rel=\"external_link\">6.0<\/a><\/sup> <sup><a href=\"#cite_ref-Prli.C4.87Ten12_6-1\" rel=\"external_link\">6.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Prli\u0107, A.; Procter, J.B. (2012). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539\" target=\"_blank\">\"Ten simple rules for the open development of scientific software\"<\/a>. <i>PLOS Computational Biology<\/i> <b>8<\/b> (12): e1002802. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.1002802\" target=\"_blank\">10.1371\/journal.pcbi.1002802<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3516539\/\" target=\"_blank\">PMC3516539<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23236269\" target=\"_blank\">23236269<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ten+simple+rules+for+the+open+development+of+scientific+software&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Prli%C4%87%2C+A.%3B+Procter%2C+J.B.&rft.au=Prli%C4%87%2C+A.%3B+Procter%2C+J.B.&rft.date=2012&rft.volume=8&rft.issue=12&rft.pages=e1002802&rft_id=info:doi\/10.1371%2Fjournal.pcbi.1002802&rft_id=info:pmc\/PMC3516539&rft_id=info:pmid\/23236269&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3516539&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FogelProducing13-7\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-FogelProducing13_7-0\" rel=\"external_link\">7.0<\/a><\/sup> <sup><a href=\"#cite_ref-FogelProducing13_7-1\" rel=\"external_link\">7.1<\/a><\/sup> <sup><a href=\"#cite_ref-FogelProducing13_7-2\" rel=\"external_link\">7.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Fogel, K. (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/producingoss.com\/en\/\" target=\"_blank\">\"Producing Open Source Software: How to Run a Successful Free Software Project\"<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/producingoss.com\/en\/\" target=\"_blank\">http:\/\/producingoss.com\/en\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 25 April 2013<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Producing+Open+Source+Software%3A+How+to+Run+a+Successful+Free+Software+Project&rft.atitle=&rft.aulast=Fogel%2C+K.&rft.au=Fogel%2C+K.&rft.date=2013&rft_id=http%3A%2F%2Fproducingoss.com%2Fen%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RDICollab13Arch-8\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-RDICollab13Arch_8-0\" rel=\"external_link\">8.0<\/a><\/sup> <sup><a href=\"#cite_ref-RDICollab13Arch_8-1\" rel=\"external_link\">8.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/web.archive.org\/web\/20130115194036\/http:\/\/healthresearchpolicy.org\/primer\" target=\"_blank\">\"Collaborative Health R&D Primer\"<\/a>. <i>Global Health R&D Policy Assessment Center<\/i>. Results for Development Institute. 2013. Archived from <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/healthresearchpolicy.org\/primer\/\" target=\"_blank\">the original<\/a> on 15 January 2013<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/web.archive.org\/web\/20130115194036\/http:\/\/healthresearchpolicy.org\/primer\" target=\"_blank\">http:\/\/web.archive.org\/web\/20130115194036\/http:\/\/healthresearchpolicy.org\/primer<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 25 April 2013<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Collaborative+Health+R%26D+Primer&rft.atitle=Global+Health+R%26D+Policy+Assessment+Center&rft.date=2013&rft.pub=Results+for+Development+Institute&rft_id=http%3A%2F%2Fweb.archive.org%2Fweb%2F20130115194036%2Fhttp%3A%2F%2Fhealthresearchpolicy.org%2Fprimer&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EkinsCollab11-9\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-EkinsCollab11_9-0\" rel=\"external_link\">9.0<\/a><\/sup> <sup><a href=\"#cite_ref-EkinsCollab11_9-1\" rel=\"external_link\">9.1<\/a><\/sup> <sup><a href=\"#cite_ref-EkinsCollab11_9-2\" rel=\"external_link\">9.2<\/a><\/sup> <sup><a href=\"#cite_ref-EkinsCollab11_9-3\" rel=\"external_link\">9.3<\/a><\/sup> <sup><a href=\"#cite_ref-EkinsCollab11_9-4\" rel=\"external_link\">9.4<\/a><\/sup> <sup><a href=\"#cite_ref-EkinsCollab11_9-5\" rel=\"external_link\">9.5<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Ekins, S.; Hupcey, M.A.Z.; Williams, A.J., ed. (2011). <i>Collaborative Computational Technologies for Biomedical Research<\/i>. John Wiley & Sons, Inc. pp. 576. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780470638033.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Collaborative+Computational+Technologies+for+Biomedical+Research&rft.date=2011&rft.pages=pp.%26nbsp%3B576&rft.pub=John+Wiley+%26+Sons%2C+Inc&rft.isbn=9780470638033&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GoodTheGene12-10\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-GoodTheGene12_10-0\" rel=\"external_link\">10.0<\/a><\/sup> <sup><a href=\"#cite_ref-GoodTheGene12_10-1\" rel=\"external_link\">10.1<\/a><\/sup> <sup><a href=\"#cite_ref-GoodTheGene12_10-2\" rel=\"external_link\">10.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Good, B.M.; Clarke, E.L.; de Alfaro, L.; Su, A.I. (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3245148\" target=\"_blank\">\"The Gene Wiki in 2011: Community intelligence applied to human gene annotation\"<\/a>. <i>Nucleic Acids Research<\/i> <b>40<\/b> (D1): D1255-61. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fnar%2Fgkr925\" target=\"_blank\">10.1093\/nar\/gkr925<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3245148\/\" target=\"_blank\">PMC3245148<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22075991\" target=\"_blank\">22075991<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3245148\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3245148<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Gene+Wiki+in+2011%3A+Community+intelligence+applied+to+human+gene+annotation&rft.jtitle=Nucleic+Acids+Research&rft.aulast=Good%2C+B.M.%3B+Clarke%2C+E.L.%3B+de+Alfaro%2C+L.%3B+Su%2C+A.I.&rft.au=Good%2C+B.M.%3B+Clarke%2C+E.L.%3B+de+Alfaro%2C+L.%3B+Su%2C+A.I.&rft.date=2012&rft.volume=40&rft.issue=D1&rft.pages=D1255-61&rft_id=info:doi\/10.1093%2Fnar%2Fgkr925&rft_id=info:pmc\/PMC3245148&rft_id=info:pmid\/22075991&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3245148&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BinghamTheOpen11-11\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BinghamTheOpen11_11-0\" rel=\"external_link\">11.0<\/a><\/sup> <sup><a href=\"#cite_ref-BinghamTheOpen11_11-1\" rel=\"external_link\">11.1<\/a><\/sup> <sup><a href=\"#cite_ref-BinghamTheOpen11_11-2\" rel=\"external_link\">11.2<\/a><\/sup> <sup><a href=\"#cite_ref-BinghamTheOpen11_11-3\" rel=\"external_link\">11.3<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Bingham, A.; Spradlin, D. (2011). 
<i>The Open Innovation Marketplace: Creating Value in the Challenge Driven Enterprise<\/i>. FT Press. pp. 272. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780132311830.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=The+Open+Innovation+Marketplace%3A+Creating+Value+in+the+Challenge+Driven+Enterprise&rft.aulast=Bingham%2C+A.%3B+Spradlin%2C+D.&rft.au=Bingham%2C+A.%3B+Spradlin%2C+D.&rft.date=2011&rft.pages=pp.%26nbsp%3B272&rft.pub=FT+Press&rft.isbn=9780132311830&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WilsonPrizes11Arch-12\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-WilsonPrizes11Arch_12-0\" rel=\"external_link\">12.0<\/a><\/sup> <sup><a href=\"#cite_ref-WilsonPrizes11Arch_12-1\" rel=\"external_link\">12.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Wilson, P.; Palriwala, A. (2011). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/web.archive.org\/web\/20121107025448\/http:\/\/healthresearchpolicy.org\/assessments\/prizes-global-health-technologies\" target=\"_blank\">\"Prizes for Global Health Technologies\"<\/a>. <i>Global Health R&D Policy Assessment Center<\/i>. Results for Development Institute. Archived from <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/healthresearchpolicy.org\/assessments\/prizes-global-health-technologies\" target=\"_blank\">the original<\/a> on 7 November 2012<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/web.archive.org\/web\/20121107025448\/http:\/\/healthresearchpolicy.org\/assessments\/prizes-global-health-technologies\" target=\"_blank\">http:\/\/web.archive.org\/web\/20121107025448\/http:\/\/healthresearchpolicy.org\/assessments\/prizes-global-health-technologies<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 25 April 2013<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Prizes+for+Global+Health+Technologies&rft.atitle=Global+Health+R%26D+Policy+Assessment+Center&rft.aulast=Wilson%2C+P.%3B+Palriwala%2C+A.&rft.au=Wilson%2C+P.%3B+Palriwala%2C+A.&rft.date=2011&rft.pub=Results+for+Development+Institute&rft_id=http%3A%2F%2Fweb.archive.org%2Fweb%2F20121107025448%2Fhttp%3A%2F%2Fhealthresearchpolicy.org%2Fassessments%2Fprizes-global-health-technologies&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GoodGames11-13\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-GoodGames11_13-0\" rel=\"external_link\">13.0<\/a><\/sup> <sup><a href=\"#cite_ref-GoodGames11_13-1\" rel=\"external_link\">13.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Good, B.M.; Su, A.I. (2011). \"Games with a scientific purpose\". <i>Genome Biology<\/i> <b>12<\/b> (12): 135. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2Fgb-2011-12-12-135\" target=\"_blank\">10.1186\/gb-2011-12-12-135<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22204700\" target=\"_blank\">22204700<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Games+with+a+scientific+purpose&rft.jtitle=Genome+Biology&rft.aulast=Good%2C+B.M.%3B+Su%2C+A.I.&rft.au=Good%2C+B.M.%3B+Su%2C+A.I.&rft.date=2011&rft.volume=12&rft.issue=12&rft.pages=135&rft_id=info:doi\/10.1186%2Fgb-2011-12-12-135&rft_id=info:pmid\/22204700&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-JeffersonScience06-14\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-JeffersonScience06_14-0\" rel=\"external_link\">14.0<\/a><\/sup> <sup><a href=\"#cite_ref-JeffersonScience06_14-1\" rel=\"external_link\">14.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Jefferson, R. (2006). \"Science as social enterprise: The CAMBIA BiOS Initiative\". <i>Innovations: Technology, Governance, Globalization<\/i> <b>1<\/b> (4): 13\u201344. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1162%2Fitgg.2006.1.4.13\" target=\"_blank\">10.1162\/itgg.2006.1.4.13<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Science+as+social+enterprise%3A+The+CAMBIA+BiOS+Initiative&rft.jtitle=Innovations%3A+Technology%2C+Governance%2C+Globalization&rft.aulast=Jefferson%2C+R.&rft.au=Jefferson%2C+R.&rft.date=2006&rft.volume=1&rft.issue=4&rft.pages=13%E2%80%9344&rft_id=info:doi\/10.1162%2Fitgg.2006.1.4.13&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PLOSHowOpen13Arch-15\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-PLOSHowOpen13Arch_15-0\" rel=\"external_link\">15.0<\/a><\/sup> <sup><a href=\"#cite_ref-PLOSHowOpen13Arch_15-1\" rel=\"external_link\">15.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/web.archive.org\/web\/20130301193758\/http:\/\/www.plos.org\/about\/open-access\/howopenisit\/\" target=\"_blank\">\"HowOpenIsIt?\"<\/a>. Public Library of Science. 2013. Archived from <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.plos.org\/how-open-is-it\" target=\"_blank\">the original<\/a> on 1 March 2013<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/web.archive.org\/web\/20130301193758\/http:\/\/www.plos.org\/about\/open-access\/howopenisit\/\" target=\"_blank\">http:\/\/web.archive.org\/web\/20130301193758\/http:\/\/www.plos.org\/about\/open-access\/howopenisit\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 25 April 2013<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=HowOpenIsIt%3F&rft.atitle=&rft.date=2013&rft.pub=Public+Library+of+Science&rft_id=http%3A%2F%2Fweb.archive.org%2Fweb%2F20130301193758%2Fhttp%3A%2F%2Fwww.plos.org%2Fabout%2Fopen-access%2Fhowopenisit%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-VashishtCrowd12-16\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-VashishtCrowd12_16-0\" rel=\"external_link\">16.0<\/a><\/sup> <sup><a href=\"#cite_ref-VashishtCrowd12_16-1\" rel=\"external_link\">16.1<\/a><\/sup> <sup><a href=\"#cite_ref-VashishtCrowd12_16-2\" rel=\"external_link\">16.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Vashisht, R.; Mondal, A.K.; Jain, A. et al. (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3395720\" target=\"_blank\">\"Crowd sourcing a new paradigm for interactome driven drug target identification in <i>Mycobacterium tuberculosis<\/i>\"<\/a>. <i>PLOS One<\/i> <b>7<\/b> (7): e39808. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pone.0039808\" target=\"_blank\">10.1371\/journal.pone.0039808<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3395720\/\" target=\"_blank\">PMC3395720<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22808064\" target=\"_blank\">22808064<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3395720\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3395720<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Crowd+sourcing+a+new+paradigm+for+interactome+driven+drug+target+identification+in+%27%27Mycobacterium+tuberculosis%27%27&rft.jtitle=PLOS+One&rft.aulast=Vashisht%2C+R.%3B+Mondal%2C+A.K.%3B+Jain%2C+A.+et+al.&rft.au=Vashisht%2C+R.%3B+Mondal%2C+A.K.%3B+Jain%2C+A.+et+al.&rft.date=2012&rft.volume=7&rft.issue=7&rft.pages=e39808&rft_id=info:doi\/10.1371%2Fjournal.pone.0039808&rft_id=info:pmc\/PMC3395720&rft_id=info:pmid\/22808064&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3395720&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MunosHowTo11-17\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MunosHowTo11_17-0\" rel=\"external_link\">17.0<\/a><\/sup> <sup><a href=\"#cite_ref-MunosHowTo11_17-1\" rel=\"external_link\">17.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Munos, B.H.; Chin, W.W. (2011). \"How to revive breakthrough innovation in the pharmaceutical industry\". <i>Science Translational Medicine<\/i> <b>3<\/b> (89): 89cm16. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1126%2Fscitranslmed.3002273\" target=\"_blank\">10.1126\/scitranslmed.3002273<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21715677\" target=\"_blank\">21715677<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=How+to+revive+breakthrough+innovation+in+the+pharmaceutical+industry&rft.jtitle=Science+Translational+Medicine&rft.aulast=Munos%2C+B.H.%3B+Chin%2C+W.W.&rft.au=Munos%2C+B.H.%3B+Chin%2C+W.W.&rft.date=2011&rft.volume=3&rft.issue=89&rft.pages=89cm16&rft_id=info:doi\/10.1126%2Fscitranslmed.3002273&rft_id=info:pmid\/21715677&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WoelfleResolution11-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WoelfleResolution11_18-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Woelfle, M.; Seerden, J.P.; de Gooijer, J. et al. (2011). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3176743\" target=\"_blank\">\"Resolution of praziquantel\"<\/a>. <i>PLOS Neglected Tropical Diseases<\/i> <b>5<\/b> (9): e1260. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pntd.0001260\" target=\"_blank\">10.1371\/journal.pntd.0001260<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3176743\/\" target=\"_blank\">PMC3176743<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21949890\" target=\"_blank\">21949890<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3176743\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3176743<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Resolution+of+praziquantel&rft.jtitle=PLOS+Neglected+Tropical+Diseases&rft.aulast=Woelfle%2C+M.%3B+Seerden%2C+J.P.%3B+de+Gooijer%2C+J.+et+al.&rft.au=Woelfle%2C+M.%3B+Seerden%2C+J.P.%3B+de+Gooijer%2C+J.+et+al.&rft.date=2011&rft.volume=5&rft.issue=9&rft.pages=e1260&rft_id=info:doi\/10.1371%2Fjournal.pntd.0001260&rft_id=info:pmc\/PMC3176743&rft_id=info:pmid\/21949890&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3176743&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BenklerThePenguin11-19\"><span 
class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-BenklerThePenguin11_19-0\" rel=\"external_link\">19.0<\/a><\/sup> <sup><a href=\"#cite_ref-BenklerThePenguin11_19-1\" rel=\"external_link\">19.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Benkler, Y. (2011). <i>The Penguin and the Leviathan: How Cooperation Triumphs over Self-Interest<\/i>. Crown Business. pp. 272. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9780385525763.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=The+Penguin+and+the+Leviathan%3A+How+Cooperation+Triumphs+over+Self-Interest&rft.aulast=Benkler%2C+Y.&rft.au=Benkler%2C+Y.&rft.date=2011&rft.pages=pp.%26nbsp%3B272&rft.pub=Crown+Business&rft.isbn=9780385525763&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NormanLeveraging11-20\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-NormanLeveraging11_20-0\" rel=\"external_link\">20.0<\/a><\/sup> <sup><a href=\"#cite_ref-NormanLeveraging11_20-1\" rel=\"external_link\">20.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Norman, T.C.; Bountra, C.; Edwards, A.M. et al. (2011). \"Leveraging crowdsourcing to facilitate the discovery of new medicines\". <i>Science Translational Medicine<\/i> <b>3<\/b> (88): 88mr1. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1126%2Fscitranslmed.3002678\" target=\"_blank\">10.1126\/scitranslmed.3002678<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21697527\" target=\"_blank\">21697527<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Leveraging+crowdsourcing+to+facilitate+the+discovery+of+new+medicines&rft.jtitle=Science+Translational+Medicine&rft.aulast=Norman%2C+T.C.%3B+Bountra%2C+C.%3B+Edwards%2C+A.M.+etc.&rft.au=Norman%2C+T.C.%3B+Bountra%2C+C.%3B+Edwards%2C+A.M.+etc.&rft.date=2011&rft.volume=3&rft.issue=88&rft.pages=88mr1&rft_id=info:doi\/10.1126%2Fscitranslmed.3002678&rft_id=info:pmid\/21697527&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KrattigerIntellectual07-21\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-KrattigerIntellectual07_21-0\" rel=\"external_link\">21.0<\/a><\/sup> <sup><a href=\"#cite_ref-KrattigerIntellectual07_21-1\" rel=\"external_link\">21.1<\/a><\/sup> <sup><a href=\"#cite_ref-KrattigerIntellectual07_21-2\" rel=\"external_link\">21.2<\/a><\/sup> <sup><a href=\"#cite_ref-KrattigerIntellectual07_21-3\" rel=\"external_link\">21.3<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation book\">Krattiger, A.; Mahoney, R.T.; Nelsen, L. et al., ed. (2007). <i>Intellectual Property Management in Health and Agricultural Innovation: A Handbook of Best Practices<\/i>. <b>1<\/b>. MIHR-USA. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781424320264.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Intellectual+Property+Management+in+Health+and+Agricultural+Innovation%3A+A+Handbook+of+Best+Practices&rft.date=2007&rft.volume=1&rft.pub=MIHR-USA&rft.isbn=9781424320264&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-VicensTen07-22\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-VicensTen07_22-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Vicens, Q.; Bourne, P.E. (2007). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1847992\" target=\"_blank\">\"Ten simple rules for a successful collaboration\"<\/a>. <i>PLOS Computational Biology<\/i> <b>3<\/b> (3): e44. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.0030044\" target=\"_blank\">10.1371\/journal.pcbi.0030044<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC1847992\/\" target=\"_blank\">PMC1847992<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17397252\" target=\"_blank\">17397252<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1847992\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1847992<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ten+simple+rules+for+a+successful+collaboration&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Vicens%2C+Q.%3B+Bourne%2C+P.E.&rft.au=Vicens%2C+Q.%3B+Bourne%2C+P.E.&rft.date=2007&rft.volume=3&rft.issue=3&rft.pages=e44&rft_id=info:doi\/10.1371%2Fjournal.pcbi.0030044&rft_id=info:pmc\/PMC1847992&rft_id=info:pmid\/17397252&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC1847992&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WicksAccelerated11-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WicksAccelerated11_23-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wicks, P.; Vaughan, T.E.; Massagli, M.P.; Heywood, J. (2011). \"Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm\". <i>Nature Biotechnology<\/i> <b>29<\/b> (5): 411-4. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnbt.1837\" target=\"_blank\">10.1038\/nbt.1837<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21516084\" target=\"_blank\">21516084<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Accelerated+clinical+discovery+using+self-reported+patient+data+collected+online+and+a+patient-matching+algorithm&rft.jtitle=Nature+Biotechnology&rft.aulast=Wicks%2C+P.%3B+Vaughan%2C+T.E.%3B+Massagli%2C+M.P.%3B+Heywood%2C+J.&rft.au=Wicks%2C+P.%3B+Vaughan%2C+T.E.%3B+Massagli%2C+M.P.%3B+Heywood%2C+J.&rft.date=2011&rft.volume=29&rft.issue=5&rft.pages=411-4&rft_id=info:doi\/10.1038%2Fnbt.1837&rft_id=info:pmid\/21516084&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DykeDeveloping11-24\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-DykeDeveloping11_24-0\" rel=\"external_link\">24.0<\/a><\/sup> <sup><a href=\"#cite_ref-DykeDeveloping11_24-1\" rel=\"external_link\">24.1<\/a><\/sup> <sup><a href=\"#cite_ref-DykeDeveloping11_24-2\" rel=\"external_link\">24.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Dyke, S.O.; Hubbard, T.J. (2011). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3239235\" target=\"_blank\">\"Developing and implementing an institute-wide data sharing policy\"<\/a>. 
<i>Genome Medicine<\/i> <b>3<\/b> (9): 60. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2Fgm276\" target=\"_blank\">10.1186\/gm276<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3239235\/\" target=\"_blank\">PMC3239235<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21955348\" target=\"_blank\">21955348<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3239235\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3239235<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Developing+and+implementing+an+institute-wide+data+sharing+policy&rft.jtitle=Genome+Medicine&rft.aulast=Dyke%2C+S.O.%3B+Hubbard%2C+T.J.&rft.au=Dyke%2C+S.O.%3B+Hubbard%2C+T.J.&rft.date=2011&rft.volume=3&rft.issue=9&rft.pages=60&rft_id=info:doi\/10.1186%2Fgm276&rft_id=info:pmc\/PMC3239235&rft_id=info:pmid\/21955348&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3239235&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HollandWho13-25\"><span class=\"mw-cite-backlink\"><a 
href=\"#cite_ref-HollandWho13_25-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Holland, J.; Chambers, R. (2013). <i>Who Counts?: The Power of Participatory Statistics<\/i>. Practical Action. pp. 220. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781853397721.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Who+Counts%3F%3A+The+Power+of+Participatory+Statistics&rft.aulast=Holland%2C+J.%3B+Chambers%2C+R.&rft.au=Holland%2C+J.%3B+Chambers%2C+R.&rft.date=2013&rft.pages=pp.%26nbsp%3B220&rft.pub=Practical+Action&rft.isbn=9781853397721&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MasumOpen11-26\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MasumOpen11_26-0\" rel=\"external_link\">26.0<\/a><\/sup> <sup><a href=\"#cite_ref-MasumOpen11_26-1\" rel=\"external_link\">26.1<\/a><\/sup> <sup><a href=\"#cite_ref-MasumOpen11_26-2\" rel=\"external_link\">26.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Masum, H.; Harris, R. (2011). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/web.archive.org\/web\/20130106064334\/http:\/\/healthresearchpolicy.org\/assessments\/open-source-neglected-diseases-magic-bullet-or-mirage\" target=\"_blank\">\"Open Source for Neglected Diseases: Magic Bullet or Mirage?\"<\/a>. <i>Global Health R&D Policy Assessment Center<\/i>. Results for Development Institute. 
Archived from <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/healthresearchpolicy.org\/assessments\/open-source-neglected-diseases-magic-bullet-or-mirage\" target=\"_blank\">the original<\/a> on 06 January 2013<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/web.archive.org\/web\/20130106064334\/http:\/\/healthresearchpolicy.org\/assessments\/open-source-neglected-diseases-magic-bullet-or-mirage\" target=\"_blank\">http:\/\/web.archive.org\/web\/20130106064334\/http:\/\/healthresearchpolicy.org\/assessments\/open-source-neglected-diseases-magic-bullet-or-mirage<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 25 April 2013<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Open+Source+for+Neglected+Diseases%3A+Magic+Bullet+or+Mirage%3F&rft.atitle=Global+Health+R%26D+Policy+Assessment+Center&rft.aulast=Masum%2C+H.%3B+Harris%2C+R.&rft.au=Masum%2C+H.%3B+Harris%2C+R.&rft.date=2011&rft.pub=Results+for+Development+Institute&rft_id=http%3A%2F%2Fweb.archive.org%2Fweb%2F20130106064334%2Fhttp%3A%2F%2Fhealthresearchpolicy.org%2Fassessments%2Fopen-source-neglected-diseases-magic-bullet-or-mirage&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-27\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-27\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Masum, H.; Tovey, M. (2006). \"Given enough minds...: Bridging the ingenuity gap\". <i>First Monday<\/i> <b>11<\/b> (7). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.5210%2Ffm.v11i7.1370\" target=\"_blank\">10.5210\/fm.v11i7.1370<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Given+enough+minds...%3A+Bridging+the+ingenuity+gap&rft.jtitle=First+Monday&rft.aulast=Masum%2C+H.%3B+Tovey%2C+M.&rft.au=Masum%2C+H.%3B+Tovey%2C+M.&rft.date=2006&rft.volume=11&rft.issue=7&rft_id=info:doi\/10.5210%2Ffm.v11i7.1370&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MardenOpenSource10-28\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MardenOpenSource10_28-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Marden, E. (2010). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/hdl.handle.net\/11299\/155748\" target=\"_blank\">\"Open source drug development: A path to more accessible drugs and diagnostics?\"<\/a>. <i>Minnesota Journal of Law, Science & Technology<\/i> <b>11<\/b> (1): 217\u2013266<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/hdl.handle.net\/11299\/155748\" target=\"_blank\">http:\/\/hdl.handle.net\/11299\/155748<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Open+source+drug+development%3A+A+path+to+more+accessible+drugs+and+diagnostics%3F&rft.jtitle=Minnesota+Journal+of+Law%2C+Science+%26+Technology&rft.aulast=Marden%2C+E.&rft.au=Marden%2C+E.&rft.date=2010&rft.volume=11&rft.issue=1&rft.pages=217%E2%80%93266&rft_id=http%3A%2F%2Fhdl.handle.net%2F11299%2F155748&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-.C3.85rdalOpenSource12-29\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-.C3.85rdalOpenSource12_29-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">\u00c5rdal, C.; R\u00f8ttingen, J.A. (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3447952\" target=\"_blank\">\"Open source drug discovery in practice: A case study\"<\/a>. <i>PLOS Neglected Tropical Diseases<\/i> <b>6<\/b> (9): e1827. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pntd.0001827\" target=\"_blank\">10.1371\/journal.pntd.0001827<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3447952\/\" target=\"_blank\">PMC3447952<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23029588\" target=\"_blank\">23029588<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3447952\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3447952<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Open+source+drug+discovery+in+practice%3A+A+case+study&rft.jtitle=PLOS+Neglected+Tropical+Diseases&rft.aulast=%C3%85rdal%2C+C.%3B+R%C3%B8ttingen%2C+J.A.&rft.au=%C3%85rdal%2C+C.%3B+R%C3%B8ttingen%2C+J.A.&rft.date=2012&rft.volume=6&rft.issue=9&rft.pages=e1827&rft_id=info:doi\/10.1371%2Fjournal.pntd.0001827&rft_id=info:pmc\/PMC3447952&rft_id=info:pmid\/23029588&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3447952&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. In a few cases, the URLs from 2013 were dead; they were updated with current URLs, and, when applicable, archived URLs from the Internet Archive. 
Box 1, which in the original appeared at top, has been combined with the supporting information at the bottom.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D\">https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","b1b2d2922d12d6afbd23ca5f216a0cd7_images":[],"b1b2d2922d12d6afbd23ca5f216a0cd7_timestamp":1544814658,"e80be5db806b508a1aecd418f32667db_type":"article","e80be5db806b508a1aecd418f32667db_title":"Data and metadata brokering \u2013 Theory and practice from the BCube Project (Khalsa 2017)","e80be5db806b508a1aecd418f32667db_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project","e80be5db806b508a1aecd418f32667db_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Data and metadata brokering \u2013 Theory and practice from the BCube Project\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tJump to: navigation, search\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nData and metadata brokering \u2013 Theory and practice from the BCube ProjectJournal\n \nData Science JournalAuthor(s)\n \nKhalsa, Siri Jodha SinghAuthor affiliation(s)\n \nUniversity of ColoradoPrimary contact\n \nEmail: sjsk at nsidc dot orgYear published\n \n2017Volume and issue\n \n16(1)Page(s)\n \n1DOI\n \n10.5334\/dsj-2017-001ISSN\n \n1683-1470Distribution license\n \nCreative Commons Attribution 4.0 InternationalWebsite\n \nhttp:\/\/datascience.codata.org\/articles\/10.5334\/dsj-2017-001\/Download\n \nhttp:\/\/datascience.codata.org\/articles\/10.5334\/dsj-2017-001\/galley\/620\/download\/ (PDF)\n\nContents\n\n1 Abstract \n2 Genesis and objectives of EarthCube \n3 The nature of infrastructure development \n4 The challenge of cross-disciplinary interoperability \n5 The BCube brokering framework \n6 Science scenarios \n7 Metadata brokering \n8 Data brokering \n9 Sustainability \n10 Lessons learned \n11 Conclusions \n12 Footnotes \n13 Acknowledgements \n14 Competing interests \n15 References \n16 Notes \n\n\n\nAbstract \nEarthCube is a U.S. 
National Science Foundation initiative that aims to create a cyberinfrastructure (CI) for all the geosciences. An initial set of \"building blocks\" was funded to develop potential components of that CI. The Brokering Building Block (BCube) created a brokering framework to demonstrate cross-disciplinary data access based on a set of use cases developed by scientists from the domains of hydrology, oceanography, polar science and climate\/weather. While some successes were achieved, considerable challenges were encountered. We present a synopsis of the processes and outcomes of the BCube experiment.\nKeywords: interoperability, brokering, middleware, EarthCube, cross-domain, socio-technical \n\nGenesis and objectives of EarthCube \nIn 2011, the U.S. National Science Foundation initiated EarthCube, a joint effort of NSF\u2019s Office of Cyberinfrastructure (OCI), whose interest was in computational and data-rich science and engineering, and the Geosciences Directorate (GEO), whose interest was in understanding and forecasting the behavior of a complex and evolving Earth system. The goal of EarthCube was to create a sustainable, community-based and open cyberinfrastructure for all researchers and educators across the geosciences.\nThe NSF recognized that there was no infrastructure that could manage and provide access to all geosciences data in an open, transparent and inclusive manner, and that progress in the geosciences would be increasingly reliant on interdisciplinary activities. Therefore, a system that enabled the sharing, interoperability and re-use of data needed to be created.\nSimilar efforts to provide the infrastructure needed to support scientific research and innovation are underway in other countries, most notably in the European Union, guided by the European Strategy Forum on Research Infrastructures (ESFRI), and in Australia, under the National Collaborative Research Infrastructure Strategy (NCRIS). 
The goal of these efforts is to provide scientists, policy makers and the public with computing resources, analytic tools and educational material, all within an open, interconnected and collaborative environment.\n\nThe nature of infrastructure development \nThe building of infrastructure is as much a social endeavor as a technical one. Bowker et al.[1] emphasized that information infrastructures comprise not only the data, tools and networks that make up the technical elements, but also the people, practices, and institutions that lead to the creation, adoption and evolution of the underlying technology. The NSF realized that a cyberinfrastructure, to be successful, must have substantial involvement of the target community through all phases of its development, from inception to deployment. In fact, studies have shown that infrastructure evolves from independent and isolated efforts and that there is no clear point at which \"deployment\" is complete.[2] The fundamental challenge was the heterogeneity of the scientific disciplines and technologies that needed to work together to accomplish this goal, and the necessity of getting all stakeholders to cooperate in its development. A compounding factor is that while technology evolves rapidly, people\u2019s habits, work practices, cultural attitudes towards data sharing, and willingness to use others' data all evolve more slowly. How the relationship of people to the infrastructure evolves determines whether it succeeds or fails.\nA significant element of NSF\u2019s strategy for building EarthCube was to make it a collective effort of geoscientists and technologists from the start, in hopes of ensuring that what was developed did indeed serve the needs of geoscientists and would in fact find widespread uptake. 
A series of community events and end-user workshops spanning the geoscience disciplines was held with the dual goals of gathering requirements for EarthCube and building a community of geoscientists willing to engage with and take ownership of the EarthCube process.\nNSF began issuing small awards to explore concepts for EarthCube. These were followed by the funding of an initial set of \"building blocks\" meant to demonstrate potential components of EarthCube. The Brokering Building Block (BCube) was one of these awards. BCube sought both to solve real problems of interoperability that geoscientists face in carrying out research and to study the social aspects of technology adoption.\n\nThe challenge of cross-disciplinary interoperability \nInteroperability has many facets and can be viewed from the perspective of either systems or people. Systems are interoperable when they can exchange information without having to know the details of each other's internal workings. Likewise, people view systems or data as interoperable when they don\u2019t have to learn the intricacies of each in order to use them. When systems are interoperable, users of those systems should have uniform access and receive harmonized services and data from them. This is the vision of EarthCube. Delivering on that vision can be considered the \"grand challenge\" of information technology as applied to the geosciences.\nAchieving interoperability across the geosciences is so challenging because the many scientific fields that comprise the geosciences all have their own methods, standards and conventions for managing and sharing data. 
The sophistication of the information technologies that have been adopted in each community, the degree of standardization on data exchange formats and vocabularies, the amount of centralization in data cataloguing, and the openness to sharing data all vary greatly.\nThe methods of achieving interoperability across distributed systems can be categorized as shown in Table 1.\n\n\n\n\n\n\n\nTable 1. Methods for achieving interoperability\n\n\nMethod\n\nRequirements\n\nBenefits\n\n\nAdherence to common standards\n\nUniformity in system configuration\n\nDe facto interoperability\n\n\nGateways and translators\n\nInstallation and maintenance of custom or third-party software\n\nCan adapt to new or changing protocols and standards\n\n\nBrokers as infrastructure, third-party mediation\n\nCreation and maintenance of brokering framework with custom adapters\n\nProvides two-way translations between disparate systems and removes burdens of interoperability from data provider\n\n\n\nSince disciplines will always use different standards for encoding, accessing and describing data, the first option is not a realistic one for the geosciences. The second method is currently in wide use within the geosciences, for example by GBIF[3], which harvests metadata from multiple external systems and then maps the metadata (which are served through different protocols and use different schemas) to a common standard. Systems such as ERDDAP[4][5] act as servers that access disparate datasets and serve them through a common interface. 
What BCube explored was the possibility that a broker, mediating the interactions between many systems serving data and many systems requesting data, could be established as a shared service, i.e., as infrastructure, without being tied to any particular repository or user portal.\nEdwards et al.[6] show that technical infrastructures such as electrical grids and railroads evolve in stages, and the final stage is \"a process of consolidation characterized by gateways that allow dissimilar systems to be linked into networks.\" Brokering is such a gateway, applied in the context of information systems. While brokering technologies such as CORBA[a] have been in existence since the 1990s, their application typically requires participants in a network to install software packages that enable interfacing through a common protocol. Conformance to uniform standards is clearly a barrier in cross-disciplinary contexts since each community tends to develop its own conventions for storing, describing and accessing data.\n\nThe BCube brokering framework \nThe BCube project advanced a brokering framework by addressing the social, technical and organizational aspects of cyberinfrastructure development. It sought to identify best practices in both technical and cultural contexts by engaging scientists with the evolving cyberinfrastructure to achieve effective cross-disciplinary collaborations. The engagement included a number of different communities in guiding and testing the development, with the aim of involving geoscientists at a deep level in the entire process.\nBCube adapted a brokering framework that had been developed for the EuroGEOSS project[7] and subsequently deployed in the Global Earth Observation System of Systems (GEOSS). Called the Discovery and Access Broker, or DAB[8], it has successfully brokered millions of data records from dozens of data sources. 
Guided by the recommendations laid out in the Brokering Roadmap[9], BCube sought to demonstrate how brokering could enhance cross-disciplinary data discovery and access by having scientists from different fields create real-world science scenarios that required the use of data from diverse sources.\nThe approach that BCube promoted was one in which the broker was taught to interact with each community\u2019s conventions, allowing the participating systems to interact without adopting a common set of standards. BCube developers then set about configuring a cloud-based version of the DAB to access these sources. This required developing software components, called \"accessors,\" that interacted with each data source. At the start of the project we believed that the suite of accessors that had already been developed for GEOSS could in many cases be reused for brokering the datasets identified in BCube's science scenarios.\nThe brokering framework is depicted in Figure 1.\n\nFigure 1. The BCube Broker, based on GI-cat and related software from CNR[b], mediates two-way requests and responses between clients, depicted on the left, and data repositories, depicted on the right, for data query, access, and transform services.\n\n\n\nScience scenarios \nThe project was guided by science scenarios developed by the geoscientists on the BCube team. These scenarios were used to define requirements for the broker development while engaging the geoscience community with EarthCube. They also provided the basis for evaluating the added value of brokering.\nThe term \"science scenario\" was used in place of what is more commonly known in software development as a \"use case.\" This reflected a concern that EarthCube should be solving real, rather than hypothetical, problems.\nThe science scenarios, coming from the fields of hydrology, oceanography, polar science and climate\/weather, focused on the specific research needs of each scientist. 
For each scenario, a team composed of domain scientists and computer scientists was convened to investigate the ability of the BCube Brokering Framework to meet the identified needs of the scientists. These needs determined what new or modified mediation functions the broker needed to perform in order to fulfill the scenario.\nSeveral different types of scenarios were defined. Some scenarios described high-level science research or education goals without referencing specific data and services; enacting these scenarios involved both data discovery and access. The primary type of BCube scenario was the detailed science or education scenario, in which the scientist identified specific data sources and services that they wished to have access to. Each scenario described the end-to-end activities required to achieve a science objective. By observing how the objective was accomplished first without brokering and then with brokering, we were able to evaluate how much time and effort the broker saved. The flow for this type of scenario is depicted in Figure 2.\n\nFigure 2. Flow in development and enactment of BCube science scenarios\n\n\n\nThe third type of scenario the project defined involved configuring the broker to access the resources of a major data repository, thereby making its resources discoverable and accessible and supporting cross-disciplinary research.\nThe BCube Brokering Framework gives access to 17 different data repositories serving over five million datasets, as shown in Table 2.\n\n\n\n\n\n\n\nTable 2. 
Resources brokered by the BCube Brokering Framework, along with the access protocol and number of records for each\n\n\nRepository\/source\n\nProtocol\n\nNumber of datasets\n\n\nAVHRR SST\n\nTHREDDS\n\n62,777\n\n\nBCO DMO\n\nSPARQL\n\n10,702\n\n\nGlobal Multi-Resolution Topography (GMRT)\n\nOGC WMS\n\n13\n\n\nIRIS Event\n\nCustom\n\n4,213,828\n\n\nIRIS Station\n\nCustom\n\n544,991\n\n\nIntegrated Marine Observing systems\n\nOGC CSW\n\n601\n\n\nNASA ASTER\n\nOPeNDAP\n\n22,684\n\n\nNERRS\n\nSOAP\n\n329\n\n\nNSIDC\n\nOpenSearch\n\n161\n\n\nOne Geology\n\nOGC CSW\n\n438\n\n\nPANGAEA\n\nOAI-PMH\n\n356,943\n\n\nRTOF Models\n\nGrADS\n\n46\n\n\nRutgers ERDDAP service\n\nOPeNDAP\n\n1,200\n\n\nSRTM NASA\n\nOPeNDAP\n\n14,282\n\n\nUNAVCO GPS\n\nCustom\n\n1,739\n\n\nUNAVCO SSARA\n\nSOAP\n\n2,000\n\n\nUS NODC\n\nOGC CSW\n\n29,840\n\n\n\nMetadata brokering \nPeople and autonomous agents find resources (by which we mean data, models, computational services and the like) through the encoded information describing those resources, i.e., metadata. Metadata should also describe how resources are structured and accessed. In brokering a resource, the broker must first access and translate the available metadata and map it to a common internal data model. To serve a metadata record in response to a request, the broker maps from the internal model to the model required by the protocol of that request.\nUnlike the unstructured metadata that supports the free-text queries of general search engines, metadata that has been mapped to a common data model enables searches on specific features of the data, such as temporal coverage and spatial extent.\nThe BCube Brokering Framework was already equipped to understand many common protocols and metadata standards, such as OAI-PMH and OGC's CSW. For some of the services required by the science scenarios, however, a one-time manual mapping was required when the metadata model of a resource was not already known to the broker. 
The internal data model of the broker is based on the ISO 19115 family of metadata standards but is extensible to accommodate unique community requirements. The BCube broker also supported a basic form of semantic mediation through the augmentation of terms in keyword searches.[10]\n\nData brokering \nIf a resource used an access protocol and data format that were already known to the broker, brokering it should simply have been a matter of pointing the broker to the service endpoint. However, it is common to encounter a service endpoint that does not fully conform to its declared protocols and standards, which necessitates customizing the accessor. Customization was always required for protocols or encodings for which no accessor yet existed. The development and testing of accessors was one of the main activities, and one of the main resource drains, of the BCube project. Midway through the project, an Accessor Development Kit (ADK) was released to help the developers who had been tasked with writing accessors.\n\nSustainability \nThe large-scale physical infrastructures (water, power, communication networks) that societies depend on are seen as the responsibility of government and commerce. The internet, which began as a research infrastructure supported by communications protocols, has evolved into a vast, unstructured information resource, enabled through the standards that underlie the World Wide Web. EarthCube, which must build on existing cyberinfrastructure technologies, must itself find a means of becoming self-sustaining. While most of the individual elements developed with EarthCube funding are expected eventually to find a means of self-support, it has been the belief within the BCube project that certain foundational elements would need continued funding. In 2001, NSF foresaw the need for such foundational infrastructure and established the NSF Middleware Initiative to define, develop and support an integrated national middleware infrastructure. 
The focus then was on grid and high-performance computing, and on identity and access management tools.[11]\nTeam members from BCube initiated a working group within the Research Data Alliance to explore solutions for the governance and sustainability of middleware. In its report[12], Sustainable Business Models for Brokering Middleware to Support Research Interoperability, the Working Group concluded that the strongest model for sustainability would be one in which a federally funded data facility provided guardianship while the broker was being established, followed by a consortium model and\/or software-as-a-service model as the broker matured. This approach was anticipated by Ribes and Finholt[13], who predicted that, in the face of short-term funding, cyberinfrastructure projects will attempt to transition to facilities by forming alliances with the persistent institutions of science in their domain fields.\n\nLessons learned \nFrom the start we realized that BCube was not primarily about software development. It was about demonstrating an approach to the construction of EarthCube that would achieve the maximum buy-in from the geosciences community. We aimed to do this by making it easier for geoscientists to find, use and share data and knowledge in an interdisciplinary context without requiring the providers and consumers of that data and knowledge to do extra work. The technical aspects of this were straightforward: write code that mediates the interactions between a distributed and diverse set of clients and servers. The software that was developed, however, had to fit into the greater context of EarthCube. Also, the resources needed to create and maintain this software had to be weighed against other investments necessary for a viable cyberinfrastructure. Furthermore, many researchers felt that the investments being made in technology projects were siphoning off money that should be going to basic research. 
Until a technology makes their work substantially easier, or opens up opportunities to make new discoveries, scientists will be reluctant to support the EarthCube enterprise.\nFactors influencing attitudes towards brokering can also be viewed from the perspective of data providers who wish to fulfill the expectation or obligation of making their data available to users outside their normal clientele. They would support a brokering service only if they were confident that the service would have long-term support, would adapt to any changes over time in the provider\u2019s data and mission, and would meet the evolving demands of both its customary and its external, cross-disciplinary users.\nThese tensions were clearly called out in a report by an EarthCube Advisory Committee.[14] The Committee felt that EarthCube lacked clear definition and had yet to deliver on its promises, and it saw a need for a succinct implementation plan. Funded projects, most of which have yet to move beyond pilot demonstrations, responded that infrastructure development is a long process and that patience is required.[15] The need for patience is well articulated by Ribes and Finholt[13], who argue that infrastructure development is an occasion for the long now \u2013 the collapsing of the demands of immediate design and deployment with the work of maintenance and sustainable development.\nOne of the key lessons that we learned in BCube was that timely delivery of features is vital to keeping scientists engaged during the development phase. Because only a small number of software engineers had access to the BCube source code, and they were doing all the coding and testing of accessors, a month or longer often passed between the time a scientist working with the broker found a problem and the time it was fixed. 
Research programs are hard to maintain under such interruptions.\nThe Accessor Development Kit (ADK) was created to mitigate these problems, but the kit itself was new and needed some refinement after initial use, leading to further delays in getting necessary functionality into the broker. Furthermore, developers are hindered in writing robust code to an interface when they have incomplete knowledge of what happens on the other side of it, especially when the accessor they are building invokes operations on data that will be executed by code to which they have no access.\nA principal take-home message from the experience of brokering in the BCube project is that, in order to achieve the level of community participation needed in the development and use of software that is intended to become infrastructure, the entire code base should be accessible to developers. Also, since the data sources of most interest to geoscientists are often in an evolving, dynamic state, it is challenging to build mediators for them, which argues all the more strongly for open-source code that would give the resulting systems the responsiveness and flexibility to remain relevant to scientists\u2019 needs and interests.\n\nConclusions \nThe BCube project successfully demonstrated that it was possible to build a brokering framework that mediated the interactions between clients and servers, where clients could be individuals using a web portal, desktop application software, clearinghouses or other service consumers, and servers were data catalogues, data repositories, and data services. Mediation allowed these clients and servers each to use their own distinct protocols, semantics, and data syntaxes in managing their data yet still be part of a larger interoperable system, all without needing to install new software or change the way they carried out their operations and workflows. 
However, BCube's engagement with the science community fell far short of what was hoped for, largely because of delays in delivering functionality. In some cases this was compounded by the small amount of time that the scientists who signed on to the project had committed. It became clear that an independent interoperability solution based on middleware was viable only if communities became involved in supporting software development and maintenance.\nArguably, new data services can be considered infrastructure only after the users of the technology adapt their behaviors to these new capabilities. EarthCube has yet to deliver the capabilities that would lead to widespread changes in the way geoscientists do their work, but this, indeed, takes time.\n\nFootnotes \n\n\n\u2191 The Common Object Request Broker Architecture (CORBA), a standard defined by the Object Management Group (OMG), is designed to facilitate the communication of systems that are deployed on diverse platforms. \n\n\u2191 CNR, Consiglio Nazionale delle Ricerche (National Research Council of Italy), Institute of Atmospheric Pollution Research http:\/\/www.iia.cnr.it\/ \n\n\nAcknowledgements \nThe BCube team included dozens of people from several different institutions. The author gratefully acknowledges their contributions to the achievements of the project. The team composition can be viewed at: http:\/\/nsidc.org\/informatics\/bcube\/communities.\n\nCompeting interests \nThe author has no competing interests to declare.\n\nReferences \n\n\n\u2191 Bowker, G.C.; Baker, K.; Millerand, F.; Ribes, D. (2010). \"Toward Information Infrastructure Studies: Ways of Knowing in a Networked Environment\". In Hunsinger, J.; Klastrup, L.; Allen, M. International Handbook of Internet Research. Springer Netherlands. pp. 97\u2013117. ISBN 9781402097898.   \n\n\u2191 Star, S.L.; Ruhleder, K. (1996). 
\"Steps toward an ecology of infrastructure: Design and access for large information spaces\". Information Systems Research 7 (1): 111\u2013134. doi:10.1287\/isre.7.1.111.   \n\n\u2191 Edwards, J.L.; Lane, M.A.; Nielsen, E.S. (2000). \"Interoperability of biodiversity databases: Biodiversity information on every desktop\". Science 289 (5488): 2312\u20132314. doi:10.1126\/science.289.5488.2312. PMID 11009409.   \n\n\u2191 Simons, R.A.; Mendelssohn, R. (2012). \"ERDDAP - A Brokering Data Server for Gridded and Tabular Datasets\". American Geophysical Union, Fall Meeting 2012 2012: IN21B-1473. http:\/\/adsabs.harvard.edu\/abs\/2012AGUFMIN21B1473S .   \n\n\u2191 Delaney, C.; Alessandrini, A.; Greidanus, H. (2016). \"Using message brokering and data mediation on earth science data to enhance global maritime situational awareness\". IOP Conference Series: Earth and Environmental Science 34: 012005. doi:10.1088\/1755-1315\/34\/1\/012005.   \n\n\u2191 Edwards, P.N.; Jackson, S.J.; Bowker, G.C. et al. (2012). \"Understanding infrastructure: Dynamics, tensions, and design\". Deep Blue. http:\/\/hdl.handle.net\/2027.42\/49353 .   \n\n\u2191 Vaccari, L.; Craglia, M.; Fugazza, C. et al. (2012). \"Integrative Research: The EuroGEOSS Experience\". IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5 (6): 1603\u20131611. doi:10.1109\/JSTARS.2012.2190382.   \n\n\u2191 Nativi, S.; Craglia, M.; Pearlman, J. (2013). \"Earth Science Infrastructures Interoperability: The Brokering Approach\". IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (3): 1118\u20131129. doi:10.1109\/JSTARS.2013.2243113.   \n\n\u2191 Khalsa, S.J.; Pearlman, J.; Nativi, S. et al. (2013). \"Brokering for EarthCube Communities: A Road Map\" (PDF). National Snow and Ice Data Center. doi:10.7265\/N59C6VBC. https:\/\/www.earthcube.org\/sites\/default\/files\/doc-repository\/EarthCube%2520Brokering%2520Roadmap.pdf .   
\n\n\u2191 Santoro, M.; Mazzetti, P.; Nativi, S. et al. (2012). \"Chapter 20: Methodologies for Augmented Discovery of Geospatial Resources\". In Information Resources Management Association. Geographic Information Systems: Concepts, Methodologies, Tools, and Applications. IGI Global. pp. 305\u2013335. doi:10.4018\/978-1-4666-2038-4.ch020. ISBN 9781466620384.   \n\n\u2191 Sun, X.-H.; Blatecky, A.R. (2004). \"Middleware: The key to next generation computing\". Journal of Parallel and Distributed Computing 64 (6): 689\u2013691. doi:10.1016\/j.jpdc.2004.03.002.   \n\n\u2191 Nativi, S.; Craglia, M.; Pearlman, J. et al. (07 December 2015). \"Sustainable Business Models for Brokering Middleware to Support Research Interoperability\". Research Data Alliance. https:\/\/www.rd-alliance.org\/group\/brokering-ig-brokering-governance-wg\/outcomes\/sustainable-business-models-brokering-middleware . Retrieved 30 October 2016 .   \n\n\u2191 13.0 13.1 Ribes, D.; Finholt, T.A. (2009). \"The long now of technology infrastructure: Articulating tensions in development\". Journal of the Association for Information Systems 10 (5): 5. http:\/\/aisel.aisnet.org\/jais\/vol10\/iss5\/5 .   \n\n\u2191 \"EarthCube Advisory Committee Report\" (PDF). EarthCube. 11 March 2016. https:\/\/www.earthcube.org\/sites\/default\/files\/doc-repository\/earthcube_rsv_report_final_16_03_21.pdf . Retrieved 30 October 2016 .   \n\n\u2191 Witze, A. (2016). \"Effort to wrangle geoscience data faces uncertain future\". Nature 538 (7625): 303. doi:10.1038\/538303a. PMID 27762384.   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version \u2014 by design \u2014 lists them in order of appearance. 
Footnotes have been changed from numbers to letters as citations are currently using numbers.\n\n\n\n\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\">https:\/\/www.limswiki.org\/index.php\/Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2017)LIMSwiki journal articles (all)LIMSwiki journal articles on big dataLIMSwiki journal articles on informatics\n\t\t\t\t\t\t\t\t\t This page was last modified on 28 December 2017, at 19:16.\n\t\t\t\t\t\t\t\t\tContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","e80be5db806b508a1aecd418f32667db_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Data_and_metadata_brokering_\u2013_Theory_and_practice_from_the_BCube_Project skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Data and metadata brokering \u2013 Theory and practice from the BCube Project<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p>EarthCube is a U.S. National Science Foundation initiative that aims to create a cyberinfrastructure (CI) for all the geosciences. An initial set of \"building blocks\" was funded to develop potential components of that CI. 
The Brokering Building Block (BCube) created a brokering framework to demonstrate cross-disciplinary data access based on a set of use cases developed by scientists from the domains of hydrology, oceanography, polar science and climate\/weather. While some successes were achieved, considerable challenges were encountered. We present a synopsis of the processes and outcomes of the BCube experiment.\n<\/p><p><b>Keywords<\/b>: interoperability, brokering, middleware, EarthCube, cross-domain, socio-technical \n<\/p>\n<h2><span class=\"mw-headline\" id=\"Genesis_and_objectives_of_EarthCube\">Genesis and objectives of EarthCube<\/span><\/h2>\n<p>In 2011 the U.S. National Science Foundation initiated EarthCube, a joint effort of NSF\u2019s Office of Cyberinfrastructure (OCI), whose interest was in computational and data-rich science and engineering, and the Geosciences Directorate (GEO), whose interest was in understanding and forecasting the behavior of a complex and evolving Earth system. The goal of EarthCube was to create a sustainable, community-based and open cyberinfrastructure for all researchers and educators across the geosciences.\n<\/p><p>The NSF recognized that there was no infrastructure that could manage and provide access to all geosciences data in an open, transparent and inclusive manner, and that progress in the geosciences would increasingly rely on interdisciplinary activities. Therefore, a system that enabled the sharing, interoperability and re-use of data needed to be created.\n<\/p><p>Similar efforts to provide the infrastructure needed to support scientific research and innovation are underway in other countries, most notably in the European Union, guided by the European Strategy Forum on Research Infrastructures (ESFRI), and in Australia, under the National Collaborative Research Infrastructure Strategy (NCRIS). 
The goal of these efforts is to provide scientists, policy makers and the public with computing resources, analytic tools and educational material, all within an open, interconnected and collaborative environment.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"The_nature_of_infrastructure_development\">The nature of infrastructure development<\/span><\/h2>\n<p>The building of infrastructure is as much a social endeavor as a technical one. Bowker <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-BowkerToward10_1-0\" class=\"reference\"><a href=\"#cite_note-BowkerToward10-1\" rel=\"external_link\">[1]<\/a><\/sup> emphasized that <a href=\"https:\/\/www.limswiki.org\/index.php\/Information\" title=\"Information\" target=\"_blank\" class=\"wiki-link\" data-key=\"6300a14d9c2776dcca0999b5ed940e7d\">information<\/a> infrastructures comprise not only the data, tools and networks that make up the technical elements, but also the people, practices, and institutions that lead to the creation, adoption and evolution of the underlying technology. The NSF realized that a cyberinfrastructure, to be successful, must have substantial involvement of the target community through all phases of its development, from inception to deployment. In fact, studies have shown that infrastructure evolves from independent and isolated efforts and that there is no clear point at which \"deployment\" is complete.<sup id=\"rdp-ebb-cite_ref-StarSteps96_2-0\" class=\"reference\"><a href=\"#cite_note-StarSteps96-2\" rel=\"external_link\">[2]<\/a><\/sup> The fundamental challenge was the heterogeneity of the scientific disciplines and technologies that needed to work together to accomplish this goal, and the necessity of getting all stakeholders to cooperate in its development. A compounding factor is that while technology evolves rapidly, people\u2019s habits, work practices, cultural attitudes towards data sharing, and willingness to use others' data all evolve more slowly. 
How the relationship of people to the infrastructure evolves determines whether it succeeds or fails.\n<\/p><p>A significant element of NSF\u2019s strategy for building EarthCube was to make it a collective effort of geoscientists and technologists from the start, in hopes of ensuring that what was developed did indeed serve the needs of geoscientists and would in fact find widespread uptake. A series of community events and end-user workshops spanning the geoscience disciplines was undertaken with the dual goals of gathering requirements for EarthCube and building a community of geoscientists willing to engage with and take ownership of the EarthCube process.\n<\/p><p>NSF began issuing small awards to explore concepts for EarthCube. These were followed by the funding of an initial set of \"building blocks\" meant to demonstrate potential components of EarthCube. The Brokering Building Block (BCube) was one of these awards. BCube sought both to solve real interoperability problems that geoscientists face in carrying out research and to study the social aspects of technology adoption.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"The_challenge_of_cross-disciplinary_interoperability\">The challenge of cross-disciplinary interoperability<\/span><\/h2>\n<p>Interoperability has many facets and can be viewed from the perspective of either systems or people. Systems are interoperable when they can exchange information without having to know the details of each other's internal workings. Likewise, people view systems or data as interoperable when they don\u2019t have to learn the intricacies of each in order to use them. When systems are interoperable, their users should have uniform access to them and receive harmonized services and data from them. This is the vision of EarthCube. 
Delivering on that vision can be considered the \"grand challenge\" of information technology as applied to the geosciences.\n<\/p><p>Achieving interoperability across the geosciences is so challenging because the many scientific fields that make up the geosciences each have their own methods, standards and conventions for managing and sharing data. The sophistication of the information technologies adopted in each community, the degree of standardization of data exchange formats and vocabularies, the amount of centralization in data cataloguing, and the openness to sharing data all vary greatly.\n<\/p><p>The methods of achieving interoperability across distributed systems can be categorized as shown in Table 1.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"3\"><b>Table 1.<\/b> Methods for achieving interoperability\n<\/td><\/tr>\n<tr>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Method\n<\/th>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Requirements\n<\/th>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Benefits\n<\/th><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Adherence to common standards\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Uniformity in system configuration\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">De facto interoperability\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Gateways and translators\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Installation and maintenance of custom or third-party 
software\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Can adapt to new or changing protocols and standards\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Brokers as infrastructure, third-party mediation\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Creation and maintenance of a brokering framework with custom adapters\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Provides two-way translation between disparate systems and removes the burden of interoperability from data providers\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>Since disciplines will always use different standards for encoding, accessing and describing data, the first option is not a realistic one for the geosciences. The second method is currently in wide use within the geosciences. GBIF<sup id=\"rdp-ebb-cite_ref-EdwardsInter00_3-0\" class=\"reference\"><a href=\"#cite_note-EdwardsInter00-3\" rel=\"external_link\">[3]<\/a><\/sup>, for example, harvests metadata from multiple external systems, each serving them through its own protocol and schema, and then maps them to a common standard. Systems such as ERDDAP<sup id=\"rdp-ebb-cite_ref-SimonsERDDAP12_4-0\" class=\"reference\"><a href=\"#cite_note-SimonsERDDAP12-4\" rel=\"external_link\">[4]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-DelaneyUsing16_5-0\" class=\"reference\"><a href=\"#cite_note-DelaneyUsing16-5\" rel=\"external_link\">[5]<\/a><\/sup> act as servers that access disparate datasets and serve them through a common interface. 
BCube explored whether a broker, mediating the interactions between many systems serving data and many systems requesting data, could be established as a shared service, i.e., as infrastructure, without being tied to any particular repository or user portal.\n<\/p><p>Edwards <i>et al.<\/i><sup id=\"rdp-ebb-cite_ref-EdwardsUnder07_6-0\" class=\"reference\"><a href=\"#cite_note-EdwardsUnder07-6\" rel=\"external_link\">[6]<\/a><\/sup> show that technical infrastructures such as electrical grids and railroads evolve in stages, the final stage being \"a process of consolidation characterized by gateways that allow dissimilar systems to be linked into networks.\" Brokering is such a gateway, applied to information systems. While brokering technologies such as CORBA<sup id=\"rdp-ebb-cite_ref-7\" class=\"reference\"><a href=\"#cite_note-7\" rel=\"external_link\">[a]<\/a><\/sup> have existed since the 1990s, using them typically requires participants in a network to install software packages that enable interfacing through a common protocol. Conformance to uniform standards is clearly a barrier in cross-disciplinary contexts, since each community tends to develop its own conventions for storing, describing and accessing data.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"The_BCube_brokering_framework\">The BCube brokering framework<\/span><\/h2>\n<p>The BCube project advanced a brokering framework by addressing the social, technical and organizational aspects of cyberinfrastructure development. It sought to identify best practices in both technical and cultural contexts by engaging scientists with the evolving cyberinfrastructure to achieve effective cross-disciplinary collaborations. 
Several different communities were engaged in guiding and testing the development, with the aim of involving geoscientists deeply in the entire process.\n<\/p><p>BCube adapted a brokering framework that had been developed for the EuroGEOSS project<sup id=\"rdp-ebb-cite_ref-VaccariInteg12_8-0\" class=\"reference\"><a href=\"#cite_note-VaccariInteg12-8\" rel=\"external_link\">[7]<\/a><\/sup> and subsequently deployed in the Global Earth Observation System of Systems (GEOSS). Called the Discovery and Access Broker, or DAB<sup id=\"rdp-ebb-cite_ref-NativiEarth13_9-0\" class=\"reference\"><a href=\"#cite_note-NativiEarth13-9\" rel=\"external_link\">[8]<\/a><\/sup>, it has successfully brokered millions of data records from dozens of data sources. Guided by the recommendations laid out in the Brokering Roadmap<sup id=\"rdp-ebb-cite_ref-KhalsaBrokering13_10-0\" class=\"reference\"><a href=\"#cite_note-KhalsaBrokering13-10\" rel=\"external_link\">[9]<\/a><\/sup>, BCube sought to demonstrate how brokering could enhance cross-disciplinary data discovery and access by having scientists from different fields create real-world science scenarios that required the use of data from diverse sources.\n<\/p><p>The approach that BCube promoted was one in which the broker was taught each community\u2019s conventions, so that the participating systems could interoperate without adopting a common set of standards. BCube developers then configured a cloud-based version of the DAB to access the data sources identified in the science scenarios. This required developing software components, called \"accessors,\" that interacted with each data source. 
At the start of the project, we believed that the suite of accessors already developed for GEOSS could in many cases be reused for brokering the datasets identified in BCube's science scenarios.\n<\/p><p>The brokering framework is depicted in Figure 1.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Khalsa_DataScienceJ2017_16-1.png\" class=\"image wiki-link\" target=\"_blank\" data-key=\"8e3696811270a0195dc712badd6415ca\"><img alt=\"Fig1 Khalsa DataScienceJ2017 16-1.png\" src=\"https:\/\/www.limswiki.org\/images\/1\/1d\/Fig1_Khalsa_DataScienceJ2017_16-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> The BCube Broker, based on GI-cat and related software from CNR<sup id=\"rdp-ebb-cite_ref-11\" class=\"reference\"><a href=\"#cite_note-11\" rel=\"external_link\">[b]<\/a><\/sup>, mediates two-way requests and responses between clients, depicted on the left, and data repositories, depicted on the right, for data query, access and transform services<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Science_scenarios\">Science scenarios<\/span><\/h2>\n<p>The project was guided by science scenarios developed by the geoscientists on the BCube team. These scenarios were used to define requirements for broker development while also engaging the geoscience community with EarthCube. 
They also provided the basis for evaluating the added value of brokering.\n<\/p><p>The term \"science scenario\" was used in place of what is more commonly called a \"use case\" in software development. This reflected a concern that EarthCube should solve real, rather than hypothetical, problems.\n<\/p><p>The science scenarios, coming from the fields of hydrology, oceanography, polar science and climate\/weather, focused on the specific research needs of each scientist. For each scenario, a team of domain scientists and computer scientists was convened to investigate whether the BCube Brokering Framework could meet the scientists\u2019 identified needs. These needs determined which new or modified mediation functions the broker had to perform to fulfill the scenario.\n<\/p><p>Three types of scenarios were defined. The first type described high-level science research or education goals without referencing specific data and services; enacting these scenarios involved both data discovery and access. The second, and primary, type of BCube scenario was the detailed science or education scenario, in which the scientist identified specific data sources and services they wished to access. Each scenario described the end-to-end activities required to achieve a science objective. By observing how the objective was accomplished first without and then with brokering, we were able to evaluate how much time and effort the broker saved. 
The flow for this type of scenario is depicted in Figure 2.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Khalsa_DataScienceJ2017_16-1.png\" class=\"image wiki-link\" target=\"_blank\" data-key=\"f190a3ab8fd30699d7f8f6ec994e6700\"><img alt=\"Fig2 Khalsa DataScienceJ2017 16-1.png\" src=\"https:\/\/www.limswiki.org\/images\/6\/64\/Fig2_Khalsa_DataScienceJ2017_16-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> Flow in the development and enactment of BCube science scenarios<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>The third type of scenario involved configuring the broker to access the resources of a major data repository, making those resources discoverable and accessible and thereby supporting cross-disciplinary research.\n<\/p><p>The BCube Brokering Framework provides access to 17 data repositories serving over five million datasets, as shown in Table 2.\n<\/p>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table class=\"wikitable\" border=\"1\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\" colspan=\"3\"><b>Table 2.<\/b> Resources brokered by the BCube Brokering Framework, along with the access protocol and number of datasets for each\n<\/td><\/tr>\n<tr>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Repository\/source\n<\/th>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Protocol\n<\/th>\n<th style=\"background-color:#dddddd; padding-left:10px; padding-right:10px;\">Number of datasets\n<\/th><\/tr>\n<tr>\n<td 
style=\"background-color:white; padding-left:10px; padding-right:10px;\">AVHRR SST\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">THREDDS\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">62,777\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">BCO DMO\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">SPARQL\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">10,702\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Global Multi-Resolution Topography (GMRT)\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">OGC WMS\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">13\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">IRIS Event\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Custom\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">4,213,828\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">IRIS Station\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Custom\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">544,991\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Integrated Marine Observing System\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">OGC CSW\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">601\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">NASA ASTER\n<\/td>\n<td style=\"background-color:white; padding-left:10px; 
padding-right:10px;\">OPeNDAP\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">22,684\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">NERRS\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">SOAP\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">329\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">NSIDC\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">OpenSearch\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">161\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">One Geology\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">OGC CSW\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">438\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">PANGAEA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">OAI-PMH\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">356,943\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">RTOF Models\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">GrADS\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">46\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Rutgers ERDDAP service\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">OPeNDAP\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1,200\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">SRTM 
NASA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">OPeNDAP\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">14,282\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">UNAVCO GPS\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">Custom\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">1,739\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">UNAVCO SSARA\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">SOAP\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">2,000\n<\/td><\/tr>\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">US NODC\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">OGC CSW\n<\/td>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\">29,840\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Metadata_brokering\">Metadata brokering<\/span><\/h2>\n<p>People and autonomous agents find resources, by which we mean data, models, computational services and the like, through the encoded information describing those resources, i.e., metadata. Metadata should also describe how resources are structured and accessed. In brokering a resource, the broker must first access and translate the available metadata and map it to a common internal data model. 
To serve a metadata record in response to a request, the broker maps it from the internal model to the metadata model of the protocol used in the request.\n<\/p><p>Unlike unstructured metadata, which supports the free-text queries used by general search engines, metadata that has been mapped to a common data model enables searches on specific properties of the data, such as its temporal coverage and spatial extent.\n<\/p><p>The BCube Brokering Framework was already equipped to understand many common protocols and metadata standards, such as OAI-PMH and OGC's CSW. For some of the services required by the science scenarios, however, a one-time manual mapping was needed because the resource\u2019s metadata model was not already known to the broker. The broker\u2019s internal data model is based on the ISO 19115 family of metadata standards but is extensible to accommodate unique community requirements. The BCube broker supported a basic form of semantic mediation through the augmentation of keyword search terms.<sup id=\"rdp-ebb-cite_ref-SantoroMethod12_12-0\" class=\"reference\"><a href=\"#cite_note-SantoroMethod12-12\" rel=\"external_link\">[10]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Data_brokering\">Data brokering<\/span><\/h2>\n<p>If a resource used an access protocol and data format already known to the broker, brokering it should simply be a matter of pointing the broker to the service endpoint. However, service endpoints commonly do not fully conform to their declared protocols and standards, which necessitates customizing the accessor. Protocols or encodings for which no accessor yet existed always required custom development. The development and testing of accessors was one of the main activities, and one of the main resource drains, of the BCube project. 
Midway through the project, an Accessor Development Kit (ADK) was released to help the developers who had been tasked with writing accessors.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Sustainability\">Sustainability<\/span><\/h2>\n<p>The large-scale physical infrastructures (water, power, communication networks) that societies depend on are seen as the responsibility of government and commerce. The internet, which began as a research infrastructure supported by communications protocols, has evolved into a vast, unstructured information resource, enabled through the standards that underlie the World Wide Web. EarthCube, which must build on existing cyberinfrastructure technologies, must itself find a means of becoming self-sustaining. While most of the individual elements developed with EarthCube funding are expected to eventually become self-supporting, the BCube project team believed that certain foundational elements would need continued funding. In 2001, NSF foresaw the need for such foundational infrastructure and established the NSF Middleware Initiative to define, develop and support an integrated national middleware infrastructure. The focus then was on grid and high-performance computing and on identity and access management tools.<sup id=\"rdp-ebb-cite_ref-SunMiddle04_13-0\" class=\"reference\"><a href=\"#cite_note-SunMiddle04-13\" rel=\"external_link\">[11]<\/a><\/sup>\n<\/p><p>Team members from BCube initiated a working group within the Research Data Alliance to explore solutions for the governance and sustainability of middleware. 
In their report<sup id=\"rdp-ebb-cite_ref-NativiSust15_14-0\" class=\"reference\"><a href=\"#cite_note-NativiSust15-14\" rel=\"external_link\">[12]<\/a><\/sup>, <i>Sustainable Business Models for Brokering Middleware to Support Research Interoperability<\/i>, the Working Group concluded that the strongest model for sustainability would be one in which a federally funded data facility provided guardianship while the broker was being established, followed by a consortium and\/or <a href=\"https:\/\/www.limswiki.org\/index.php\/Software_as_a_service\" title=\"Software as a service\" target=\"_blank\" class=\"wiki-link\" data-key=\"ae8c8a7cd5ee1a264f4f0bbd4a4caedd\">software-as-a-service<\/a> model as the broker matured. This approach was anticipated by Ribes and Finholt<sup id=\"rdp-ebb-cite_ref-RibesTheLong09_15-0\" class=\"reference\"><a href=\"#cite_note-RibesTheLong09-15\" rel=\"external_link\">[13]<\/a><\/sup>, who predicted that, in the face of short-term funding, cyberinfrastructure projects would attempt to transition to facilities by forming alliances with the persistent institutions of science in their domain fields.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Lessons_learned\">Lessons learned<\/span><\/h2>\n<p>From the start, we realized that BCube was not primarily about software development. It was about demonstrating an approach to the construction of EarthCube that would achieve the maximum buy-in from the geosciences community. We aimed to do this by making it easier for geoscientists to find, use and share data and knowledge in an interdisciplinary context without requiring the providers and consumers of that data and knowledge to do extra work. The technical aspects of this were straightforward: write code that mediates the interactions between a distributed and diverse set of clients and servers. The software that was developed, however, had to fit into the greater context of EarthCube. 
Also, the resources needed to create and maintain this software had to be weighed against other investments necessary for a viable cyberinfrastructure. Furthermore, many researchers felt that the investments being made in technology projects were siphoning off money that should be going to basic research. Until a technology makes their work substantially easier, or opens up opportunities to make new discoveries, scientists will be reluctant to support the EarthCube enterprise.\n<\/p><p>Factors influencing attitudes towards brokering can also be viewed from the perspective of data providers who wish to fulfill the expectation or obligation of making their data available to users outside their normal clientele. Such providers would support a brokering service only if they were confident that it would have long-term support, could adapt to changes over time in the provider\u2019s data and mission, and could support the evolving demands of both the provider\u2019s customary users and external, cross-disciplinary users.\n<\/p><p>These tensions were clearly called out in a report by an EarthCube Advisory Committee.<sup id=\"rdp-ebb-cite_ref-ECAdvisory16_16-0\" class=\"reference\"><a href=\"#cite_note-ECAdvisory16-16\" rel=\"external_link\">[14]<\/a><\/sup> The Committee felt that EarthCube lacked a clear definition and had yet to deliver on its promises, and saw a need for a succinct implementation plan. 
Funded projects, most of which have yet to move beyond pilot demonstrations, responded that infrastructure development is a long process and that patience is required.<sup id=\"rdp-ebb-cite_ref-WitzeEffort16_17-0\" class=\"reference\"><a href=\"#cite_note-WitzeEffort16-17\" rel=\"external_link\">[15]<\/a><\/sup> The need for patience is well articulated by Ribes and Finholt<sup id=\"rdp-ebb-cite_ref-RibesTheLong09_15-1\" class=\"reference\"><a href=\"#cite_note-RibesTheLong09-15\" rel=\"external_link\">[13]<\/a><\/sup>, who argue that infrastructure development is an occasion for the \"long now\": the collapsing of the demands of immediate design and deployment with the work of maintenance and sustainable development.\n<\/p><p>One of the key lessons we learned in BCube was that timely delivery of features is vital to keeping scientists engaged during the development phase. Because only a small number of software engineers had access to the BCube source code, and they were doing all the coding and testing of accessors, a month or more often passed between the time a scientist working with the broker found a problem and the time it was fixed. Research programs are hard to maintain under such interruptions.\n<\/p><p>The Accessor Development Kit (ADK) was created to mitigate these problems, but the kit itself was new and needed refinement after its initial use, which led to further delays in getting necessary functionality into the broker. 
Furthermore, developers are hindered in writing robust code to an interface when they have incomplete knowledge of what happens on the other side of it, especially when the accessor they are building invokes operations on data that are executed by code to which they have no access.\n<\/p><p>A principal take-home message from BCube\u2019s experience with brokering is that, to achieve the desired level of community participation in the development and use of software intended to become infrastructure, the entire code base should be accessible to developers. Also, because the data sources of most interest to geoscientists are often evolving and dynamic, building mediators for them is challenging, which argues all the more strongly for open-source code that would give the resulting systems the responsiveness and flexibility to remain relevant to scientists\u2019 needs and interests.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Conclusions\">Conclusions<\/span><\/h2>\n<p>The BCube project successfully demonstrated that it was possible to build a brokering framework that mediated the interactions between clients and servers, where clients could be individuals using a web portal, desktop application software, clearinghouses or other service consumers, and servers were data catalogues, data repositories and data services. Mediation allowed these clients and servers each to use their own distinct protocols, semantics and data syntaxes in managing their data yet still be part of a larger interoperable system, all without needing to install new software or change the way they carried out their operations and workflows. However, the engagement with the science community that BCube achieved fell far short of what was hoped for. Delays in delivering functionality were largely responsible for this. 
In some cases, this was compounded by the small amount of time committed by the scientists who signed on to the project. It became clear that an independent interoperability solution based on middleware was viable only if communities became involved in supporting software development and maintenance.\n<\/p><p>New data services can be considered infrastructure only after their users adapt their behaviors to the new capabilities. EarthCube has yet to deliver the capabilities that would lead to widespread changes in the way geoscientists do their work, but such change, indeed, takes time.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Footnotes\">Footnotes<\/span><\/h2>\n<div class=\"reflist\" style=\"list-style-type: lower-alpha;\">\n<ol class=\"references\">\n<li id=\"cite_note-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-7\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">The Common Object Request Broker Architecture (CORBA), a standard defined by the Object Management Group (OMG), is designed to facilitate the communication of systems that are deployed on diverse platforms.<\/span>\n<\/li>\n<li id=\"cite_note-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-11\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">CNR, Consiglio Nazionale delle Ricerche (National Research Council of Italy), Institute of Atmospheric Pollution Research <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.iia.cnr.it\/\" target=\"_blank\">http:\/\/www.iia.cnr.it\/<\/a><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h2>\n<p>The BCube team included dozens of people from several different institutions. The author gratefully acknowledges their contributions to the achievements of the project. 
The team composition can be viewed at: <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/nsidc.org\/informatics\/bcube\/communities\" target=\"_blank\">http:\/\/nsidc.org\/informatics\/bcube\/communities<\/a>.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h2>\n<p>The author has no competing interests to declare.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-BowkerToward10-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BowkerToward10_1-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Bowker, G.C.; Baker, K.; Millerand, F.; Ribes, D. (2010). \"Toward Information Infrastructure Studies: Ways of Knowing in a Networked Environment\". In Hunsinger, J.; Klastrup, L.; Allen, M. (eds.). <i>International Handbook of Internet Research<\/i>. Springer Netherlands. pp. 97\u2013117. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781402097898.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Toward+Information+Infrastructure+Studies%3A+Ways+of+Knowing+in+a+Networked+Environment&rft.atitle=International+Handbook+of+Internet+Research&rft.aulast=Bowker%2C+G.C.%3B+Baker%2C+K.%3B+Millerand%2C+F.%3B+Ribes%2C+D.&rft.au=Bowker%2C+G.C.%3B+Baker%2C+K.%3B+Millerand%2C+F.%3B+Ribes%2C+D.&rft.date=2010&rft.pages=pp.%26nbsp%3B97%E2%80%93117&rft.pub=Springer+Netherlands&rft.isbn=9781402097898&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-StarSteps96-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-StarSteps96_2-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Star, S.L.; Ruhleder, K. (1996). \"Steps toward an ecology of infrastructure: Design and access for large information spaces\". <i>Information Systems Research<\/i> <b>7<\/b> (1): 111\u2013134. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1287%2Fisre.7.1.111\" target=\"_blank\">10.1287\/isre.7.1.111<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Steps+toward+an+ecology+of+infrastructure%3A+Design+and+access+for+large+information+spaces&rft.jtitle=Information+Systems+Research&rft.aulast=Star%2C+S.L.%3B+Ruhleder%2C+K.&rft.au=Star%2C+S.L.%3B+Ruhleder%2C+K.&rft.date=1996&rft.volume=7&rft.issue=1&rft.pages=111%E2%80%93134&rft_id=info:doi\/10.1287%2Fisre.7.1.111&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EdwardsInter00-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-EdwardsInter00_3-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Edwards, J.L.; Lane, M.A.; Nielsen, E.S. (2000). \"Interoperability of biodiversity databases: Biodiversity information on every desktop\". <i>Science<\/i> <b>289<\/b> (5488): 2312\u20132314. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1126%2Fscience.289.5488.2312\" target=\"_blank\">10.1126\/science.289.5488.2312<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/11009409\" target=\"_blank\">11009409<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Interoperability+of+biodiversity+databases%3A+Biodiversity+information+on+every+desktop&rft.jtitle=Science&rft.aulast=Edwards%2C+J.L.%3B+Lane%2C+M.A.%3B+Nielsen%2C+E.S.&rft.au=Edwards%2C+J.L.%3B+Lane%2C+M.A.%3B+Nielsen%2C+E.S.&rft.date=2000&rft.volume=289&rft.issue=5488&rft.pages=2312-2314&rft_id=info:doi\/10.1126%2Fscience.289.5488.2312&rft_id=info:pmid\/11009409&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SimonsERDDAP12-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SimonsERDDAP12_4-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Simons, R.A.; Mendelssohn, R. (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/adsabs.harvard.edu\/abs\/2012AGUFMIN21B1473S\" target=\"_blank\">\"ERDDAP - A Brokering Data Server for Gridded and Tabular Datasets\"<\/a>. <i>American Geophysical Union, Fall Meeting 2012<\/i> <b>2012<\/b>: IN21B-1473<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/adsabs.harvard.edu\/abs\/2012AGUFMIN21B1473S\" target=\"_blank\">http:\/\/adsabs.harvard.edu\/abs\/2012AGUFMIN21B1473S<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=ERDDAP+-+A+Brokering+Data+Server+for+Gridded+and+Tabular+Datasets&rft.jtitle=American+Geophysical+Union%2C+Fall+Meeting+2012&rft.aulast=Simons%2C+R.A.%3B+Mendelssohn%2C+R.&rft.au=Simons%2C+R.A.%3B+Mendelssohn%2C+R.&rft.date=2012&rft.volume=2012&rft.pages=IN21B-1473&rft_id=http%3A%2F%2Fadsabs.harvard.edu%2Fabs%2F2012AGUFMIN21B1473S&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DelaneyUsing16-5\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DelaneyUsing16_5-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Delaney, C.; Alessandrini, A.; Greidanus, H. (2016). \"Using message brokering and data mediation on earth science data to enhance global maritime situational awareness\". <i>IOP Conference Series: Earth and Environmental Science<\/i> <b>34<\/b>: 012005. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1088%2F1755-1315%2F34%2F1%2F012005\" target=\"_blank\">10.1088\/1755-1315\/34\/1\/012005<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Using+message+brokering+and+data+mediation+on+earth+science+data+to+enhance+global+maritime+situational+awareness&rft.jtitle=IOP+Conference+Series%3A+Earth+and+Environmental+Science&rft.aulast=Delaney%2C+C.%3B+Alessandrini%2C+A.%3B+Greidanus%2C+H.&rft.au=Delaney%2C+C.%3B+Alessandrini%2C+A.%3B+Greidanus%2C+H.&rft.date=2016&rft.volume=34&rft.pages=012005&rft_id=info:doi\/10.1088%2F1755-1315%2F34%2F1%2F012005&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-EdwardsUnder07-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-EdwardsUnder07_6-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Edwards, P.N.; Jackson, S.J.; Bowker, G.C. et al. (2012). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/hdl.handle.net\/2027.42\/49353\" target=\"_blank\">\"Understanding infrastructure: Dynamics, tensions, and design\"<\/a>. <i>Deep Blue<\/i><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/hdl.handle.net\/2027.42\/49353\" target=\"_blank\">http:\/\/hdl.handle.net\/2027.42\/49353<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Understanding+infrastructure%3A+Dynamics%2C+tensions%2C+and+design&rft.jtitle=Deep+Blue&rft.aulast=Edwards%2C+P.N.%3B+Jackson%2C+S.J.%3B+Bowker%2C+G.C.+et+al.&rft.au=Edwards%2C+P.N.%3B+Jackson%2C+S.J.%3B+Bowker%2C+G.C.+et+al.&rft.date=2012&rft_id=http%3A%2F%2Fhdl.handle.net%2F2027.42%2F49353&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-VaccariInteg12-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-VaccariInteg12_8-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Vaccari, L.; Craglia, M.; Fugazza, C. et al. (2012). \"Integrative Research: The EuroGEOSS Experience\". <i>IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing<\/i> <b>5<\/b> (6): 1603\u20131611. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FJSTARS.2012.2190382\" target=\"_blank\">10.1109\/JSTARS.2012.2190382<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Integrative+Research%3A+The+EuroGEOSS+Experience&rft.jtitle=IEEE+Journal+of+Selected+Topics+in+Applied+Earth+Observations+and+Remote+Sensing&rft.aulast=Vaccari%2C+L.%3B+Craglia%2C+M.%3B+Fugazza%2C+C.+et+al.&rft.au=Vaccari%2C+L.%3B+Craglia%2C+M.%3B+Fugazza%2C+C.+et+al.&rft.date=2012&rft.volume=5&rft.issue=6&rft.pages=1603%E2%80%931611&rft_id=info:doi\/10.1109%2FJSTARS.2012.2190382&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NativiEarth13-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NativiEarth13_9-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Nativi, S.; Craglia, M.; Pearlman, J. (2013). \"Earth Science Infrastructures Interoperability: The Brokering Approach\". <i>IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing<\/i> <b>6<\/b> (3): 1118\u20131129. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1109%2FJSTARS.2013.2243113\" target=\"_blank\">10.1109\/JSTARS.2013.2243113<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Earth+Science+Infrastructures+Interoperability%3A+The+Brokering+Approach&rft.jtitle=IEEE+Journal+of+Selected+Topics+in+Applied+Earth+Observations+and+Remote+Sensing&rft.aulast=Nativi%2C+S.%3B+Craglia%2C+M.%3B+Pearlman%2C+J.&rft.au=Nativi%2C+S.%3B+Craglia%2C+M.%3B+Pearlman%2C+J.&rft.date=2013&rft.volume=6&rft.issue=3&rft.pages=1118%E2%80%931129&rft_id=info:doi\/10.1109%2FJSTARS.2013.2243113&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KhalsaBrokering13-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KhalsaBrokering13_10-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Khalsa, S.J.; Pearlman, J.; Nativi, S. et al. (2013). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.earthcube.org\/sites\/default\/files\/doc-repository\/EarthCube%2520Brokering%2520Roadmap.pdf\" target=\"_blank\">\"Brokering for EarthCube Communities: A Road Map\"<\/a> (PDF). National Snow and Ice Data Center. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.7265%2FN59C6VBC\" target=\"_blank\">10.7265\/N59C6VBC<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.earthcube.org\/sites\/default\/files\/doc-repository\/EarthCube%2520Brokering%2520Roadmap.pdf\" target=\"_blank\">https:\/\/www.earthcube.org\/sites\/default\/files\/doc-repository\/EarthCube%2520Brokering%2520Roadmap.pdf<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Brokering+for+EarthCube+Communities%3A+A+Road+Map&rft.atitle=&rft.aulast=Khalsa%2C+S.J.%3B+Pearlman%2C+J.%3B+Nativi%2C+S.+et+al.&rft.au=Khalsa%2C+S.J.%3B+Pearlman%2C+J.%3B+Nativi%2C+S.+et+al.&rft.date=2013&rft.pub=National+Snow+and+Ice+Data+Center&rft_id=info:doi\/10.7265%2FN59C6VBC&rft_id=https%3A%2F%2Fwww.earthcube.org%2Fsites%2Fdefault%2Ffiles%2Fdoc-repository%2FEarthCube%252520Brokering%252520Roadmap.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SantoroMethod12-12\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SantoroMethod12_12-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation book\">Santoro, M.; Mazzetti, P.; Nativi, S. et al. (2012). \"Chapter 20: Methodologies for Augmented Discovery of Geospatial Resources\". In Information Resources Management Association. <i>Geographic Information Systems: Concepts, Methodologies, Tools, and Applications<\/i>. IGI Global. pp. 305\u2013335. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.4018%2F978-1-4666-2038-4.ch020\" target=\"_blank\">10.4018\/978-1-4666-2038-4.ch020<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/International_Standard_Book_Number\" target=\"_blank\">ISBN<\/a> 9781466620384.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Chapter+20%3A+Methodologies+for+Augmented+Discovery+of+Geospatial+Resources&rft.atitle=Geographic+Information+Systems%3A+Concepts%2C+Methodologies%2C+Tools%2C+and+Applications&rft.aulast=Santoro%2C+M.%3B+Mazzetti%2C+P.%3B+Nativi%2C+S.+et+al.&rft.au=Santoro%2C+M.%3B+Mazzetti%2C+P.%3B+Nativi%2C+S.+et+al.&rft.date=2012&rft.pages=pp.%26nbsp%3B305%E2%80%93335&rft.pub=IGI+Global&rft_id=info:doi\/10.4018%2F978-1-4666-2038-4.ch020&rft.isbn=9781466620384&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SunMiddle04-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SunMiddle04_13-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sun, X.-H.; Blatecky, A.R. (2004). \"Middleware: The key to next generation computing\". <i>Journal of Parallel and Distributed Computing<\/i> <b>64<\/b> (6): 689\u2013691. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1016%2Fj.jpdc.2004.03.002\" target=\"_blank\">10.1016\/j.jpdc.2004.03.002<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Middleware%3A+The+key+to+next+generation+computing&rft.jtitle=Journal+of+Parallel+and+Distributed+Computing&rft.aulast=Sun%2C+X.-H.%3B+Blatecky%2C+A.R.&rft.au=Sun%2C+X.-H.%3B+Blatecky%2C+A.R.&rft.date=2004&rft.volume=64&rft.issue=6&rft.pages=689%E2%80%93691&rft_id=info:doi\/10.1016%2Fj.jpdc.2004.03.002&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NativiSust15-14\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NativiSust15_14-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Nativi, S.; Craglia, M.; Pearlman, J. et al. (07 December 2015). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.rd-alliance.org\/group\/brokering-ig-brokering-governance-wg\/outcomes\/sustainable-business-models-brokering-middleware\" target=\"_blank\">\"Sustainable Business Models for Brokering Middleware to Support Research Interoperability\"<\/a>. Research Data Alliance<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.rd-alliance.org\/group\/brokering-ig-brokering-governance-wg\/outcomes\/sustainable-business-models-brokering-middleware\" target=\"_blank\">https:\/\/www.rd-alliance.org\/group\/brokering-ig-brokering-governance-wg\/outcomes\/sustainable-business-models-brokering-middleware<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 30 October 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Sustainable+Business+Models+for+Brokering+Middleware+to+Support+Research+Interoperability&rft.atitle=&rft.aulast=Nativi%2C+S.%3B+Craglia%2C+M.%3B+Pearlman%2C+J.+et+al.&rft.au=Nativi%2C+S.%3B+Craglia%2C+M.%3B+Pearlman%2C+J.+et+al.&rft.date=07+December+2015&rft.pub=Research+Data+Alliance&rft_id=https%3A%2F%2Fwww.rd-alliance.org%2Fgroup%2Fbrokering-ig-brokering-governance-wg%2Foutcomes%2Fsustainable-business-models-brokering-middleware&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RibesTheLong09-15\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-RibesTheLong09_15-0\" rel=\"external_link\">13.0<\/a><\/sup> <sup><a href=\"#cite_ref-RibesTheLong09_15-1\" rel=\"external_link\">13.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Ribes, D.; Finholt, T.A. (2009). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/aisel.aisnet.org\/jais\/vol10\/iss5\/5\" target=\"_blank\">\"The long now of technology infrastructure: Articulating tensions in development\"<\/a>. <i>Journal of the Association for Information Systems<\/i> <b>10<\/b> (5): 5<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/aisel.aisnet.org\/jais\/vol10\/iss5\/5\" target=\"_blank\">http:\/\/aisel.aisnet.org\/jais\/vol10\/iss5\/5<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+long+now+of+technology+infrastructure%3A+Articulating+tensions+in+development&rft.jtitle=Journal+of+the+Association+for+Information+Systems&rft.aulast=Ribes%2C+D.%3B+Finholt%2C+T.A.&rft.au=Ribes%2C+D.%3B+Finholt%2C+T.A.&rft.date=2009&rft.volume=10&rft.issue=5&rft.pages=5&rft_id=http%3A%2F%2Faisel.aisnet.org%2Fjais%2Fvol10%2Fiss5%2F5&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ECAdvisory16-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ECAdvisory16_16-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.earthcube.org\/sites\/default\/files\/doc-repository\/earthcube_rsv_report_final_16_03_21.pdf\" target=\"_blank\">\"EarthCube Advisory Committee Report\"<\/a> (PDF). EarthCube. 11 March 2016<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.earthcube.org\/sites\/default\/files\/doc-repository\/earthcube_rsv_report_final_16_03_21.pdf\" target=\"_blank\">https:\/\/www.earthcube.org\/sites\/default\/files\/doc-repository\/earthcube_rsv_report_final_16_03_21.pdf<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 30 October 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=EarthCube+Advisory+Committee+Report&rft.atitle=&rft.date=11+March+2016&rft.pub=EarthCube&rft_id=https%3A%2F%2Fwww.earthcube.org%2Fsites%2Fdefault%2Ffiles%2Fdoc-repository%2Fearthcube_rsv_report_final_16_03_21.pdf&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WitzeEffort16-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WitzeEffort16_17-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Witze, A. (2016). \"Effort to wrangle geoscience data faces uncertain future\". <i>Nature<\/i> <b>538<\/b> (7625): 303. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2F538303a\" target=\"_blank\">10.1038\/538303a<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/27762384\" target=\"_blank\">27762384<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Effort+to+wrangle+geoscience+data+faces+uncertain+future&rft.jtitle=Nature&rft.aulast=Witze%2C+A.&rft.au=Witze%2C+A.&rft.date=2016&rft.volume=538&rft.issue=7625&rft.pages=303&rft_id=info:doi\/10.1038%2F538303a&rft_id=info:pmid\/27762384&rfr_id=info:sid\/en.wikipedia.org:Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version \u2014 by design \u2014 lists them in order of appearance. 
Footnotes have been changed from numbers to letters as citations are currently using numbers.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project\">https:\/\/www.limswiki.org\/index.php\/Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","e80be5db806b508a1aecd418f32667db_images":["https:\/\/www.limswiki.org\/images\/1\/1d\/Fig1_Khalsa_DataScienceJ2017_16-1.png","https:\/\/www.limswiki.org\/images\/6\/64\/Fig2_Khalsa_DataScienceJ2017_16-1.png"],"e80be5db806b508a1aecd418f32667db_timestamp":1544814657,"0c7c45ef71cf479715ea32203f1e26d3_type":"article","0c7c45ef71cf479715ea32203f1e26d3_title":"A metadata-driven approach to data repository design (Harvey et al. 2017)","0c7c45ef71cf479715ea32203f1e26d3_url":"https:\/\/www.limswiki.org\/index.php\/Journal:A_metadata-driven_approach_to_data_repository_design","0c7c45ef71cf479715ea32203f1e26d3_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:A metadata-driven approach to data repository design\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\n\t\t\t\t\tFull article title: A metadata-driven approach to data repository design\nJournal: Journal of Cheminformatics\nAuthor(s): Harvey, Matthew J.; McLean, Andrew; Rzepa, Henry S.\nAuthor affiliation(s): Imperial College London\nPrimary contact: Email: rzepa at imperial dot ac dot uk\nYear published: 2017\nVolume and issue: 9\nPage(s): 4\nDOI: 10.1186\/s13321-017-0190-6\nISSN: 1758-2946\nDistribution license: Creative Commons Attribution 4.0 International\nWebsite: http:\/\/jcheminf.springeropen.com\/articles\/10.1186\/s13321-017-0190-6\nDownload: http:\/\/jcheminf.springeropen.com\/track\/pdf\/10.1186\/s13321-017-0190-6 (PDF)\n\nContents\n\n1 Abstract \n2 Background \n3 Data repository design features \n4 Engineering \n5 Metadata expression \n6 The user experience with examples of dataset collections, workflows and metadata \n7 Examples of data exposure \n\n7.1 ProView \n7.2 LiveView \n\n\n8 Declarations \n\n8.1 Authors' contributions \n\n8.1.1 Acknowledgements \n8.1.2 Competing interests \n\n\n\n\n9 Footnotes \n10 References 
\n11 Notes \n\n\n\nAbstract \nThe design and use of a metadata-driven data repository for research data management is described. Metadata is collected automatically during the submission process whenever possible and is registered with DataCite in accordance with its current metadata schema, in exchange for a persistent digital object identifier. Two examples of data preview are illustrated, including the demonstration of a method for integration with commercial software that confers rich domain-specific data analytics without introducing customization into the repository itself.\nKeywords: Data repository, metadata-driven, DataCite, data preview, Mpublish\n\nBackground \nTurnkey institutional repositories based on platforms such as DSpace[1] were introduced more than 10 years ago, with early applications directed largely towards archiving publication preprints and postprints. The growing requirement from funding agencies for research data management means that the focus is now shifting to the use of repositories as part of data management processes. More recent data-centric tools such as Figshare[2] and Zenodo[3] reflect these changes. Such services rely on minting persistent identifiers (DOIs) for deposits through the DataCite agency.[4] Metadata describing the deposited material is supplied to DataCite and a DOI is returned. An early example of such research data management is a DSpace-based project to produce and, 10 years later, curate a library of quantum-mechanically optimised molecular coordinates derived from a computable subset of the National Cancer Institute's (NCI) collection of small molecules.[5] \nOne aim of the curation phase[6] of the project was to explore the capabilities of the DataCite metadata schemas to improve the discoverability of the deposited data. 
The metadata can then be exploited to create rich search queries.[7] From the experience gained in this project, we became aware that one factor limiting the effective use of metadata was the design of the repository itself. The next stage was therefore to explore whether what we considered the essential requirements for a data repository could be incorporated into a new design. Here we report the principles used to create such a repository and some of the resulting applications in chemistry. These principles may in turn help researchers who wish to deposit data to identify the repository attributes that best support the discoverability and re-use of their data.\n\nData repository design features \nHere we describe the requirements we identified for a metadata-driven repository, an instance of which is deployed by the Imperial College HPC Service at https:\/\/data.hpc.imperial.ac.uk:\n\n In our design, we have focused on enhancing the FAIR[8] attributes of the data. F (findable) means, in practice, making the metadata descriptors as rich and complete as possible. A (accessible) is achieved by assigning persistent identifiers to the datasets and associating them with appropriate metadata, so that retrieval can be automated where appropriate. This in turn helps ensure that the data can be accessed in a standard manner, supporting I (interoperability) across various software environments. R (re-usable) depends on understanding and trusting the provenance of the data and the license terms under which it can be processed.\n The provenance of the deposited data is established from the unique ORCiD identifier of the depositor(s). The first time the repository is used, after initial institutional authentication, the user is redirected to the ORCiD site. 
There the depositor creates an account or signs in to an existing one, and then authorises the repository's request. The retrieved ORCiD is added to the metadata manifest for the deposition as a depositor attribute. The initial depositor can then add further ORCiDs for co-authors of the entry; these are again validated automatically against the ORCiD site. This information is then collected and sent to DataCite for aggregation (Fig. 1e).\n\n Figure 1. Metadata registered with DataCite for doi:10.14469\/hpc\/1280, with individual items (a)\u2013(f) discussed in the section on metadata expression\n\n The structure of the repository is based on hierarchical collections. Although collections were a feature of early repositories such as DSpace, relatively little use has been made of them. We first identified the need for such structures in our early project[5], which involved depositing more than 168,000 items individually. This was deemed necessary because we considered that each item would benefit from having its own unique metadata descriptors, but within the context of a complete collection described by separate metadata. This is illustrated by assigning metadata both to individual entries[9] and to the collection of which those entries are members.[10] Such hierarchical structures allow a research group to assign collections to project themes and, within these, to identify sub-collections associated with individual researchers or teams. The sub-collections can be further structured into types of data, other research objects such as software, presentations on the topic, and other media such as video. The appropriate granularity is likely to depend strongly on the discipline associated with the data. In the molecular sciences, the basic object naturally maps to the molecule, since this is the smallest object for which a dataset can normally be generated and which can usefully be described by its own metadata. 
It would be less useful or convenient, for example, to break the molecule down into individual atoms as metadata carriers.\n Basing the repository design on collections also reflects the way much modern science is conducted, often through multi-disciplinary collaborations in which each group generates its own data collections. Collections also greatly facilitate data citation in journal articles. For example, an article can cite just the persistent identifier (DOI) of the highest collection level of its associated datasets, avoiding citation blight. If a particular object (a molecule in our case) is discussed in the text of the article, it may nevertheless be more appropriate to cite its specific DOI at that point. Individual citation is also useful in, for example, tables of results or figures. The metadata for any individually cited dataset also contains the attribute \"is member of,\" so that the hierarchy can be tracked both upwards and, via the attribute \"has members,\" downwards (Fig. 1d). Through such metadata, this hierarchy also introduces further semantics into the citation process itself, placing each item in its appropriate context. The lack of such semantics\/context is arguably one of the most deficient aspects of current citation practices in journal articles.\n Our approach to metadata collection is to automate the process whenever possible. When the object is a molecule, there are algorithms that can generate appropriate metadata, the most useful and prominent of which is the InChI (International Chemical Identifier).[11] Such an identifier can be generated using the OpenBabel program library[12] or via JavaScript-based resources.[13] These accept a variety of chemical documents as input and generate an InChI identifier and InChI key that uniquely describe them. 
The repository workflow automatically passes every uploaded data file through this algorithm and records all successful outputs. This metadata is then associated with the Subject element of the DataCite schema (Fig. 1b).\n Other metadata describing a collection or the items within it can link to other data repositories via the appropriate persistent identifier (DOI), and to associated journal publications where relevant, again via their DOIs. These links can be made bidirectional by also including a citation to the data at the remote site. Adding such bidirectional links is currently less automated, but one might envisage future automation using the ORCID identifier, with the ORCID resources acting as a possible aggregator.\n When a collection or an individual dataset is deposited, the item is immediately issued a reserved DataCite DOI so that the authors can quote it in any articles being prepared. Its status is set to embargoed, with an associated access code that allows collaborators to view the item and that can, if necessary, be forwarded to a journal editor to arrange access for referees. The embargo can be released at a time agreed by the authors, either before submission of any resulting article or when that article is openly published. Releasing the embargo on a collection does not recursively release its members.\n The repository incorporates an ORE resource map[14], with appropriate metadata descriptors recording the location of this resource map in the repository. This in turn allows DataCite to be queried with just the assigned DOI to retrieve the ORE map (Fig. 1d), and facilitates automated retrieval of any individual file in a dataset based only on its DOI and, if necessary, its media type. 
We have described applications of this procedure, termed DOI2Data.[15] Such procedures remove any need to navigate from the landing page associated with the DOI in order to find and recover data, and open up possibilities for large-scale automated data mining based only on, for example, top-level collection DOIs. We have also implemented the metadata required to support the procedure DataCite calls content negotiation[15][16] (Fig. 1f). An example of data retrieval using such negotiation is http:\/\/data.datacite.org\/chemical\/x-mnpub\/10.14469\/hpc\/1280. This queries whether the item assigned the DOI 10.14469\/hpc\/1280 has any content of the specified media type chemical\/x-mnpub and, if so, retrieves the first instance of such data. If the dataset contains multiple such instances, the ORE[14] (or METS)[15] method must be used to select among them.\n An emerging feature of data repositories is data preview, which can serve as a navigational metaphor. When repositories were largely focused on storing journal articles, preview of the most common document type, the PDF, was the most important requirement. Most data, however, is not (and certainly should not be) contained in such documents. Data preview will clearly depend largely on the discipline associated with the data, and such procedures will be difficult to generalize. We describe two specific implementations of preview below, but it is important to recognize the need for such rich preview in the initial design of a repository.\n The repository is designed to be operable through a command line and a programmatic web API. This allows scripted integration of the deposition process into other workflows, such as electronic laboratory notebooks (ELNs).[17]\n The repository is integrated with the widely used source code management website GitHub and can automatically allocate DOIs to software releases made through that platform. 
Once the initial configuration has been made, this extends the benefit of DOI citability to software projects without requiring additional effort from the developer.\n The repository is registered with re3data, the Registry of Research Data Repositories.[18] This involves populating a schema template provided by re3data with the appropriate attributes, which is then processed to create a repository record. As a result, the metadata describing the repository itself is assigned a DOI.[19] The repository schema is available as an XML file[20], with further data and metadata deposited for inspection.[21]\nEngineering \nThe repository is intended for use by affiliates of the hosting institution. Deposition first requires authentication against an institutional authentication and authorization (A&A) LDAP service. As a matter of policy, the repository also requires the depositing user to provide their ORCID identifier, obtained via an OAuth transaction[22] with the ORCID web service.\nThe repository is accessed via interfaces designed to function both as a human-friendly UI (accessed via a web browser) and as a programmable API. The latter is essential for integrating deposition into higher-level tools and workflows, and it exposes all the capabilities of the repository. To deposit, command-line tools or other programs using the API must also authenticate; the repository can provide such tools with delegated access to a user\u2019s account through a transaction similar to OAuth. This clearly separates automated actions performed by a third-party tool on a user's behalf from actions performed by the user, and also allows the third party's access to be selectively revoked. Current integrations include a computational science portal that manages the execution of quantum chemistry calculations on Imperial College HPC resources. 
This portal can publish results directly into the repository, automatically passing on the dataset's data files and descriptive metadata.\nData files stored in the repository are kept on a local filesystem on the server hosting the repository. As data volumes grow to multi-terabyte levels, we expect to migrate this data to remote filesystems. The internal database representation of a dataset deposition allows the files to reside on an independent web server, in which case the repository resolves any requests for them with an HTTP redirect. This would facilitate any future extension of the repository to use a third-party storage solution (e.g., the Amazon Web Services S3 object store) or a content distribution network.\nThe repository automatically generates and publishes metadata records conforming to the DataCite Metadata Schema 4.0.[23] The metadata records are automatically updated whenever a user updates an entry, for example by adding a subsequently obtained DOI for a related journal article. At present there is a latency of approximately two days before the DataCite search engine index incorporates any updates.\nFor the GitHub integration, the repository end-user first associates the repository with GitHub, again using an OAuth transaction.[22] Thereafter, the repository maintains a list of the user\u2019s GitHub projects, both public and private, for which DOI creation may be selectively enabled. Once DOI creation is enabled for a project, a GitHub \u201cwebhook\u201d[24] is created, which automatically makes an HTTP request to the repository whenever a software release is created. This request contains sufficient metadata about the release for the repository to create a DOI and automatically populate its metadata. The DOI is recorded in the repository and also added to the release description held on GitHub.\nThe repository is implemented in PHP hosted within an Apache web server and depends on a Postgres database. 
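As an illustration of the GitHub release integration described above, the step that turns a release notification into DOI metadata might look roughly like the following Python sketch. It is not the repository's PHP code. It assumes the `action`, `release` and `repository` fields of GitHub's release webhook payload, reacts only to the `published` action, and produces keys loosely modelled on DataCite properties (titles, creators, publicationYear, version, relatedIdentifiers). The publisher string and the chosen relation type are hypothetical, and DOI minting and webhook signature verification are left out entirely.

```python
import json
from typing import Optional

# Hypothetical publisher name; a real deployment would read this from configuration.
DEFAULT_PUBLISHER = "Example Institution Data Repository"


def release_to_metadata(payload: dict, publisher: str = DEFAULT_PUBLISHER) -> Optional[dict]:
    """Map a GitHub 'release' webhook payload to DataCite-style metadata.

    Returns None for actions that should not trigger DOI creation;
    this sketch only reacts to newly published releases.
    """
    if payload.get("action") != "published":
        return None
    release = payload["release"]
    repo = payload["repository"]
    tag = release["tag_name"]
    # Fall back to "<owner>/<repo> <tag>" when the release has no explicit name.
    title = release.get("name") or f"{repo['full_name']} {tag}"
    published_at = release.get("published_at") or ""
    return {
        "titles": [{"title": title}],
        "creators": [{"name": release["author"]["login"]}],
        "publisher": publisher,
        "publicationYear": published_at[:4],  # ISO 8601 timestamp -> year
        "types": {"resourceTypeGeneral": "Software"},
        "version": tag,
        # Link back to the release page; the relation type is an illustrative choice.
        "relatedIdentifiers": [{
            "relatedIdentifier": release["html_url"],
            "relatedIdentifierType": "URL",
            "relationType": "IsSupplementTo",
        }],
    }


if __name__ == "__main__":
    # Abbreviated payload containing only the fields used above.
    body = """{
      "action": "published",
      "release": {
        "tag_name": "v1.2.0",
        "name": "",
        "html_url": "https://github.com/example/tool/releases/tag/v1.2.0",
        "published_at": "2016-09-07T12:00:00Z",
        "author": {"login": "octocat"}
      },
      "repository": {"full_name": "example/tool"}
    }"""
    print(json.dumps(release_to_metadata(json.loads(body)), indent=2))
```

Filtering on a single action keeps edits and deletions of a release from producing new metadata records; the text does not say which release events the actual service responds to.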
The source code is available on GitHub.[25]\n\nMetadata expression \nThe metadata for a typical deposition conforms to the DataCite metadata schema. The metadata is shown visually in partial form in Fig. 1 and is also available in a semantically more complete form.[a] In addition, the media type of each file that is part of a deposited dataset (or is created by the deposition processes) is registered. Examples of these formats are shown in Fig. 1f. Specific metadata components are discussed briefly here.\n\n \"Resource type\" identifies whether the item is a dataset or a collection (Fig. 1a).\n \"Subjects\" is available for domain-specific information, in this example the unique InChI identifiers and strings[11] derived automatically by parsing the documents in the deposition. The subjectScheme and schemeURI attributes are used to reserve these elements for the subject domain and to disambiguate them from similarly named subjects in other domains. Example:\n\n \"Related identifiers\" specifies the location of the machine-parsable ORE metadata files; the ORE resource map is used by the LiveView preview described below. The HasPart and IsPartOf identifiers are used to define the collection hierarchies.\n \"Contributors\" includes researchers identified by their ORCID metadata, which in turn allows aggregation by the ORCID organisation.\n \"Formats\" lists the domain-specific media types present in the fileset. These entries allow rich searches to be performed, using syntax such as http:\/\/search.datacite.org\/ui?q=format:chemical\/x-*, which retrieves all deposited instances of documents assigned a chemical\/x-* media type (such as chemical\/x-cml) in all repositories that register their metadata with DataCite.\nThe user experience with examples of dataset collections, workflows and metadata \nThe workflow (Fig. 
2) is best illustrated using a recent example[26] associated with a published article.[27] Two basic types of data are associated with this publication: (a) raw and processed instrumental data relating to NMR spectra and (b) computational data derived from, for example, quantum chemical simulations. Each is handled through a different user interface: the former uses the dataset deposition web page of the data repository itself[19], while the latter is injected into the repository through the command-line interface as part of the workflow of a separate ELN, by selecting the publish button associated with an individual computational simulation.[17]\n\n Figure 2. Deposition workflow, illustrating user activity and repository actions\n\n Although the ordering of the actions described below is not imposed, it evolved through experimentation as an efficient procedure, and we suggest it as a reasonable starting point for less experienced users. The first step is to create an overall project collection using the add collection option in the repository itself; for this project, the collection has the DOI:10.14469\/hpc\/1116. The collection automatically inherits the ORCID identifier of its creator, and at this stage the ORCIDs of all the other collaborators can be added as co-authors using the collection edit option. Other metadata, such as the title and description, are also added at this stage; in this instance we used the abstract of the associated journal article as the description. The final addition of metadata to the master collection is the associated DOIs, the most important of which is that of the article associated with the data (DOI:10.1021\/acs.joc.6b02008). Also added are DOIs from earlier depositions in other repositories (e.g., DOI:10.14469\/ch\/191973), which pre-dated the installation of the repository described in this article. 
A summary of the master collection metadata accrued through these processes can be found at https:\/\/data.datacite.org\/10.14469\/hpc\/1116.\n One or more sub-collections are then created to hold, for example, the instrumental NMR data (DOI:10.14469\/hpc\/1267) and the computational data (DOI:10.14469\/hpc\/1919). These sub-collection pages are edited to make them members of the master collection; reciprocally, the master collection lists each sub-collection as one of its members. These parent\u2013child relationships are formally defined in the metadata sent to DataCite. The co-authors of a sub-collection need not include all the authors of the master collection; this decision is very much up to the research group, and in principle each author could, if desired, be identified with their various contributions to the overall project.\n With the basic collection hierarchy now defined, individual datasets can be deposited as and when they emerge from an experiment. We suggest incorporating this action into daily laboratory procedures rather than leaving it to the end of a project. For example, when an instrument's data becomes available, the repository's deposit data button is used. This requires a title and description as metadata, followed by selection of the data files and, finally, specification of the collection of which the dataset is a member (in this instance the NMR sub-collection 10.14469\/hpc\/1267). Some of the uploaded files can themselves help create descriptive metadata about the data. In this instance, every set of molecule-specific NMR data, whether in raw spectrometer format (Bruker files as a ZIP archive) or in the MestreNova (.mnova) format of the analysis software being used[28], is accompanied by a separate molecular connection table for that molecule, in the form of either a Molfile (.mol) or a ChemDraw file (.cdx or .cdxml). 
If the repository workflow scripts detect such a file, the file is passed to OpenBabel[12] to generate an InChI string and InChI key, which serve as molecular metadata (Fig. 1b). We regard this exposure of metadata as better in principle than the often-used alternative of including images of the molecular connectivity, which expose no metadata. Metadata generation from other types of content could be added to our workflows in the same way. An example of such a deposition has the DOI:10.14469\/hpc\/1291, whose metadata can again be viewed by prepending the resolver https:\/\/data.datacite.org.\n Computational data is deposited by a different mechanism, using the computational ELN we have described previously.[17] This system controls the computational workflow, ending with the option to publish to pre-selected data repositories, one of which is the repository discussed here. Each entry in this ELN is assigned its own project page. When published, this project is mapped to a collection of the same name in the data repository, which is initially created in a private, embargoed state requiring an access code to view or edit. We use such inherited collections as holding areas in the data repository, since not all entries may turn out to be suitable for inclusion in the final, publication-ready collection. Entries in this holding collection can subsequently be edited to become members of the master collection or sub-collections at the appropriate point, e.g., prior to submission of a manuscript to a journal. An example of such a computational deposition is DOI:10.14469\/hpc\/1312. 
In this case it was re-assigned as a member of the master collection 10.14469\/hpc\/1116 rather than of the holding collection inherited from the ELN.\n The final type of dataset was added as a member of 10.14469\/hpc\/1116 and is described in more detail below as LiveView.\nExamples of data exposure \nProView \nData, especially data originating from instruments or from algorithms encoded in software, can be highly complex. It may be distributed across multiple data files (around 70 for the datasets described below), some of which may even be binary-encoded with internal structures that are poorly documented or hidden for proprietary reasons. Here we describe one example of the processing and re-use of such datasets by non-specialists, for whom reliable or rich open-source software may not be available and for whom permanent licensed access to commercial software may not be practical or cost-effective. The datasets in this example originate from commercial NMR spectrometers and require specialist software to convert the data (in the so-called time domain) into visual representations of the data in the frequency domain (\u201cNMR spectra\u201d). The raw instrumental output takes the form of a number of separate time-domain data files, many of which lack even filename extensions. Without the appropriate software, such datasets are essentially inaccessible.\nMestreNova[28] is commercial software that provides access to such NMR datasets and requires a license entitlement to activate its full feature set beyond an initial trial period. However, an unlicensed copy of MestreNova can have its full functionality enabled for an individual dataset, provided that the dataset has been cryptographically signed. These signatures can only be produced by an agency in possession of a MestreNova Publisher license and the accompanying signing keys. 
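The time-domain to frequency-domain conversion mentioned earlier in this section is, at its core, a Fourier transform of the recorded free induction decay (FID). The following Python sketch illustrates only that core step, under stated simplifications: it builds a synthetic single-resonance FID (the parameters `n_points`, `dwell`, `freq_hz` and `t2` are hypothetical), applies a naive discrete Fourier transform, and reports the frequency of the strongest peak. It is not how MestreNova or any spectrometer vendor processes data; real processing also involves parsing vendor file formats, apodization, zero-filling, and phase and baseline correction.

```python
import cmath
import math
from typing import List


def synthetic_fid(n_points: int, dwell: float, freq_hz: float, t2: float) -> List[complex]:
    """Complex FID of a single resonance: a rotating signal decaying with time constant t2 (s)."""
    return [
        cmath.exp(2j * math.pi * freq_hz * n * dwell) * math.exp(-n * dwell / t2)
        for n in range(n_points)
    ]


def dft(signal: List[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform; an FFT library would be used in practice."""
    n_points = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_points) for n, x in enumerate(signal))
        for k in range(n_points)
    ]


def peak_frequency(fid: List[complex], dwell: float) -> float:
    """Frequency (Hz) of the largest-magnitude bin in the spectrum of an FID."""
    spectrum = dft(fid)
    n_points = len(fid)
    k_max = max(range(n_points), key=lambda k: abs(spectrum[k]))
    # Bins in the upper half of the DFT correspond to negative frequencies.
    if k_max >= n_points // 2:
        k_max -= n_points
    return k_max / (n_points * dwell)


if __name__ == "__main__":
    n_points, dwell = 128, 1e-3          # 128 complex samples, 1 ms apart
    f0 = 10 / (n_points * dwell)         # resonance placed exactly on bin 10 (about 78.1 Hz)
    fid = synthetic_fid(n_points, dwell, f0, t2=0.05)
    print(f"expected {f0:.3f} Hz, found {peak_frequency(fid, dwell):.3f} Hz")
```

Placing the resonance exactly on a DFT bin keeps the result unambiguous; an off-bin frequency would only be located to within the bin spacing, 1/(n_points * dwell), which is one reason zero-filling is used in practice.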
We have integrated such MestreNova publication into our repository's deposition process, seamlessly giving any deposited NMR dataset the ability to be processed by the MestreNova software. When an NMR dataset, in the form of a compressed ZIP archive or a MestreNova wrapping of such data, is deposited into the repository, it is automatically signed, producing a MestreNova-specific \u201cmnpub\u201d-format file that is added to the deposition fileset. This plain-text file contains the URL of the copy of the original MNova\/ZIP file in the repository, along with the cryptographic signature (Fig. 3). When the mnpub file is loaded into an unlicensed copy of MestreNova, the associated resource is loaded from the embedded URL and, provided the cryptographic signature validates, the full features of the software are enabled.\n\n Figure 3. An example auto-generated mnpub file, with components containing the URL of the signed resource, the signature, and the identity of the signing entity, namely the cryptographic key associated with the MestreNova publisher license granted to the repository\n\nWe believe this feature provides a powerful incentive for using the repository. By enabling custom software to be used on submitted datasets, the repository becomes more than a passive silo for data, actively enabling depositors and viewers to interact with datasets in a rich, domain-specific way. Furthermore, this is accomplished without building format-specific enhancements into the repository itself.\n\nLiveView \nThe most generic solution to data preview is the deployment of HTML in conjunction with appropriate visualization routines. 
One example can be illustrated with the following components.\n\n The HTML document is assigned the reserved name index.html.\n When a document with this name is deposited, it is automatically transcluded into the landing page of the deposition using an HTML iframe: <iframe name=\"liveview\" src=\"\/resolve\/?doi=1248&file=13&access=\" width=\"100%\" height=\"600\"><\/iframe>, where the string doi=1248&file=13&access= references the appropriate database entry for the object assigned the DOI:10.14469\/hpc\/1248.\n The preview functionality is then enabled by the author including, in the index.html document, JavaScript utility functions hosted on the repository: <script src=\"https:\/\/data.hpc.imperial.ac.uk\/js\/utilities.js\"><\/script>. These invoke the open-source molecular visualiser JSmol[29], which, as the name implies, is based purely on JavaScript.\n A further script is loaded at this stage, along with a formatting stylesheet: <script> insertFile(\"resolve-doi.js\"); insertFile(\"table.css\"); <\/script>. The resolve-doi.js script invokes procedures that accept a dataset DOI as input. Querying the metadata associated with that DOI using the form http:\/\/data.datacite.org\/10.14469\/ch\/192018 identifies the path to the ORE or METS resource manifests (https:\/\/spectradspace.lib.imperial.ac.uk:8443\/metadata\/handle\/10042\/196268\/ore.xml in this instance); parsing this manifest then allows the direct path to the data to be extracted and passed to the JSmol visualization script.\n An author-created entry of the following type in the index.html document then combines these various actions, resulting in a live view of the retrieved dataset in the browser window (Fig. 4): <a href=\"javascript:handle_jmol('10.14469\/ch\/192018',';display script;')\">anchored text<\/a>\n\n Figure 4. 
LiveView[30] of a dataset collection expressed using HTML and an integrated visualization package, with data obtained by script-driven, DOI-based retrieval\n\n The document index.html is itself available for download from the repository and can therefore serve as a useful template for other authors.\nThe metadata-driven approach described here is directed towards enabling the FAIR[8] attributes of the data. Registering rich metadata associated with a primary research object (in the molecular sciences, the molecule) in turn allows rich searches (the F of FAIR) to be constructed using the generic resources available at the metadata aggregator, DataCite. This approach contrasts interestingly with that adopted by the publisher Elsevier[31] for its recently introduced DataSearch site, which appears, at least initially, to be based on the content of its journal platform, ScienceDirect. The metadata there is simply the likely (but not guaranteed) presence of data within containers such as images or tables. In this approach, a user-driven data search culminates in the user being directed to the data source, which is in fact the article itself as an object. It is then very much up to the user to identify the data of interest, whether in the text or images of the article itself or in any associated supporting information. Such an approach is clearly not based on metadata describing data objects such as molecular entities, and it leaves the burden of identifying and extracting any such information on the user. It would be fair to suggest that such a process does not fully adhere to the principles of FAIR data.\nA clear emerging trend is that journal publication is starting to be associated with procedures for identifying the associated data as a primary research object in its own right. The extent to which such data will be rendered fully open, in the sense of complying with all of the FAIR principles, remains uncertain. 
It seems likely that journal publishers, who will retain full control over the complete workflows involving data, will not necessarily wish to expose the data in an openly FAIR manner or at the granularity most useful to researchers. Here we have outlined an alternative, metadata-driven mechanism for achieving fine-grained FAIR data exposure in association with journal publication, which can be used by authors themselves as the creators of research data and which lets authors retain control over the type of metadata captured. In this alternative model, open FAIR data is published at the research-institution level and the associated metadata is aggregated at the global level by agencies such as DataCite, without any need for intervention by journal-publisher workflows.\n\nDeclarations \nAuthors' contributions \nAll authors contributed to this work. All authors read and approved the final manuscript.\n\nAcknowledgements \nWe are grateful to MestreLab Research for providing instructions for signing their datasets and for countersigning the public key for the repository described here.\n\nCompeting interests \nThe authors declare that they have no competing interests.\n\nFootnotes \n\n\n\u2191 Available for download at data.datacite.org\/application\/x-datacite+xml\/10.14469\/hpc\/1280 \n\n\nReferences \n\n\n\u2191 \"DSpace\". DuraSpace Organization. http:\/\/www.dspace.org\/ . Retrieved 07 September 2016 .   \n\n\u2191 \"Figshare\". Figshare LLP. https:\/\/figshare.com\/ . Retrieved 07 September 2016 .   \n\n\u2191 \"Zenodo\". CERN Data Centre. https:\/\/zenodo.org\/ . Retrieved 07 September 2016 .   \n\n\u2191 \"DataCite\". DataCite Association. https:\/\/www.datacite.org\/ . Retrieved 07 September 2016 .   \n\n\u2191 5.0 5.1 Downing, J.; Murray-Rust, P.; Tonge, A.P. et al. (2008). \"SPECTRa: The deposition and validation of primary chemistry research data in digital repositories\". Journal of Chemical Information and Modeling 48 (8): 1571\u20131581. 
doi:10.1021\/ci7004737.   \n\n\u2191 Harvey, M.J.; Mason, N.J.; McLean, A. et al. (2015). \"Standards-based curation of a decade-old digital repository dataset of molecular information\". Journal of Cheminformatics 7: 43. doi:10.1186\/s13321-015-0093-3. PMC4550659. PMID 26322133. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4550659 .   \n\n\u2191 Rzepa, H.S.; McLean, A.; Harvey, M.J. (2015). \"InChI as a research data management tool\". Chemistry International 38 (3\u20134): 24\u201326. doi:10.1515\/ci-2016-3-408.   \n\n\u2191 8.0 8.1 Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). \"The FAIR Guiding Principles for scientific data management and stewardship\". Scientific Data 3: 160018. doi:10.1038\/sdata.2016.18. PMC4792175. PMID 26978244. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4792175 .   \n\n\u2191 \"doi:10.14469\/ch\/153690\". DataCite Content Service Beta. DataCite Association. https:\/\/data.datacite.org\/10.14469\/ch\/153690 . Retrieved 07 September 2016 .   \n\n\u2191 \"doi:10.14469\/ch\/2\". DataCite Content Service Beta. DataCite Association. https:\/\/data.datacite.org\/10.14469\/ch\/2 . Retrieved 07 September 2016 .   \n\n\u2191 11.0 11.1 \"InChI Trust\". InChI Trust. http:\/\/www.inchi-trust.org\/ . Retrieved 07 September 2016 .   \n\n\u2191 12.0 12.1 O'Boyle, N.M.; Banck, M.; James, C.A. et al. (2011). \"Open Babel: An open chemical toolbox\". Journal of Cheminformatics 3: 33. doi:10.1186\/1758-2946-3-33. PMC3198950. PMID 21982300. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3198950 .   \n\n\u2191 \"InChI for the Web Browser with InChI.js\". Metamolecular, LLC. https:\/\/metamolecular.com\/inchi-js\/ . Retrieved 07 September 2016 .   \n\n\u2191 14.0 14.1 \"ORE Specification - Abstract Data Model\". Open Archives Initiative. http:\/\/www.openarchives.org\/ore\/1.0\/datamodel . Retrieved 07 September 2016 .   
\n\n\u2191 15.0 15.1 15.2 Harvey, M.J.; Mason, N.J.; McLean, A.; Rzepa, H.S. (2015). \"Standards-based metadata procedures for retrieving data for display or mining utilizing persistent (data-DOI) identifiers\". Journal of Cheminformatics 7: 37. doi:10.1186\/s13321-015-0081-7. PMC4528360. PMID 26257829. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4528360 .   \n\n\u2191 \"DOI Content Negotiation\". DOI Citation Formatter. http:\/\/citation.crosscite.org\/docs.html . Retrieved 07 September 2016 .   \n\n\u2191 17.0 17.1 17.2 Harvey, M.J.; Mason, N.J.; Rzepa, H.S. (2014). \"Digital data repositories in chemistry and their integration with journals and electronic laboratory notebooks\". Journal of Chemical Information and Modeling 54 (10): 2627\u20132635. doi:10.1021\/ci500302p.   \n\n\u2191 \"About\". Registry of Research Data Repositories. Karlsruhe Institute of Technology. http:\/\/www.re3data.org\/about . Retrieved 07 September 2016 .   \n\n\u2191 19.0 19.1 \"Imperial College High Performance Computing Service Data Repository\". Registry of Research Data Repositories. Karlsruhe Institute of Technology. doi:10.17616\/R3K64N. http:\/\/www.re3data.org\/repository\/r3d100011965 . Retrieved 07 September 2016 .   \n\n\u2191 \"XML registration with re3data\". Imperial College High Performance Computing Service Data Repository. Imperial College London. 07 September 2016. doi:10.14469\/hpc\/1369. https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1369 . Retrieved 07 September 2016 .   \n\n\u2191 Rzepa, H.; McLean, A.; Harvey, M.J. (25 July 2016). \"Data Repository Project\". Imperial College High Performance Computing Service Data Repository. Imperial College London. doi:10.14469\/hpc\/1088. https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1088 . Retrieved 07 September 2016 .   \n\n\u2191 22.0 22.1 Richer, J. \"User Authentication with OAuth 2.0\". OAuth.net. https:\/\/oauth.net\/articles\/authentication\/ . Retrieved 07 September 2016 . 
  \n\n\u2191 \"DataCite Metadata Schema 4.0\". DataCite Metadata Working Group. 19 September 2016. http:\/\/schema.datacite.org\/meta\/kernel-4.0\/ . Retrieved 16 January 2017 .   \n\n\u2191 \"Webhooks\". API. GitHub, Inc. https:\/\/developer.github.com\/webhooks\/ . Retrieved 07 September 2016 .   \n\n\u2191 Harvey, M.J. \"ICHPC\/hpc-repo\". GitHub, Inc. doi:10.14469\/hpc\/1487. https:\/\/github.com\/ICHPC\/hpc-repo . Retrieved 07 September 2016 .   \n\n\u2191 Rzepa, H.; White, A.; Braddock, D.C. et al. (26 July 2016). \"Epimeric Face-Selective Oxidations and Diastereodivergent Transannular Oxonium Ion Formation-Fragmentations: Computational Modelling and Total Syntheses of 12-Epoxyobtusallene IV, 12-Epoxyobtusallene II, Obtusallene X, Marilzabicycloallene C and Marilzabicycloallene D\". Imperial College High Performance Computing Service Data Repository. Imperial College London. doi:10.14469\/hpc\/1116. https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1116 . Retrieved 07 September 2016 .   \n\n\u2191 Clarke, J.; Bonney, K.J.; Yaqoob, M. et al. (2016). \"Epimeric Face-Selective Oxidations and Diastereodivergent Transannular Oxonium Ion Formation Fragmentations: Computational Modeling and Total Syntheses of 12-Epoxyobtusallene IV, 12-Epoxyobtusallene II, Obtusallene X, Marilzabicycloallene C, and Marilzabicycloallene D\". Journal of Organic Chemistry 81 (20): 9539\u20139552. doi:10.1021\/acs.joc.6b02008.   \n\n\u2191 28.0 28.1 \"Mnova\". Mestrelab Research, S.L. http:\/\/mestrelab.com\/software\/mnova\/ . Retrieved 07 September 2016 .   \n\n\u2191 Hanson, R.M.; Prilusky, J.; Renjian, Z. et al. (2013). \"JSmol and the Next-Generation Web-Based Representation of 3D Molecular Structure as Applied to Proteopedia\". Israel Journal of Chemistry 53 (3\u20134): 207\u2013216. doi:10.1002\/ijch.201300024.   \n\n\u2191 Rzepa, H.; White, A.; Braddock, D.C. et al. (10 August 2016). \"FAIR Data table. 
Computed relative reaction free energies (kcal\/mol-1) of Obtusallene derived oxonium and chloronium cations\". Imperial College High Performance Computing Service Data Repository. Imperial College London. doi:10.14469\/hpc\/1248. https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1248 . Retrieved 07 November 2016 .   \n\n\u2191 \"DataSearch\". Elsevier B.V. https:\/\/datasearch.elsevier.com\/ . Retrieved 07 September 2017 .   \n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. In one case, the original citation was incomplete (#6) and was corrected here. What was originally reference 26, a link to a downloadable file, was turned into a footnote for clarity.\n\n\n\n\n\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:A_metadata-driven_approach_to_data_repository_design\">https:\/\/www.limswiki.org\/index.php\/Journal:A_metadata-driven_approach_to_data_repository_design<\/a>\n\t\t\t\t\tCategories: LIMSwiki journal articles (added in 2017)LIMSwiki journal articles (all)LIMSwiki journal articles on big dataLIMSwiki journal articles on informatics\n\nThis page was last modified on 8 February 2017, at 20:36.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","0c7c45ef71cf479715ea32203f1e26d3_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_A_metadata-driven_approach_to_data_repository_design skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:A metadata-driven approach to data repository design<\/h1>\n\t\t\t\t\n\t\t\t\t<div 
id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Abstract\">Abstract<\/span><\/h2>\n<p>The design and use of a metadata-driven data repository for research data management is described. Metadata is collected automatically during the submission process whenever possible and is registered with DataCite in accordance with their current metadata schema, in exchange for a persistent digital object identifier. Two examples of data preview are illustrated, including the demonstration of a method for integration with commercial software that confers rich domain-specific <a href=\"https:\/\/www.limswiki.org\/index.php\/Data_analysis\" title=\"Data analysis\" target=\"_blank\" class=\"wiki-link\" data-key=\"545c95e40ca67c9e63cd0a16042a5bd1\">data analytics<\/a> without introducing customization into the repository itself.\n<\/p><p><b>Keywords<\/b>: Data repository, metadata-driven, DataCite, data preview, Mpublish\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Background\">Background<\/span><\/h2>\n<p>Turnkey institutional repositories based on platforms such as <a href=\"https:\/\/www.limswiki.org\/index.php\/DSpace\" title=\"DSpace\" target=\"_blank\" class=\"wiki-link\" data-key=\"a722996d9e71fde809676b66f183df91\">DSpace<\/a><sup id=\"rdp-ebb-cite_ref-DSpace_1-0\" class=\"reference\"><a href=\"#cite_note-DSpace-1\" rel=\"external_link\">[1]<\/a><\/sup> were introduced more than 10 years ago, with early applications directed largely towards archival of publication preprints and postprints. The recent increasing requirement for research data management emerging from funding agencies means that the focus is now shifting to the use of repositories as part of the data management processes. 
More recent data-centric tools such as Figshare<sup id=\"rdp-ebb-cite_ref-Figshare_2-0\" class=\"reference\"><a href=\"#cite_note-Figshare-2\" rel=\"external_link\">[2]<\/a><\/sup> and Zenodo<sup id=\"rdp-ebb-cite_ref-Zenodo_3-0\" class=\"reference\"><a href=\"#cite_note-Zenodo-3\" rel=\"external_link\">[3]<\/a><\/sup> reflect these changes. Such services rely on minting persistent identifiers (DOIs) for the depositions via the DataCite agency.<sup id=\"rdp-ebb-cite_ref-DataCite_4-0\" class=\"reference\"><a href=\"#cite_note-DataCite-4\" rel=\"external_link\">[4]<\/a><\/sup> Metadata describing the deposited material is supplied to DataCite and a DOI is returned. An early example of such research data management is a DSpace-based project to produce and, 10 years later, curate a library of quantum-mechanically optimised molecular coordinates derived from a computable subset of the <a href=\"https:\/\/www.limswiki.org\/index.php\/National_Cancer_Institute\" title=\"National Cancer Institute\" target=\"_blank\" class=\"wiki-link\" data-key=\"281916a9fbd28f1e14ba3a07ff64abc8\">National Cancer Institute<\/a>'s (NCI) collection of small molecules.<sup id=\"rdp-ebb-cite_ref-DowningSPECTRa08_5-0\" class=\"reference\"><a href=\"#cite_note-DowningSPECTRa08-5\" rel=\"external_link\">[5]<\/a><\/sup> \n<\/p><p>One aim of the curation phase<sup id=\"rdp-ebb-cite_ref-HarveyStandards15_6-0\" class=\"reference\"><a href=\"#cite_note-HarveyStandards15-6\" rel=\"external_link\">[6]<\/a><\/sup> of the project was to explore the capabilities of the DataCite metadata schemas for improving the discoverability of the deposited data. 
The metadata can then be exploited to create rich search queries.<sup id=\"rdp-ebb-cite_ref-RzepaInChl16_7-0\" class=\"reference\"><a href=\"#cite_note-RzepaInChl16-7\" rel=\"external_link\">[7]<\/a><\/sup> As a result of the experience gained from this project, we became aware that one factor limiting the effective use of metadata was the design of the repository itself. The next stage, therefore, was to explore whether what we considered the essential requirements for a data repository could be incorporated into a new design. Here we report the principles used to create such a repository and some of the applications in chemistry that have resulted. These principles may in turn help researchers who wish to deposit data to identify the repository attributes that best support the discoverability and re-use of their data.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Data_repository_design_features\">Data repository design features<\/span><\/h2>\n<p>Here we describe the requirements we identified for a metadata-driven repository, an instance of which is deployed by the Imperial College HPC Service at <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/data.hpc.imperial.ac.uk\" target=\"_blank\">https:\/\/data.hpc.imperial.ac.uk<\/a>:\n<\/p>\n<ul><li> In our design, we have focused on enhancing the FAIR<sup id=\"rdp-ebb-cite_ref-WilkinsonTheFAIR16_8-0\" class=\"reference\"><a href=\"#cite_note-WilkinsonTheFAIR16-8\" rel=\"external_link\">[8]<\/a><\/sup> attributes of the data. The first attribute, F, means that the data must be findable; in practice, this means making the metadata descriptors as rich and complete as possible. Accessibility (A) is achieved by assigning persistent identifiers to the datasets and, again, associating them with appropriate metadata to enable automated retrieval where appropriate. 
This in turn helps ensure that the data can be accessed in a standard manner, supporting its interoperability (I) across various software environments. Re-usability (R) depends on understanding and trusting the provenance of the data and the license terms under which it can be processed.<\/li><\/ul>\n<ul><li> The provenance of the deposited data is established from the unique ORCID identifier of the depositor(s). The first time the repository is used, after initial institutional authentication, the user is redirected to the ORCID site. There the depositor creates an account or authenticates with an existing one, and then authorises the repository request. The retrieved ORCID identifier is then added to the metadata manifest for the deposition as a depositor attribute. This initial depositor can then add the ORCID identifiers of co-authors to the entry; these are again validated automatically against the ORCID site. This information is then collected and sent to DataCite for aggregation (Fig. 1e).<\/li><\/ul>\n<p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Harvey_JoCheminformatics2017_9.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"2606ca8095aab1ec4fb00c5c4f81121a\"><img alt=\"Fig1 Harvey JoCheminformatics2017 9.gif\" src=\"https:\/\/www.limswiki.org\/images\/5\/5f\/Fig1_Harvey_JoCheminformatics2017_9.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> Metadata registered with DataCite for doi:10.14469\/hpc\/1280, with individual items (a)\u2013(f) discussed in the section on metadata expression<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<ul><li> The structure of the repository is based on hierarchical collections. 
Although collections were a feature of early repositories such as DSpace, relatively little use has been made of them. We first identified the need for such structures during our early project<sup id=\"rdp-ebb-cite_ref-DowningSPECTRa08_5-1\" class=\"reference\"><a href=\"#cite_note-DowningSPECTRa08-5\" rel=\"external_link\">[5]<\/a><\/sup>, which involved the individual deposition of more than 168,000 items. Collections were deemed necessary because we considered that each item would benefit from having its own unique metadata descriptors, but within the context of a complete collection described by separate metadata. This is illustrated by the assignment of metadata both to individual entries<sup id=\"rdp-ebb-cite_ref-DC10.14469ch153690_9-0\" class=\"reference\"><a href=\"#cite_note-DC10.14469ch153690-9\" rel=\"external_link\">[9]<\/a><\/sup> and to the collection of which those items are members.<sup id=\"rdp-ebb-cite_ref-DC10.14469ch2_10-0\" class=\"reference\"><a href=\"#cite_note-DC10.14469ch2-10\" rel=\"external_link\">[10]<\/a><\/sup> Such hierarchical structures allow a research group to assign collections to project themes and, within these, to identify sub-collections associated with individual researchers or teams. The sub-collections can be further structured into types of data, other research objects such as software, presentations on the topic, and other media such as video. The granularity of this approach is likely to depend very much on the discipline associated with the data. Thus in the molecular sciences, the basic object naturally maps to the molecule, since this is the smallest object for which a dataset can normally be generated and which can usefully be described by its own metadata. 
It would be less useful or convenient, for example, to disassemble the molecule into individual atoms as metadata carriers.<\/li><\/ul>\n<dl><dd> Basing the repository design on collections also reflects the manner in which much modern science is conducted, often via multi-disciplinary collaborations in which each group can generate its own data collections. Collections also greatly facilitate data citation in journal articles. For example, an article can cite just the DOI of the top-level collection of its associated datasets, avoiding citation blight. If a particular object (a molecule in our case) is discussed in the text of the article, it may nevertheless be more appropriate to cite that object's specific DOI at that point. Individual citation is also useful in, for example, tables of results or figures. The metadata for any individually cited dataset also contains the attribute \"is member of,\" so that the hierarchy can be tracked upwards and, via the attribute \"has members,\" downwards (Fig. 1d). Through such metadata, the hierarchy also introduces further semantics into the citation process itself, placing each item in its appropriate context. The lack of such semantics and context is arguably one of the greatest deficiencies of current citation practice in journal articles.<\/dd><\/dl>\n<ul><li> Our approach to metadata collection is to automate the process whenever possible. 
In the case of a molecule as an object, there are algorithms that can generate appropriate metadata, the most useful and prominent of which is the InChI (International Chemical Identifier).<sup id=\"rdp-ebb-cite_ref-InChI_11-0\" class=\"reference\"><a href=\"#cite_note-InChI-11\" rel=\"external_link\">[11]<\/a><\/sup> Such an identifier can be created using the OpenBabel program library<sup id=\"rdp-ebb-cite_ref-OBoyleOpen11_12-0\" class=\"reference\"><a href=\"#cite_note-OBoyleOpen11-12\" rel=\"external_link\">[12]<\/a><\/sup> or via JavaScript-based resources.<sup id=\"rdp-ebb-cite_ref-InChI.js_13-0\" class=\"reference\"><a href=\"#cite_note-InChI.js-13\" rel=\"external_link\">[13]<\/a><\/sup> These accept a variety of chemical documents as input and generate the InChI identifier and InChI key that uniquely describe the molecule. The repository workflow automatically processes every uploaded data file through this algorithm and records all successful outputs. Such metadata is then associated with the <tt>Subject<\/tt> element in the DataCite schema (Fig. 1b).<\/li><\/ul>\n<ul><li> Other metadata describing an individual collection, or items within it, can be used to link to other data repositories and, where relevant, to associated journal publications, in both cases via the appropriate persistent identifier (DOI). These linkages can of course be made bidirectional by including a citation to the data at the remote site. Such bidirectional linking is currently less automated, but one might envisage future automation using the ORCID identifier, with the ORCID resources acting as a possible aggregator.<\/li><\/ul>\n<ul><li> When a collection or an individual dataset is deposited, the item is immediately issued with a reserved DataCite DOI so that the authors can quote it in any articles being prepared. 
Its status is set to embargoed, with an associated access code that allows collaborators to view the item and that can, if necessary, be forwarded to a journal editor to arrange access for referees. The embargo can be released at a time agreed by the authors, either in advance of the submission of any resulting article or at the time of open publication of that article. Releasing the embargo on an item does not recursively release any of its members.<\/li><\/ul>\n<ul><li> The repository incorporates an ORE resource map<sup id=\"rdp-ebb-cite_ref-ORE_14-0\" class=\"reference\"><a href=\"#cite_note-ORE-14\" rel=\"external_link\">[14]<\/a><\/sup>, with appropriate metadata descriptors collected to describe the location of this resource map in the repository. This in turn allows a query of DataCite using just the assigned DOI to retrieve the ORE map (Fig. 1d), and facilitates automated retrieval of any individual file contained within a dataset based only on its DOI and, if necessary, its media type. We have described applications of this procedure, termed DOI2Data.<sup id=\"rdp-ebb-cite_ref-HarveyStand15_15-0\" class=\"reference\"><a href=\"#cite_note-HarveyStand15-15\" rel=\"external_link\">[15]<\/a><\/sup> Such procedures remove any need to navigate from the landing page associated with the DOI to find and recover data, and open up possibilities for large-scale automated data mining based on, for example, just the top-level collection DOIs. We have also implemented the metadata required to support what DataCite calls content negotiation<sup id=\"rdp-ebb-cite_ref-HarveyStand15_15-1\" class=\"reference\"><a href=\"#cite_note-HarveyStand15-15\" rel=\"external_link\">[15]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-ConNeg_16-0\" class=\"reference\"><a href=\"#cite_note-ConNeg-16\" rel=\"external_link\">[16]<\/a><\/sup> (Fig. 1f). 
An example of data retrieval involving such negotiation might be <tt><a rel=\"external_link\" class=\"external free\" href=\"http:\/\/data.datacite.org\/chemical\/x-mnpub\/10.14469\/hpc\/1280\" target=\"_blank\">http:\/\/data.datacite.org\/chemical\/x-mnpub\/10.14469\/hpc\/1280<\/a><\/tt>. This queries whether the item assigned the DOI <tt>10.14469\/hpc\/1280<\/tt> has any content associated with the specified media type <tt>chemical\/x-mnpub<\/tt> and, if so, retrieves the first instance of such data. If there are multiple such instances in the dataset, then the ORE<sup id=\"rdp-ebb-cite_ref-ORE_14-1\" class=\"reference\"><a href=\"#cite_note-ORE-14\" rel=\"external_link\">[14]<\/a><\/sup> (or METS)<sup id=\"rdp-ebb-cite_ref-HarveyStand15_15-2\" class=\"reference\"><a href=\"#cite_note-HarveyStand15-15\" rel=\"external_link\">[15]<\/a><\/sup> method must be used to select them.<\/li><\/ul>\n<ul><li> An emerging feature of data repositories is data preview, which can be used as a navigational metaphor. When repositories were largely focused on storing journal articles, preview of the most common document type, the PDF format, was the most important requirement. Most data, however, is not (and certainly should not be) contained in such documents. Clearly, data preview is going to be largely dependent on the discipline associated with the data, and it will be difficult to generalize such procedures. We describe two specific implementations of preview below, but it is important to recognize the need for such rich preview in the initial design of a repository.<\/li><\/ul>\n<ul><li> The repository is designed to be operable through both a command-line interface and a programmatic web API. 
This allows scripted integration of the deposition process into other workflows such as <a href=\"https:\/\/www.limswiki.org\/index.php\/Electronic_laboratory_notebook\" title=\"Electronic laboratory notebook\" target=\"_blank\" class=\"wiki-link\" data-key=\"a9fbbd5e0807980106763fab31f1e72f\">electronic laboratory notebooks<\/a> (ELNs).<sup id=\"rdp-ebb-cite_ref-HarveyDigital14_17-0\" class=\"reference\"><a href=\"#cite_note-HarveyDigital14-17\" rel=\"external_link\">[17]<\/a><\/sup><\/li><\/ul>\n<ul><li> The repository can be integrated with the widely used source code management website GitHub, and can automatically allocate DOIs to software releases made through that platform. Once the initial configuration has been made, this extends the benefit of DOI citability to software projects without requiring additional effort on the part of the developer.<\/li><\/ul>\n<ul><li> The repository is registered with re3data, the Registry of Research Data Repositories.<sup id=\"rdp-ebb-cite_ref-RE3_18-0\" class=\"reference\"><a href=\"#cite_note-RE3-18\" rel=\"external_link\">[18]<\/a><\/sup> This involves populating a schema template provided by re3data with the appropriate attributes, which is then processed to create a repository record. 
This results in the metadata describing the repository itself being assigned a DOI.<sup id=\"rdp-ebb-cite_ref-ImperialCollege_19-0\" class=\"reference\"><a href=\"#cite_note-ImperialCollege-19\" rel=\"external_link\">[19]<\/a><\/sup> The repository schema is available as an XML file<sup id=\"rdp-ebb-cite_ref-XMLReg_20-0\" class=\"reference\"><a href=\"#cite_note-XMLReg-20\" rel=\"external_link\">[20]<\/a><\/sup>, with further data and metadata information deposited for inspection.<sup id=\"rdp-ebb-cite_ref-DRP_21-0\" class=\"reference\"><a href=\"#cite_note-DRP-21\" rel=\"external_link\">[21]<\/a><\/sup><\/li><\/ul>\n<h2><span class=\"mw-headline\" id=\"Engineering\">Engineering<\/span><\/h2>\n<p>The repository is intended for use by affiliates of the hosting institution. Deposition first requires authentication against an institutional authentication and authorization (A&A) LDAP service. As a matter of policy, the repository also requires the depositing user to provide their ORCID identifier, obtained via an OAuth transaction<sup id=\"rdp-ebb-cite_ref-RicherUser_22-0\" class=\"reference\"><a href=\"#cite_note-RicherUser-22\" rel=\"external_link\">[22]<\/a><\/sup> with the ORCID web service.\n<\/p><p>The repository is accessed via interfaces designed to function both as a human-friendly UI (accessed via a web browser) and as a programmable API. The latter is essential for integrating deposition into higher-level tools and workflows, and exposes all the capabilities of the repository. To deposit, command-line tools or other programs using the API must also authenticate; the repository can provide such tools with delegated access to a user\u2019s account through a transaction similar to OAuth. This clearly distinguishes automated actions performed by a third-party tool on behalf of a user from actions performed by the user, and also allows the third party's access to be selectively revoked. 
Current integrations include a computational science portal which manages the execution of quantum chemistry calculations on Imperial College HPC resources. This portal is able to publish results directly into the repository, automatically passing on dataset data files and descriptive metadata.\n<\/p><p>Data files stored within a repository are maintained on a local filesystem on the server hosting the repository. As data burdens grow to multi-terabyte levels, we expect to migrate this data to remote filesystems. The internal database representation of a dataset deposition allows the files to reside on an independent web server, in which case the repository resolves any request for them with an HTTP redirect. This would facilitate any future extension of the repository to use a third-party storage solution (e.g., the Amazon Web Services S3 object store) or a content distribution network.\n<\/p><p>The repository automatically generates and publishes metadata records conforming to the DataCite Metadata Schema 4.0.<sup id=\"rdp-ebb-cite_ref-DMWG_23-0\" class=\"reference\"><a href=\"#cite_note-DMWG-23\" rel=\"external_link\">[23]<\/a><\/sup> The metadata records are automatically updated whenever a user updates an entry, for example by adding the DOI of a subsequently published related journal article. At present there is a latency of approximately two days before the DataCite search engine index incorporates any updates.\n<\/p><p>For the GitHub integration, the repository end-user first associates the repository with GitHub, again using an OAuth transaction.<sup id=\"rdp-ebb-cite_ref-RicherUser_22-1\" class=\"reference\"><a href=\"#cite_note-RicherUser-22\" rel=\"external_link\">[22]<\/a><\/sup> Thereafter, the repository maintains a list of the user\u2019s GitHub projects, both public and private, for which DOI creation may be selectively enabled. 
Once activated, a GitHub \u201cwebhook\u201d<sup id=\"rdp-ebb-cite_ref-GHWebhooks_24-0\" class=\"reference\"><a href=\"#cite_note-GHWebhooks-24\" rel=\"external_link\">[24]<\/a><\/sup> is created which automatically makes an HTTP request to the repository whenever a software release is created. This request contains sufficient metadata about the release to allow the repository to create a DOI and automatically populate its metadata. The DOI is recorded within the repository and also added to the release description held within GitHub.\n<\/p><p>The repository is implemented in PHP hosted within an Apache web server and depends on a <a href=\"https:\/\/www.limswiki.org\/index.php\/PostgreSQL\" title=\"PostgreSQL\" target=\"_blank\" class=\"wiki-link\" data-key=\"a5dd945cdcb63e2d8f7a5edb3a896d82\">Postgres<\/a> database. The source code is available on GitHub.<sup id=\"rdp-ebb-cite_ref-HPCRepo_25-0\" class=\"reference\"><a href=\"#cite_note-HPCRepo-25\" rel=\"external_link\">[25]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Metadata_expression\">Metadata expression<\/span><\/h2>\n<p>The metadata present for a typical deposition conforms to the DataCite metadata schema. Metadata is represented visually in partial form (Fig. 1) and is also available in a semantically more complete form.<sup id=\"rdp-ebb-cite_ref-26\" class=\"reference\"><a href=\"#cite_note-26\" rel=\"external_link\">[a]<\/a><\/sup> In addition, each file that is part of a deposited dataset (or is created as a result of the deposition process) has its media type registered. Examples of these formats are shown in Fig. 1f. Specific metadata components are discussed briefly here.\n<\/p>\n<ul><li> \"Resource type\" identifies whether the item is a dataset or a collection (Fig. 
1a).<\/li>\n<li> \"Subjects\" is available for domain-specific information, in this example of unique InChI identifiers and strings<sup id=\"rdp-ebb-cite_ref-InChI_11-1\" class=\"reference\"><a href=\"#cite_note-InChI-11\" rel=\"external_link\">[11]<\/a><\/sup> derived automatically by parsing the documents in the deposition. The strings <tt>subjectScheme<\/tt> and <tt>SchemeURI<\/tt> are used to reserve these elements for the subject domain and to disambiguate from similarly named subjects in other domains. Example:<\/li><\/ul>\n<p><br \/>\n<\/p>\n<dl><dd><a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1.5_Harvey_JoCheminformatics2017_9.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"9ecba2cf84f94f1f83a21d73237380e5\"><img alt=\"Fig1.5 Harvey JoCheminformatics2017 9.gif\" src=\"https:\/\/www.limswiki.org\/images\/5\/54\/Fig1.5_Harvey_JoCheminformatics2017_9.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a><\/dd><\/dl>\n<div style=\"clear:both;\"><\/div>\n<ul><li> \"Related identifiers\" specifies the location of machine parsable metadata ORE files with use of the ORE resource map being used for the Live Preview described below. The identifiers for <tt>HasPart<\/tt> and <tt>IsPartOf<\/tt> entries is used to identify the collection hierarchies.<\/li>\n<li> \"Contributors\" includes researchers identified by their ORCID metadata, which in turn allows aggregation by the ORCID organisation.<\/li>\n<li> Other formats includes domain-specific media types present in the fileset. 
These entries allow rich searches to be performed, using syntax such as <tt><a rel=\"external_link\" class=\"external free\" href=\"http:\/\/search.datacite.org\/ui?q=format:chemical\/x-*\" target=\"_blank\">http:\/\/search.datacite.org\/ui?q=format:chemical\/x-*<\/a><\/tt>, which retrieves all deposited instances of documents assigned a media type of the form <tt>chemical\/x-*<\/tt> (for example <tt>chemical\/x-cml<\/tt>) in all repositories that register their metadata with DataCite.<\/li><\/ul>\n<h2><span class=\"mw-headline\" id=\"The_user_experience_with_examples_of_dataset_collections.2C_workflows_and_metadata\">The user experience with examples of dataset collections, workflows and metadata<\/span><\/h2>\n<p>The workflow (Fig. 2) is best illustrated using a recent example<sup id=\"rdp-ebb-cite_ref-RzepaEpimeric16_27-0\" class=\"reference\"><a href=\"#cite_note-RzepaEpimeric16-27\" rel=\"external_link\">[26]<\/a><\/sup> associated with a published article.<sup id=\"rdp-ebb-cite_ref-ClarkeEpimeric16_28-0\" class=\"reference\"><a href=\"#cite_note-ClarkeEpimeric16-28\" rel=\"external_link\">[27]<\/a><\/sup> Two basic types of data are associated with this publication: (a) raw and processed instrumental data relating to NMR spectra and (b) computational data deriving from, for example, quantum chemical simulations. 
Each is associated with a different user interface: the former uses the dataset deposition web page of the data repository itself<sup id=\"rdp-ebb-cite_ref-ImperialCollege_19-1\" class=\"reference\"><a href=\"#cite_note-ImperialCollege-19\" rel=\"external_link\">[19]<\/a><\/sup>, whereas the latter is injected into the repository through the command-line interface as part of the workflow of a separate ELN, when the publish button associated with an individual computational simulation is selected.<sup id=\"rdp-ebb-cite_ref-HarveyDigital14_17-1\" class=\"reference\"><a href=\"#cite_note-HarveyDigital14-17\" rel=\"external_link\">[17]<\/a><\/sup>\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig2_Harvey_JoCheminformatics2017_9.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"a9dcfa9237dad0c3020623e6e69837ab\"><img alt=\"Fig2 Harvey JoCheminformatics2017 9.gif\" src=\"https:\/\/www.limswiki.org\/images\/b\/b0\/Fig2_Harvey_JoCheminformatics2017_9.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 2.<\/b> Deposition workflow, illustrating user activity and repository actions<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<ul><li> Although the ordering of the actions described below is not imposed, it evolved through experimentation as an efficient procedure, and we suggest it as a reasonable starting point for less experienced users. The first requirement is to create an overall project collection using the add collection option in the repository itself. For this project, the collection has the DOI:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469\/hpc\/1116\" target=\"_blank\">10.14469\/hpc\/1116<\/a>. 
This automatically inherits the ORCID identifier of the creator, and at this stage the ORCID identifiers of all the other collaborators can be added as co-authors using the collection edit option. Other metadata such as the title and description are also added at this stage; in this instance, we used the abstract of the associated journal article as the description. The final addition of metadata to the master collection relates to associated DOIs, the most important of which is that of the article associated with the data (DOI:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1021\/acs.joc.6b02008\" target=\"_blank\">10.1021\/acs.joc.6b02008<\/a>). Also added are DOIs deriving from earlier depositions in other repositories (e.g., DOI:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469\/ch\/191973\" target=\"_blank\">10.14469\/ch\/191973<\/a>) which pre-dated the installation of the repository described in this article. A summary of the master collection metadata accruing from these processes can be found at <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/data.datacite.org\/10.14469\/hpc\/1116\" target=\"_blank\">https:\/\/data.datacite.org\/10.14469\/hpc\/1116<\/a>.<\/li><\/ul>\n<ul><li> One or more sub-collections are then created to hold, for example, the instrumental NMR data (DOI:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469\/hpc\/1267\" target=\"_blank\">10.14469\/hpc\/1267<\/a>) and the computational data (DOI:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469\/hpc\/1919\" target=\"_blank\">10.14469\/hpc\/1919<\/a>). These sub-collection pages are edited to make each a member of the master collection; reciprocally, the master collection identifies each sub-collection as one of its members. These parent\u2013child relationships are formally defined in the metadata sent to DataCite. 
The co-authors of a sub-collection need not include all the authors of the master collection; this decision rests with the research group, and in principle each author could, if desired, be credited for their specific contributions to the overall project.<\/li><\/ul>\n<ul><li> With the basic collection hierarchy now defined, individual datasets can be deposited as and when they emerge from an experiment. We suggest that this action be incorporated into daily <a href=\"https:\/\/www.limswiki.org\/index.php\/Laboratory\" title=\"Laboratory\" target=\"_blank\" class=\"wiki-link\" data-key=\"c57fc5aac9e4abf31dccae81df664c33\">laboratory<\/a> procedures, rather than deferred to the end of a project. For example, when an instrument's data become available, the deposit data button of the data repository is used. This requires a title and description as metadata, followed by selection of the data files and, finally, specification of the collection of which the dataset is a member (in this instance the NMR sub-collection <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469\/hpc\/1267\" target=\"_blank\">10.14469\/hpc\/1267<\/a>). Some of the uploaded files can themselves help create descriptive metadata about the data. In this instance, every set of molecule-specific NMR data, whether in raw spectrometer format (Bruker files as a ZIP archive) or in the MestreNova (.mnova) format of the analysis software used<sup id=\"rdp-ebb-cite_ref-Mnova_29-0\" class=\"reference\"><a href=\"#cite_note-Mnova-29\" rel=\"external_link\">[28]<\/a><\/sup>, is accompanied by a separate molecular connection table for that molecule, in the form of either a Molfile (.mol) or a ChemDraw file (.cdx or .cdxml). 
If the repository workflow scripts detect such a file, it is passed to Open Babel<sup id=\"rdp-ebb-cite_ref-OBoyleOpen11_12-1\" class=\"reference\"><a href=\"#cite_note-OBoyleOpen11-12\" rel=\"external_link\">[12]<\/a><\/sup> to generate an InChI string and InChIKey, which then serve as molecular metadata (Fig. 1b). We regard this exposure of metadata as better in principle than the often-used alternative of including an image of the molecular connectivity, which exposes no metadata. Our workflows could be extended to generate other types of metadata from other types of content. An example of such a deposition has the DOI:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469\/hpc\/1291\" target=\"_blank\">10.14469\/hpc\/1291<\/a>; its metadata can again be viewed by prepending the resolver <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/data.datacite.org\" target=\"_blank\">https:\/\/data.datacite.org<\/a> to the DOI.<\/li><\/ul>\n<ul><li> Computational data are deposited by a different mechanism, using the computational ELN we have previously described.<sup id=\"rdp-ebb-cite_ref-HarveyDigital14_17-2\" class=\"reference\"><a href=\"#cite_note-HarveyDigital14-17\" rel=\"external_link\">[17]<\/a><\/sup> This system controls the computational workflow, ending with the option to publish to pre-selected data repositories, one of which is the repository described here. Each entry in this ELN is assigned its own project page. When published, the project is mapped to a collection of the same name in the data repository, which is initially created in a private, embargoed state requiring an access code to view or edit. We use such inherited collections as holding areas in the data repository, since not all entries may turn out to be suitable for inclusion in the final publication-ready collection. 
The entries in this holding collection can subsequently be edited to become members of the master collection or sub-collections at the appropriate point, e.g., before a manuscript is submitted to a journal. An example of such a computational deposition is DOI:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469\/hpc\/1312\" target=\"_blank\">10.14469\/hpc\/1312<\/a>. In this case it was reassigned as a member of the master collection <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469\/hpc\/1116\" target=\"_blank\">10.14469\/hpc\/1116<\/a> rather than of the holding collection inherited from the ELN.<\/li><\/ul>\n<ul><li> The final type of dataset was added as a member of <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469\/hpc\/1116\" target=\"_blank\">10.14469\/hpc\/1116<\/a> and is described in more detail in the LiveView section below.<\/li><\/ul>\n<h2><span class=\"mw-headline\" id=\"Examples_of_data_exposure\">Examples of data exposure<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"ProView\">ProView<\/span><\/h3>\n<p>Data, especially those originating from instruments or from algorithms encoded in software, can be highly complex. The data may be distributed across multiple files (around 70 for the datasets described below). Some of these may even be binary-encoded, with internal structures that are poorly documented or hidden for proprietary reasons. Here we describe one example of enabling non-specialists to process and reuse such datasets. These users may not have reliable or feature-rich open-source software available, and permanent licensed access to commercial software may not be practical or cost-effective for them. The datasets in this example originate from commercial NMR spectrometers and require specialist software to convert the data (in the so-called time domain) into visual representations in the frequency domain (\u201cNMR spectra\u201d). 
The raw instrumental output takes the form of a number of separate time-domain data files, many of which lack even the meta-information of filename extensions. Without the appropriate software, such datasets are essentially inaccessible.\n<\/p><p>MestreNova<sup id=\"rdp-ebb-cite_ref-Mnova_29-1\" class=\"reference\"><a href=\"#cite_note-Mnova-29\" rel=\"external_link\">[28]<\/a><\/sup> is commercial software that provides access to such NMR datasets and requires a license entitlement to activate its full feature set beyond an initial trial period. However, an unlicensed copy of MestreNova can have its full functionality enabled for an individual dataset, provided that the dataset has been cryptographically signed. These signatures can only be produced by an agency holding a MestreNova Publisher license and the accompanying signing keys. We have integrated such MestreNova publication into the deposition process of our repository, so that any deposited NMR dataset can be processed by the MestreNova software without further effort from the depositor. When an NMR dataset is deposited into the repository as a compressed ZIP archive or a MestreNova wrapping of such data, it is automatically signed. This produces a MestreNova-specific \u201cmnpub\u201d-format file, which is added to the deposition fileset. This plain-text file contains the URL of the copy of the originating MNova\/ZIP file within the repository, along with the cryptographic signature (Fig. 3). 
When the mnpub file is loaded into an unlicensed copy of MestreNova, the associated resource is retrieved from the embedded URL and, provided the cryptographic signature validates, the full features of the software are enabled.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig3_Harvey_JoCheminformatics2017_9.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"3b8c38b7626928056991c3a39eb2fcf9\"><img alt=\"Fig3 Harvey JoCheminformatics2017 9.gif\" src=\"https:\/\/www.limswiki.org\/images\/4\/4c\/Fig3_Harvey_JoCheminformatics2017_9.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 3.<\/b> An example of an auto-generated mnpub file, containing the URL of the signed resource, the signature, and the identity of the signing entity (the cryptographic key associated with the MestreNova Publisher license granted to the repository)<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p>We believe this feature provides a powerful incentive for using the repository. By enabling specialist software to be used on submitted datasets, the repository becomes more than a passive silo for data, actively enabling depositors and viewers to interact with datasets in a rich, domain-specific way. Furthermore, this is achieved without building format-specific enhancements into the repository itself.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"LiveView\">LiveView<\/span><\/h3>\n<p>The most generic solution to data preview is to deploy HTML in conjunction with appropriate visualization routines. 
One example is illustrated by the following components.\n<\/p>\n<ul><li> The HTML document is assigned the reserved name <tt>index.html<\/tt>.<\/li><\/ul>\n<ul><li> When a document with this name is deposited, it is automatically transcluded into the landing page of the deposition using an HTML iframe: <tt>&lt;iframe name=\"liveview\" src=\"\/resolve\/?doi=1248&file=13&access=\" width=\"100%\" height=\"600\"&gt;&lt;\/iframe&gt;<\/tt>, where the string <tt>doi=1248&file=13&access=<\/tt> references the appropriate database entry for the object assigned the DOI:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469\/hpc\/1248\" target=\"_blank\">10.14469\/hpc\/1248<\/a>.<\/li><\/ul>\n<ul><li> The preview functionality is then enabled by the author including, in the <tt>index.html<\/tt> document, JavaScript utility functions hosted on the repository: <tt>&lt;script src=\"https:\/\/data.hpc.imperial.ac.uk\/js\/utilities.js\"&gt;&lt;\/script&gt;<\/tt>. These invoke the open-source molecular visualizer JSmol<sup id=\"rdp-ebb-cite_ref-HansonJSmol13_30-0\" class=\"reference\"><a href=\"#cite_note-HansonJSmol13-30\" rel=\"external_link\">[29]<\/a><\/sup>, which, as the name implies, is written purely in JavaScript.<\/li><\/ul>\n<ul><li> A further script is loaded at this stage, along with a formatting stylesheet: <tt>&lt;script&gt; insertFile(\"resolve-doi.js\"); insertFile(\"table.css\"); &lt;\/script&gt;<\/tt>. The resolve-doi.js script invokes procedures that accept a dataset DOI as input. 
Querying the metadata associated with that DOI, using a URL of the form <tt><a rel=\"external_link\" class=\"external free\" href=\"http:\/\/data.datacite.org\/10.14469\/ch\/192018\" target=\"_blank\">http:\/\/data.datacite.org\/10.14469\/ch\/192018<\/a><\/tt>, identifies the path to the ORE or METS resource manifest (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/spectradspace.lib.imperial.ac.uk:8443\/metadata\/handle\/10042\/196268\/ore.xml\" target=\"_blank\">https:\/\/spectradspace.lib.imperial.ac.uk:8443\/metadata\/handle\/10042\/196268\/ore.xml<\/a> in this instance); parsing this manifest then yields the direct path to the data, which is passed to the JSmol visualization script.<\/li><\/ul>\n<ul><li> An author-initiated entry of the following type in the <tt>index.html<\/tt> document then combines these actions, producing a live view of the retrieved dataset in the browser window (Fig. 4): <tt>&lt;a href=\"javascript:handle_jmol('10.14469\/ch\/192018',';display script;')\"&gt;anchored text&lt;\/a&gt;<\/tt><\/li><\/ul>\n<p><br \/>\n<\/p>\n<dl><dd><a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig4_Harvey_JoCheminformatics2017_9.gif\" class=\"image wiki-link\" target=\"_blank\" data-key=\"6ef22e43243d2abf39c187146a4361b8\"><img alt=\"Fig4 Harvey JoCheminformatics2017 9.gif\" src=\"https:\/\/www.limswiki.org\/images\/8\/8c\/Fig4_Harvey_JoCheminformatics2017_9.gif\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a><\/dd><\/dl>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 4.<\/b> LiveView<sup id=\"rdp-ebb-cite_ref-RzepaFAIR16_31-0\" class=\"reference\"><a href=\"#cite_note-RzepaFAIR16-31\" rel=\"external_link\">[30]<\/a><\/sup> of a dataset collection 
expressed using HTML and an integrated visualization package, with data obtained by script-driven, DOI-based retrieval<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<ul><li> The document <tt>index.html<\/tt> is itself available for download from the repository and can therefore act as a useful template for other authors.<\/li><\/ul>\n<p>The metadata-driven approach described here is designed to confer the FAIR<sup id=\"rdp-ebb-cite_ref-WilkinsonTheFAIR16_8-1\" class=\"reference\"><a href=\"#cite_note-WilkinsonTheFAIR16-8\" rel=\"external_link\">[8]<\/a><\/sup> attributes on the data. Registering rich metadata associated with, for example, a primary research object (in the molecular sciences, the molecule) in turn allows rich searches (the F of FAIR) to be constructed using the generic resources of the metadata aggregator DataCite. This approach contrasts interestingly with that adopted by the publisher Elsevier<sup id=\"rdp-ebb-cite_ref-DS_32-0\" class=\"reference\"><a href=\"#cite_note-DS-32\" rel=\"external_link\">[31]<\/a><\/sup> for its recently introduced DataSearch site, which initially appears to be based on the content of its journal platform, ScienceDirect. The metadata here is simply the likely (but not guaranteed) presence of data within containers such as images or tables. In this approach, a data search ends by directing the user to the data source, which is in fact the article itself. It is then left to the user to identify the data of interest, whether in the text or images of the article or in any associated supporting information. Such an approach is not based on metadata describing data objects such as molecular entities, and leaves the user with the burden of identifying and extracting any such information. 
It would itself be fair to suggest that such a process does not fully adhere to the FAIR data principles.\n<\/p><p>A clear emerging trend is that journal publication is increasingly accompanied by procedures that identify the associated data as a primary research object in its own right. The extent to which such data are rendered fully open, in the sense of complying with all of the FAIR principles, remains uncertain. It seems likely that journal publishers, who would retain full control over the complete workflows involving data, may not necessarily wish to expose the data as openly FAIR, or at the granularity most useful to the researcher. Here we have outlined an alternative, metadata-driven mechanism for achieving fine-grained FAIR data exposure in association with journal publication. Authors can use it themselves as the creators of the research data, and it lets them retain control over the type of metadata captured. In this alternative model, open FAIR data are published at the research-institution level and the associated metadata are aggregated at the global level by agencies such as DataCite, without any need for intervention by journal-publisher workflows.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Declarations\">Declarations<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Authors.27_contributions\">Authors' contributions<\/span><\/h3>\n<p>All authors contributed to this work. 
All authors read and approved the final manuscript.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Acknowledgements\">Acknowledgements<\/span><\/h4>\n<p>We are grateful to MestreLab Research for providing instructions for signing their datasets and for countersigning the public key for the repository described here.\n<\/p>\n<h4><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h4>\n<p>The authors declare that they have no competing interests.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Footnotes\">Footnotes<\/span><\/h2>\n<div class=\"reflist\" style=\"list-style-type: lower-alpha;\">\n<ol class=\"references\">\n<li id=\"cite_note-26\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-26\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\">Available for download at <span class=\"url\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/data.datacite.org\/application\/x-datacite+xml\/10.14469\/hpc\/1280\" target=\"_blank\">data.datacite.org\/application\/x-datacite+xml\/10.14469\/hpc\/1280<\/a><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-DSpace-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DSpace_1-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.dspace.org\/\" target=\"_blank\">\"DSpace\"<\/a>. DuraSpace Organization<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.dspace.org\/\" target=\"_blank\">http:\/\/www.dspace.org\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=DSpace&rft.atitle=&rft.pub=DuraSpace+Organization&rft_id=http%3A%2F%2Fwww.dspace.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Figshare-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Figshare_2-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/figshare.com\/\" target=\"_blank\">\"Figshare\"<\/a>. Figshare LLP<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/figshare.com\/\" target=\"_blank\">https:\/\/figshare.com\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Figshare&rft.atitle=&rft.pub=Figshare+LLP&rft_id=https%3A%2F%2Ffigshare.com%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Zenodo-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-Zenodo_3-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/zenodo.org\/\" target=\"_blank\">\"Zenodo\"<\/a>. CERN Data Centre<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/zenodo.org\/\" target=\"_blank\">https:\/\/zenodo.org\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Zenodo&rft.atitle=&rft.pub=CERN+Data+Centre&rft_id=https%3A%2F%2Fzenodo.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DataCite-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DataCite_4-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.datacite.org\/\" target=\"_blank\">\"DataCite\"<\/a>. DataCite Association<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.datacite.org\/\" target=\"_blank\">https:\/\/www.datacite.org\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=DataCite&rft.atitle=&rft.pub=DataCite+Association&rft_id=https%3A%2F%2Fwww.datacite.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DowningSPECTRa08-5\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-DowningSPECTRa08_5-0\" rel=\"external_link\">5.0<\/a><\/sup> <sup><a href=\"#cite_ref-DowningSPECTRa08_5-1\" rel=\"external_link\">5.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Downing, J.; Murray-Rust, P.; Tonge, A.P. et al. (2008). \"SPECTRa: The deposition and validation of primary chemistry research data in digital repositories\". <i>Journal of Chemical Information and Modeling<\/i> <b>48<\/b> (8): 1571\u20131581. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1021%2Fci7004737\" target=\"_blank\">10.1021\/ci7004737<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=SPECTRa%3A+The+deposition+and+validation+of+primary+chemistry+research+data+in+digital+repositories&rft.jtitle=Journal+of+Chemical+Information+and+Modeling&rft.aulast=Downing%2C+J.%3B+Murray-Rust%2C+P.%3B+Tonge%2C+A.P.+et+al.&rft.au=Downing%2C+J.%3B+Murray-Rust%2C+P.%3B+Tonge%2C+A.P.+et+al.&rft.date=2008&rft.volume=48&rft.issue=8&rft.pages=1571%E2%80%931581&rft_id=info:doi\/10.1021%2Fci7004737&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HarveyStandards15-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HarveyStandards15_6-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Harvey, M.J.; Mason, N.J.; McLean, A. et al. (2015). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4550659\" target=\"_blank\">\"Standards-based curation of a decade-old digital repository dataset of molecular information\"<\/a>. <i>Journal of Cheminformatics<\/i> <b>7<\/b>: 43. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2Fs13321-015-0093-3\" target=\"_blank\">10.1186\/s13321-015-0093-3<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4550659\/\" target=\"_blank\">PMC4550659<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26322133\" target=\"_blank\">26322133<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4550659\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4550659<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Standards-based+curation+of+a+decade-old+digital+repository+dataset+of+molecular+information&rft.jtitle=Journal+of+Cheminformatics&rft.aulast=Harvey%2C+M.J.%3B+Mason%2C+N.J.%3B+McLean%2C+A.+et+al.&rft.au=Harvey%2C+M.J.%3B+Mason%2C+N.J.%3B+McLean%2C+A.+et+al.&rft.date=2015&rft.volume=7&rft.pages=43&rft_id=info:doi\/10.1186%2Fs13321-015-0093-3&rft_id=info:pmc\/PMC4550659&rft_id=info:pmid\/26322133&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4550659&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RzepaInChl16-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RzepaInChl16_7-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Rzepa, H.S.; Mclean, A.; Harvey, M.J. (2015). \"InChI as a research data management tool\". 
<i>Chemistry International<\/i> <b>38<\/b> (3\u20134): 24\u201326. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1515%2Fci-2016-3-408\" target=\"_blank\">10.1515\/ci-2016-3-408<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=InChI+as+a+research+data+management+tool&rft.jtitle=Chemistry+International&rft.aulast=Rzepa%2C+H.S.%3B+Mclean%2C+A.%3B+Harvey%2C+M.J.&rft.au=Rzepa%2C+H.S.%3B+Mclean%2C+A.%3B+Harvey%2C+M.J.&rft.date=2015&rft.volume=38&rft.issue=3%E2%80%934&rft.pages=24%E2%80%9326&rft_id=info:doi\/10.1515%2Fci-2016-3-408&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WilkinsonTheFAIR16-8\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-WilkinsonTheFAIR16_8-0\" rel=\"external_link\">8.0<\/a><\/sup> <sup><a href=\"#cite_ref-WilkinsonTheFAIR16_8-1\" rel=\"external_link\">8.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4792175\" target=\"_blank\">\"The FAIR Guiding Principles for scientific data management and stewardship\"<\/a>. <i>Scientific Data<\/i> <b>3<\/b>: 160018. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fsdata.2016.18\" target=\"_blank\">10.1038\/sdata.2016.18<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4792175\/\" target=\"_blank\">PMC4792175<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26978244\" target=\"_blank\">26978244<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4792175\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4792175<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+FAIR+Guiding+Principles+for+scientific+data+management+and+stewardship&rft.jtitle=Scientific+Data&rft.aulast=Wilkinson%2C+M.D.%3B+Dumontier%2C+M.%3B+Aalbersberg%2C+I.J.+et+al.&rft.au=Wilkinson%2C+M.D.%3B+Dumontier%2C+M.%3B+Aalbersberg%2C+I.J.+et+al.&rft.date=2016&rft.volume=3&rft.pages=160018&rft_id=info:doi\/10.1038%2Fsdata.2016.18&rft_id=info:pmc\/PMC4792175&rft_id=info:pmid\/26978244&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4792175&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DC10.14469ch153690-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DC10.14469ch153690_9-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/data.datacite.org\/10.14469\/ch\/153690\" 
target=\"_blank\">\"doi:10.14469\/ch\/153690\"<\/a>. <i>DataCite Content Service Beta<\/i>. DataCite Association<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/data.datacite.org\/10.14469\/ch\/153690\" target=\"_blank\">https:\/\/data.datacite.org\/10.14469\/ch\/153690<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=doi%3A10.14469%2Fch%2F153690&rft.atitle=DataCite+Content+Service+Beta&rft.pub=DataCite+Association&rft_id=https%3A%2F%2Fdata.datacite.org%2F10.14469%2Fch%2F153690&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DC10.14469ch2-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DC10.14469ch2_10-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/data.datacite.org\/10.14469\/ch\/2\" target=\"_blank\">\"doi:10.14469\/ch\/2\"<\/a>. <i>DataCite Content Service Beta<\/i>. DataCite Association<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/data.datacite.org\/10.14469\/ch\/2\" target=\"_blank\">https:\/\/data.datacite.org\/10.14469\/ch\/2<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=doi%3A10.14469%2Fch%2F2&rft.atitle=DataCite+Content+Service+Beta&rft.pub=DataCite+Association&rft_id=https%3A%2F%2Fdata.datacite.org%2F10.14469%2Fch%2F2&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-InChI-11\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-InChI_11-0\" rel=\"external_link\">11.0<\/a><\/sup> <sup><a href=\"#cite_ref-InChI_11-1\" rel=\"external_link\">11.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.inchi-trust.org\/\" target=\"_blank\">\"InChI Trust\"<\/a>. InChI Trust<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.inchi-trust.org\/\" target=\"_blank\">http:\/\/www.inchi-trust.org\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=InChI+Trust&rft.atitle=&rft.pub=InChI+Trust&rft_id=http%3A%2F%2Fwww.inchi-trust.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-OBoyleOpen11-12\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-OBoyleOpen11_12-0\" rel=\"external_link\">12.0<\/a><\/sup> <sup><a href=\"#cite_ref-OBoyleOpen11_12-1\" rel=\"external_link\">12.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">O'Boyle, N.M.; Banck, M.; James, C.A. et al. (2011). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3198950\" target=\"_blank\">\"Open Babel: An open chemical toolbox\"<\/a>. <i>Journal of Cheminformatics<\/i> <b>3<\/b>: 33. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2F1758-2946-3-33\" target=\"_blank\">10.1186\/1758-2946-3-33<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3198950\/\" target=\"_blank\">PMC3198950<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21982300\" target=\"_blank\">21982300<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3198950\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3198950<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Open+Babel%3A+An+open+chemical+toolbox&rft.jtitle=Journal+of+Cheminformatics&rft.aulast=O%27Boyle%2C+N.M.%3B+Banck%2C+M.%3B+James%2C+C.A.+et+al.&rft.au=O%27Boyle%2C+N.M.%3B+Banck%2C+M.%3B+James%2C+C.A.+et+al.&rft.date=2011&rft.volume=3&rft.pages=33&rft_id=info:doi\/10.1186%2F1758-2946-3-33&rft_id=info:pmc\/PMC3198950&rft_id=info:pmid\/21982300&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3198950&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-InChI.js-13\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-InChI.js_13-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/metamolecular.com\/inchi-js\/\" target=\"_blank\">\"InChI for the Web Browser with InChI.js\"<\/a>. Metamolecular, LLC<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/metamolecular.com\/inchi-js\/\" target=\"_blank\">https:\/\/metamolecular.com\/inchi-js\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=InChI+for+the+Web+Browser+with+InChI.js&rft.atitle=&rft.pub=Metamolecular%2C+LLC&rft_id=https%3A%2F%2Fmetamolecular.com%2Finchi-js%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ORE-14\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-ORE_14-0\" rel=\"external_link\">14.0<\/a><\/sup> <sup><a href=\"#cite_ref-ORE_14-1\" rel=\"external_link\">14.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.openarchives.org\/ore\/1.0\/datamodel\" target=\"_blank\">\"ORE Specification - Abstract Data Model\"<\/a>. Open Archives Initiative<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.openarchives.org\/ore\/1.0\/datamodel\" target=\"_blank\">http:\/\/www.openarchives.org\/ore\/1.0\/datamodel<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=ORE+Specification+-+Abstract+Data+Model&rft.atitle=&rft.pub=Open+Archives+Initiative&rft_id=http%3A%2F%2Fwww.openarchives.org%2Fore%2F1.0%2Fdatamodel&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HarveyStand15-15\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-HarveyStand15_15-0\" rel=\"external_link\">15.0<\/a><\/sup> <sup><a href=\"#cite_ref-HarveyStand15_15-1\" rel=\"external_link\">15.1<\/a><\/sup> <sup><a href=\"#cite_ref-HarveyStand15_15-2\" rel=\"external_link\">15.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Harvey, M.J.; Mason, N.J.; McLean, A.; Rzepa, H.S. (2015). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4528360\" target=\"_blank\">\"Standards-based metadata procedures for retrieving data for display or mining utilizing persistent (data-DOI) identifiers\"<\/a>. <i>Journal of Cheminformatics<\/i> <b>7<\/b>: 37. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2Fs13321-015-0081-7\" target=\"_blank\">10.1186\/s13321-015-0081-7<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4528360\/\" target=\"_blank\">PMC4528360<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26257829\" target=\"_blank\">26257829<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4528360\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4528360<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Standards-based+metadata+procedures+for+retrieving+data+for+display+or+mining+utilizing+persistent+%28data-DOI%29+identifiers&rft.jtitle=Journal+of+Cheminformatics&rft.aulast=Harvey%2C+M.J.%3B+Mason%2C+N.J.%3B+McLean%2C+A.%3B+Rzepa%2C+H.S.&rft.au=Harvey%2C+M.J.%3B+Mason%2C+N.J.%3B+McLean%2C+A.%3B+Rzepa%2C+H.S.&rft.date=2015&rft.volume=7&rft.pages=37&rft_id=info:doi\/10.1186%2Fs13321-015-0081-7&rft_id=info:pmc\/PMC4528360&rft_id=info:pmid\/26257829&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4528360&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ConNeg-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ConNeg_16-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/citation.crosscite.org\/docs.html\" target=\"_blank\">\"DOI Content Negotiation\"<\/a>. <i>DOI Citation Formatter<\/i><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/citation.crosscite.org\/docs.html\" target=\"_blank\">http:\/\/citation.crosscite.org\/docs.html<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=DOI+Content+Negotiation&rft.atitle=DOI+Citation+Formatter&rft_id=http%3A%2F%2Fcitation.crosscite.org%2Fdocs.html&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HarveyDigital14-17\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-HarveyDigital14_17-0\" rel=\"external_link\">17.0<\/a><\/sup> <sup><a href=\"#cite_ref-HarveyDigital14_17-1\" rel=\"external_link\">17.1<\/a><\/sup> <sup><a href=\"#cite_ref-HarveyDigital14_17-2\" rel=\"external_link\">17.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Harvey, M.J.; Mason, N.J.; Rzepa, H.S. (2014). \"Digital data repositories in chemistry and their integration with journals and electronic laboratory notebooks\". <i>Journal of Chemical Information and Modeling<\/i> <b>54<\/b> (10): 2627\u20132635. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1021%2Fci500302p\" target=\"_blank\">10.1021\/ci500302p<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Digital+data+repositories+in+chemistry+and+their+integration+with+journals+and+electronic+laboratory+notebooks&rft.jtitle=Journal+of+Chemical+Information+and+Modeling&rft.aulast=Harvey%2C+M.J.%3B+Mason%2C+N.J.%3B+Rzepa%2C+H.S.&rft.au=Harvey%2C+M.J.%3B+Mason%2C+N.J.%3B+Rzepa%2C+H.S.&rft.date=2014&rft.volume=54&rft.issue=10&rft.pages=2627%E2%80%932635&rft_id=info:doi\/10.1021%2Fci500302p&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RE3-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RE3_18-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.re3data.org\/about\" target=\"_blank\">\"About\"<\/a>. <i>Registry of Research Data Repositories<\/i>. Karlsruhe Institute of Technology<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.re3data.org\/about\" target=\"_blank\">http:\/\/www.re3data.org\/about<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=About&rft.atitle=Registry+of+Research+Data+Repositories&rft.pub=Karlsruhe+Institute+of+Technology&rft_id=http%3A%2F%2Fwww.re3data.org%2Fabout&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ImperialCollege-19\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-ImperialCollege_19-0\" rel=\"external_link\">19.0<\/a><\/sup> <sup><a href=\"#cite_ref-ImperialCollege_19-1\" rel=\"external_link\">19.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.re3data.org\/repository\/r3d100011965\" target=\"_blank\">\"Imperial College High Performance Computing Service Data Repository\"<\/a>. <i>Registry of Research Data Repositories<\/i>. Karlsruhe Institute of Technology. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.17616%2FR3K64N\" target=\"_blank\">10.17616\/R3K64N<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.re3data.org\/repository\/r3d100011965\" target=\"_blank\">http:\/\/www.re3data.org\/repository\/r3d100011965<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Imperial+College+High+Performance+Computing+Service+Data+Repository&rft.atitle=Registry+of+Research+Data+Repositories&rft.pub=Karlsruhe+Institute+of+Technology&rft_id=info:doi\/10.17616%2FR3K64N&rft_id=http%3A%2F%2Fwww.re3data.org%2Frepository%2Fr3d100011965&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-XMLReg-20\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-XMLReg_20-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1369\" target=\"_blank\">\"XML registration with re3data\"<\/a>. <i>Imperial College High Performance Computing Service Data Repository<\/i>. Imperial College London. 07 September 2016. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469%2Fhpc%2F1369\" target=\"_blank\">10.14469\/hpc\/1369<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1369\" target=\"_blank\">https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1369<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=XML+registration+with+re3data&rft.atitle=Imperial+College+High+Performance+Computing+Service+Data+Repository&rft.date=07+September+2016&rft.pub=Imperial+College+London&rft_id=info:doi\/10.14469%2Fhpc%2F1369&rft_id=https%3A%2F%2Fdata.hpc.imperial.ac.uk%2Fresolve%2F%3Fdoi%3D1369&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DRP-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DRP_21-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Rzepa, H.; McLean, A.; Harvey, M.J. (25 July 2016). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1088\" target=\"_blank\">\"Data Repository Project\"<\/a>. <i>Imperial College High Performance Computing Service Data Repository<\/i>. Imperial College London. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469%2Fhpc%2F1088\" target=\"_blank\">10.14469\/hpc\/1088<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1088\" target=\"_blank\">https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1088<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Data+Repository+Project&rft.atitle=Imperial+College+High+Performance+Computing+Service+Data+Repository&rft.aulast=Rzepa%2C+H.%3B+McLean%2C+A.%3B+Harvey%2C+M.J.&rft.au=Rzepa%2C+H.%3B+McLean%2C+A.%3B+Harvey%2C+M.J.&rft.date=25+July+2016&rft.pub=Imperial+College+London&rft_id=info:doi\/10.14469%2Fhpc%2F1088&rft_id=https%3A%2F%2Fdata.hpc.imperial.ac.uk%2Fresolve%2F%3Fdoi%3D1088&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RicherUser-22\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-RicherUser_22-0\" rel=\"external_link\">22.0<\/a><\/sup> <sup><a href=\"#cite_ref-RicherUser_22-1\" rel=\"external_link\">22.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\">Richer, J. <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/oauth.net\/articles\/authentication\/\" target=\"_blank\">\"User Authentication with OAuth 2.0\"<\/a>. <i>OAuth.net<\/i><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/oauth.net\/articles\/authentication\/\" target=\"_blank\">https:\/\/oauth.net\/articles\/authentication\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=User+Authentication+with+OAuth+2.0&rft.atitle=OAuth.net&rft.aulast=Richer%2C+J.&rft.au=Richer%2C+J.&rft_id=https%3A%2F%2Foauth.net%2Farticles%2Fauthentication%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DMWG-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DMWG_23-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/schema.datacite.org\/meta\/kernel-4.0\/\" target=\"_blank\">\"DataCite Metadata Schema 4.0\"<\/a>. DataCite Metadata Working Group. 19 September 2016<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/schema.datacite.org\/meta\/kernel-4.0\/\" target=\"_blank\">http:\/\/schema.datacite.org\/meta\/kernel-4.0\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 16 January 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=DataCite+Metadata+Schema+4.0&rft.atitle=&rft.date=19+September+2016&rft.pub=DataCite+Metadata+Working+Group&rft_id=http%3A%2F%2Fschema.datacite.org%2Fmeta%2Fkernel-4.0%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GHWebhooks-24\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GHWebhooks_24-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/developer.github.com\/webhooks\/\" target=\"_blank\">\"Webhooks\"<\/a>. <i>API<\/i>. GitHub, Inc<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/developer.github.com\/webhooks\/\" target=\"_blank\">https:\/\/developer.github.com\/webhooks\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Webhooks&rft.atitle=API&rft.pub=GitHub%2C+Inc&rft_id=https%3A%2F%2Fdeveloper.github.com%2Fwebhooks%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HPCRepo-25\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HPCRepo_25-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Harvey, M.J. <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/github.com\/ICHPC\/hpc-repo\" target=\"_blank\">\"ICHPC\/hpc-repo\"<\/a>. GitHub, Inc. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469%2Fhpc%2F1487\" target=\"_blank\">10.14469\/hpc\/1487<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/github.com\/ICHPC\/hpc-repo\" target=\"_blank\">https:\/\/github.com\/ICHPC\/hpc-repo<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=ICHPC%2Fhpc-repo&rft.atitle=&rft.aulast=Harvey%2C+M.J.&rft.au=Harvey%2C+M.J.&rft.pub=GitHub%2C+Inc&rft_id=info:doi\/10.14469%2Fhpc%2F1487&rft_id=https%3A%2F%2Fgithub.com%2FICHPC%2Fhpc-repo&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RzepaEpimeric16-27\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RzepaEpimeric16_27-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Rzepa, H.; White, A.; Braddock, D.C. et al. (26 July 2016). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1116\" target=\"_blank\">\"Epimeric Face-Selective Oxidations and Diastereodivergent Transannular Oxonium Ion Formation-Fragmentations: Computational Modelling and Total Syntheses of 12-Epoxyobtusallene IV, 12-Epoxyobtusallene II, Obtusallene X, Marilzabicycloallene C and Marilzabicycloallene D\"<\/a>. <i>Imperial College High Performance Computing Service Data Repository<\/i>. Imperial College London. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469%2Fhpc%2F1116\" target=\"_blank\">10.14469\/hpc\/1116<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1116\" target=\"_blank\">https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1116<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Epimeric+Face-Selective+Oxidations+and+Diastereodivergent+Transannular+Oxonium+Ion+Formation-Fragmentations%3A+Computational+Modelling+and+Total+Syntheses+of+12-Epoxyobtusallene+IV%2C+12-Epoxyobtusallene+II%2C+Obtusallene+X%2C+Marilzabicycloallene+C+and+Marilzabicycloallene+D&rft.atitle=Imperial+College+High+Performance+Computing+Service+Data+Repository&rft.aulast=Rzepa%2C+H.%3B+White%2C+A.%3B+Braddock%2C+D.C.+et+al.&rft.au=Rzepa%2C+H.%3B+White%2C+A.%3B+Braddock%2C+D.C.+et+al.&rft.date=26+July+2016&rft.pub=Imperial+College+London&rft_id=info:doi\/10.14469%2Fhpc%2F1116&rft_id=https%3A%2F%2Fdata.hpc.imperial.ac.uk%2Fresolve%2F%3Fdoi%3D1116&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ClarkeEpimeric16-28\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ClarkeEpimeric16_28-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Clarke, J.; Bonney, K.J.; Yaqoob, M. et al. (2016). 
\"Epimeric Face-Selective Oxidations and Diastereodivergent Transannular Oxonium Ion Formation Fragmentations: Computational Modeling and Total Syntheses of 12-Epoxyobtusallene IV, 12-Epoxyobtusallene II, Obtusallene X, Marilzabicycloallene C, and Marilzabicycloallene D\". <i>Journal of Organic Chemistry<\/i> <b>81<\/b> (20): 9539\u20139552. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1021%2Facs.joc.6b02008\" target=\"_blank\">10.1021\/acs.joc.6b02008<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Epimeric+Face-Selective+Oxidations+and+Diastereodivergent+Transannular+Oxonium+Ion+Formation+Fragmentations%3A+Computational+Modeling+and+Total+Syntheses+of+12-Epoxyobtusallene+IV%2C+12-Epoxyobtusallene+II%2C+Obtusallene+X%2C+Marilzabicycloallene+C%2C+and+Marilzabicycloallene+D&rft.jtitle=Journal+of+Organic+Chemistry&rft.aulast=Clarke%2C+J.%3B+Bonney%2C+K.J.%3B+Yaqoob%2C+M.+et+al.&rft.au=Clarke%2C+J.%3B+Bonney%2C+K.J.%3B+Yaqoob%2C+M.+et+al.&rft.date=2016&rft.volume=81&rft.issue=20&rft.pages=9539%E2%80%939552&rft_id=info:doi\/10.1021%2Facs.joc.6b02008&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Mnova-29\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-Mnova_29-0\" rel=\"external_link\">28.0<\/a><\/sup> <sup><a href=\"#cite_ref-Mnova_29-1\" rel=\"external_link\">28.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/mestrelab.com\/software\/mnova\/\" target=\"_blank\">\"Mnova\"<\/a>. Mestrelab Research, S.L<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/mestrelab.com\/software\/mnova\/\" target=\"_blank\">http:\/\/mestrelab.com\/software\/mnova\/<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 07 September 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Mnova&rft.atitle=&rft.pub=Mestrelab+Research%2C+S.L&rft_id=http%3A%2F%2Fmestrelab.com%2Fsoftware%2Fmnova%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HansonJSmol13-30\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HansonJSmol13_30-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Hanson, R.M.; Prilusky, J.; Renjian, Z. et al. (2013). \"JSmol and the Next-Generation Web-Based Representation of 3D Molecular Structure as Applied to <i>Proteopedia<\/i>\". <i>Israel Journal of Chemistry<\/i> <b>53<\/b> (3-4): 207\u2013216. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1002%2Fijch.201300024\" target=\"_blank\">10.1002\/ijch.201300024<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=JSmol+and+the+Next-Generation+Web-Based+Representation+of+3D+Molecular+Structure+as+Applied+to+%27%27Proteopedia%27%27&rft.jtitle=Israel+Journal+of+Chemistry&rft.aulast=Hanson%2C+R.M.%3B+Prilusky%2C+J.%3B+Renjian%2C+Z.+et+al.&rft.au=Hanson%2C+R.M.%3B+Prilusky%2C+J.%3B+Renjian%2C+Z.+et+al.&rft.date=2013&rft.volume=53&rft.issue=3-4&rft.pages=207%E2%80%93216&rft_id=info:doi\/10.1002%2Fijch.201300024&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-RzepaFAIR16-31\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-RzepaFAIR16_31-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\">Rzepa, H.; White, A.; Braddock, D.C. et al. (10 August 2016). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1248\" target=\"_blank\">\"FAIR Data table. Computed relative reaction free energies (kcal\/mol-1) of Obtusallene derived oxonium and chloronium cations\"<\/a>. <i>Imperial College High Performance Computing Service Data Repository<\/i>. Imperial College London. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.14469%2Fhpc%2F1248\" target=\"_blank\">10.14469\/hpc\/1248<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1248\" target=\"_blank\">https:\/\/data.hpc.imperial.ac.uk\/resolve\/?doi=1248<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 07 November 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=FAIR+Data+table.+Computed+relative+reaction+free+energies+%28kcal%2Fmol-1%29+of+Obtusallene+derived+oxonium+and+chloronium+cations&rft.atitle=Imperial+College+High+Performance+Computing+Service+Data+Repository&rft.aulast=Rzepa%2C+H.%3B+White%2C+A.%3B+Braddock%2C+D.C.+et+al.&rft.au=Rzepa%2C+H.%3B+White%2C+A.%3B+Braddock%2C+D.C.+et+al.&rft.date=10+August+2016&rft.pub=Imperial+College+London&rft_id=info:doi\/10.14469%2Fhpc%2F1248&rft_id=https%3A%2F%2Fdata.hpc.imperial.ac.uk%2Fresolve%2F%3Fdoi%3D1248&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DS-32\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DS_32-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"https:\/\/datasearch.elsevier.com\/\" target=\"_blank\">\"DataSearch\"<\/a>. Elsevier B.V<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/datasearch.elsevier.com\/\" target=\"_blank\">https:\/\/datasearch.elsevier.com\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 07 September 2017<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=DataSearch&rft.atitle=&rft.pub=Elsevier+B.V&rft_id=https%3A%2F%2Fdatasearch.elsevier.com%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:A_metadata-driven_approach_to_data_repository_design\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. In one case, the original citation was incomplete (#6) and was corrected here. What was originally reference 26, a link to a downloadable file, was turned into a footnote for clarity.\n<\/p>\n<\/div><div 
class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:A_metadata-driven_approach_to_data_repository_design\">https:\/\/www.limswiki.org\/index.php\/Journal:A_metadata-driven_approach_to_data_repository_design<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","0c7c45ef71cf479715ea32203f1e26d3_images":["https:\/\/www.limswiki.org\/images\/5\/5f\/Fig1_Harvey_JoCheminformatics2017_9.gif","https:\/\/www.limswiki.org\/images\/5\/54\/Fig1.5_Harvey_JoCheminformatics2017_9.gif","https:\/\/www.limswiki.org\/images\/b\/b0\/Fig2_Harvey_JoCheminformatics2017_9.gif","https:\/\/www.limswiki.org\/images\/4\/4c\/Fig3_Harvey_JoCheminformatics2017_9.gif","https:\/\/www.limswiki.org\/images\/8\/8c\/Fig4_Harvey_JoCheminformatics2017_9.gif"],"0c7c45ef71cf479715ea32203f1e26d3_timestamp":1544814656,"0efba51aeff20a2591887ad29fac5866_type":"article","0efba51aeff20a2591887ad29fac5866_title":"Ten simple rules to enable multi-site collaborations through data sharing (Boland et al. 
2017)","0efba51aeff20a2591887ad29fac5866_url":"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing","0efba51aeff20a2591887ad29fac5866_plaintext":"\n\n\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\n\t\t\t\tJournal:Ten simple rules to enable multi-site collaborations through data sharing\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\tFrom LIMSWiki\n\n\t\t\t\t\t\n\t\t\t\t\tFull article title\n \nTen simple rules to enable multi-site collaborations through data sharingJournal\n \nPLOS Computational BiologyAuthor(s)\n \nBoland, Mary Regina; Karczewski, Konrad J.; Tatonetti, Nicholas P.Author affiliation(s)\n \nColumbia University (NY), Broad Institute of MIT and Harvard, Massachusetts General HospitalPrimary contact\n \nEmail: mary dot boland @ columbia dot eduYear published\n \n2017Volume and issue\n \n13(1)Page(s)\n \ne1005278DOI\n \n10.1371\/journal.pcbi.1005278ISSN\n \n1553-7358Distribution license\n \nCreative Commons Attribution 4.0 InternationalWebsite\n \nhttp:\/\/journals.plos.org\/ploscompbiol\/article?id=10.1371\/journal.pcbi.1005278Download\n \nhttp:\/\/journals.plos.org\/ploscompbiol\/article\/file?id=10.1371\/journal.pcbi.1005278&type=printable (PDF)\n\nContents\n\n1 Introduction \n2 Definitions \n3 Rule 1: Make software open-source \n4 Rule 2: Provide open-source data \n\n4.1 Deposit source data in appropriate repositories \n4.2 Consider middle-ground data sharing approaches for sensitive data \n\n\n5 Rule 3: Use multiple platforms to share research products \n6 Rule 4: Secure necessary permissions\/data use agreements a priori \n7 Rule 5: Know the privacy rules for your data \n8 Rule 6: Facilitate reproducibility \n9 Rule 7: Think global \n10 Rule 8: Publicize your work \n11 Rule 9: Stay realistic, but aim high \n12 Rule 10: Be engaged \n13 Concluding remarks \n14 Funding \n15 Competing interests \n16 References \n17 Notes \n\n\n\nIntroduction 
\nOpen access, open data, and software are critical for advancing science and enabling collaboration across multiple institutions and throughout the world. Despite near-universal recognition of their importance, major barriers still exist to sharing raw data, software, and research products throughout the scientific community. Many of these barriers vary by specialty[1], making it harder for interdisciplinary and\/or translational researchers to engage in collaborative research. Multi-site collaborations are vital for increasing both the impact and the generalizability of research results. However, they often present unique data sharing challenges. In this set of Ten Simple Rules, we discuss how to enable multi-site collaborations through enhanced data sharing.\nCollaboration is an essential component of research[2] that takes many forms, including internal collaborations (across departments within a single institution) and external collaborations (across institutions). However, multi-site collaborations with more than two institutions encounter more complex challenges because of institution-specific restrictions and guidelines.[3] Vicens and Bourne focus on collaborators working together on a shared research grant.[4] They do not discuss the specific complexities of multi-site collaborations or the vital need for enhanced data sharing in multi-site and large-scale collaborations, in which participants may or may not share the same funding source and\/or research grant.\nWhile challenging, multi-site collaborations are equally rewarding and result in increased research productivity.[5][6] One highly successful multi-site and translational collaboration is the Electronic Medical Records and Genomics (eMERGE) network (URL: https:\/\/emerge.mc.vanderbilt.edu\/), initiated in 2007.[7] The eMERGE network links biorepository data with clinical information from electronic health records (EHRs). 
Its members were able to find novel associations and replicate many known associations between genetic variants and clinical phenotypes, findings that would have been more difficult to obtain without the collaboration.[8] eMERGE members also collaborated with other consortia and networks, including the Alzheimer\u2019s Disease Genetics Consortium[9] and the NINDS Stroke Genetics Network[10], to name a few. Other successful collaborations include OHDSI: Observational Health Data Sciences and Informatics (http:\/\/www.ohdsi.org\/), which builds on the methodology of the Observational Medical Outcomes Partnership (OMOP)[11], and CIRCLE: Clinical Informatics Research Collaborative (http:\/\/circleinformatics.org\/). In genetics, there are many consortia, including ExAC: The Exome Aggregation Consortium (http:\/\/exac.broadinstitute.org\/), the 1000 Genomes Project Consortium (http:\/\/www.1000genomes.org\/), the Australian BioGRID (https:\/\/www.biogrid.org.au\/), The Cancer Genome Atlas (TCGA) (http:\/\/cancergenome.nih.gov\/), the Genotype-Tissue Expression Portal (GTEx: http:\/\/www.gtexportal.org\/home\/), and the Encyclopedia of DNA Elements at UCSC (ENCODE: https:\/\/genome.ucsc.edu\/ENCODE\/), among others.\nBased on our experiences as both users and participants in collaborations, we present 10 simple rules on how to enable multi-site collaborations within the scientific community through enhanced data sharing. The rules focus on understanding privacy constraints, utilizing proper platforms to facilitate data sharing, thinking in global terms, and encouraging researcher engagement through incentives. We present these 10 rules in the form of a pictograph of modern life (Fig. 1), and we provide a table of example sources and sites that can be referred to for each of the ten rules (Table 1). Please note that this table is not meant to be exhaustive; it only provides some sample resources of use to the research community.\n\n Figure 1. 
Modern life context for the ten simple rules: This figure provides a framework for understanding how the \u201cTen Simple Rules to Enable Multi-site Collaborations through Data Sharing\u201d can be translated into easily understood modern life concepts. Rule 1 is Open-Source Software. The openness is signified by a window to a room filled with algorithms that are represented by gears. Rule 2 involves making the source data available whenever possible. Source data can be very useful for researchers. However, data are often housed in institutions and are not publicly accessible. These files are often stored externally; therefore, we depict this as a shed or storehouse of data, which, if possible, should be provided to research collaborators. Rule 3 is to \u201cuse multiple platforms to share research products.\u201d This increases the chances that other researchers will find and be able to utilize your research product\u2014this is represented by multiple locations (i.e., shed and house). Rule 4 involves the need to secure all necessary permissions a priori. Many datasets have data use agreements that restrict usage. These restrictions can sometimes prevent researchers from performing certain types of analyses or publishing in certain journals (e.g., journals that require all data to be openly accessible); therefore, we represent this rule as a key that can lock or unlock the door of your research. Rule 5 discusses the privacy issues that surround source data. Researchers need to understand what they can and cannot do (i.e., the privacy rules) with their data. Privacy often requires allowing certain users to have access to sections of data while restricting access to other sections of data. Researchers need to understand what can and cannot be revealed about their data (i.e., when to open and close the curtains). Rule 6 is to facilitate reproducibility whenever possible. 
Because communication is central to reproducibility, we depict this rule as two researchers sharing a giant scroll; data documentation is required and is often substantial. Rule 7 is to \u201cthink global.\u201d We conceptualize this as a cloud. This cloud allows the research property (i.e., the house and shed) to be accessed across large distances. Rule 8 is to publicize your work. Think of it as \u201cshouting from the rooftops.\u201d Publicizing is critical for enabling other researchers to access your research product. Rule 9 is to \u201cstay realistic.\u201d It is important for researchers to \u201cstay grounded\u201d and resist the urge to overstate the claims made by their research. Rule 10 is to be engaged, and this is depicted as a person waving an \u201cI heart research\u201d sign. It is vitally important to stay engaged and enthusiastic about one\u2019s research, because this enthusiasm draws others to care about it too.\n\n Table 1. Example sources and sites for each of the ten simple rules\n\n\n\nDefinitions \nIn this paper, we use the term \"research product\" to include all results from research. This includes algorithms, developed software tools, databases, raw source data, cleaned data, and various metadata generated as a result of the research activity. We differentiate this from \"data,\" which comprises the primary \"facts and statistics collected together for analysis\" for that particular collaboration. Therefore, data could include genetic data or clinical data. 
By these definitions, developed software tools are not \"data\" but \"research products.\" Novel genetic sequences collected for analysis would be considered \"raw source data,\" which is a type of \"research product.\"\n\nRule 1: Make software open-source \nThe cornerstone of facilitating multi-site collaborations is to enhance data sharing and make software open-source.[12] By making their source code open, researchers allow others both to reproduce their work and to build upon it in novel ways. To engage in multi-site collaborations, collaborators need access to code in a shared repository (although this repository could be private and not open to the general public). When the study is complete and the paper is under review and\/or published, a stable copy of the code should be made available to the general public. Internal sharing allows the code to be developed, while public sharing of a stable version allows the code to be refined and built upon by others.\nMany researchers still limit access to their work despite the known advantages of making software open-source upon publication (e.g., higher impact publications[5]). For example, some allow users to interact with their algorithm by inputting data and receiving results on a web platform, while the back-end algorithm remains inaccessible. Masum et al. advocate the reuse of existing code in their Ten Simple Rules for cultivating open science.[13] However, this is often easier said than done. As long as the back-end algorithms remain hidden, open science will not be possible. Therefore, it is essential for researchers interested in participating in multi-site collaborations to make their software code and algorithms open. 
Because making software truly \"open\" can be complex, Prli\u0107 and Proctor provide Ten Simple Rules to assist researchers in making their software open-source.[12] Truly open-source software is an essential component in collaborations.[13] Openness also has advantages for the researchers themselves. With more eyes on the source code, others within the community can refine it, so that more errors are identified and corrected. There are several methods for sharing software code. If you use the R platform, then libraries can be shared with the entire open-source community via CRAN (https:\/\/cran.r-project.org\/) and Bioconductor, which is specifically for biologically related algorithms (https:\/\/www.bioconductor.org\/). Code can also be shared on GitHub, whose issue trackers help with error detection.\n\nRule 2: Provide open-source data \nDeposit source data in appropriate repositories \nWhenever possible, it is important to make source data available. Openness benefits your collaborators by allowing them to perform additional analyses easily. Source data could include not only processed or cleaned data used in algorithms but also raw data files. Because these files can be very large, they are often stored at an external site or in a data warehouse. The National Center for Biotechnology Information (NCBI) maintains the Sequence Read Archive (SRA) (https:\/\/www.ncbi.nlm.nih.gov\/sra) and the Gene Expression Omnibus (GEO) (https:\/\/www.ncbi.nlm.nih.gov\/geo\/); both are good places to deposit source data, where appropriate.\nIn addition to raw data files, it is also helpful to provide intermediate data files at various stages of processing. 
If comparing your results to those in the literature, it can also be useful to provide a meta-analysis of publications (along with PubMed IDs) indicating which ones support and which refute the results you obtained.\nData sharing is vitally important for multi-site collaborations because it allows researchers to compare results across vastly different study populations, which increases the generalizability of the findings.[14] While a multi-site research project is still ongoing, data can be shared in a private shared space until all necessary data quality checks have been conducted and the findings have been published. After publication, data can be deposited in GEO, SRA, ClinVar (https:\/\/www.ncbi.nlm.nih.gov\/clinvar\/), and any other domain-specific sites that are appropriate for source data deposition.\n\nConsider middle-ground data sharing approaches for sensitive data \nRaw source data is not always fully shareable with the public. This can be because of data use restrictions (see rule 4) or privacy concerns (see rule 5). Alternative mechanisms exist for sharing portions of data with the research community. For example, the database of Genotypes and Phenotypes, or dbGaP (https:\/\/www.ncbi.nlm.nih.gov\/gap), provides data holders with two levels of access: open and controlled. The open level allows for broad release of nonsensitive data online, whereas the controlled level allows sensitive datasets to be shared with other investigators, provided certain restrictions are met. This lets researchers share portions of their data that would not otherwise be shareable.\nIn addition to the restricted data sharing option provided by dbGaP, others have explored middle-ground approaches for sharing sensitive raw data or metadata. 
Several of these mid-level approaches use federated access systems that allow researchers to query databases containing sensitive data while preventing direct access to the data itself. An example within the United States is the Shared Health Research Information Network (SHRINE), which provides a federated system that is Health Insurance Portability and Accountability Act (HIPAA) compliant.[15] International groups have also seen success in this area. BioGrid Australia (https:\/\/www.biogrid.org.au\/) allows researchers to access hundreds of thousands of health records through a linked data platform where individual data holders maintain control of their data.[16] Researchers can then be provided with authorized access to certain elements within the data while restricting access to private sections of the medical data. These mid-level approaches facilitate collaboration both within the institution (i.e., across departments) and across institutions by allowing researchers to access sensitive data indirectly. They can even match patients to similar patients (for association analyses) while maintaining stringent privacy constraints.[17] Others provide summary statistics computed over large cohorts (e.g., ExAC browser\/database), which maintains privacy while providing others with important information about the populations that can be used in subsequent analyses and comparisons.\n\nRule 3: Use multiple platforms to share research products \nTo collaborate with researchers from different backgrounds, it is often necessary to use multiple platforms when sharing data (as different disciplines often have different policies). Using multiple platforms allows individuals from diverse backgrounds to have access to your research product. 
General phrases like \"open data\" and \"open science\" are commonly used in the research community but provide little direction.[13] Research products take many different forms, including 1) raw source data regardless of collection type (e.g., health data, genomic data, survey data, and epidemiological data), 2) software code (mentioned in rule 1), and 3) metadata elements and results of computations used to generate figures published in scientific research. Some data types cannot be fully shared (e.g., EHR data; see rule 5), but most algorithms and summary results\/statistics are shareable.\nEach of these types of open data calls for a different data sharing platform. Figshare (https:\/\/figshare.com\/) allows users to share data associated with published figures. GitHub (https:\/\/github.com\/) allows users to share code that is in development or published. For well-developed code, open-source packages can be created (for example, an R library, which can be deposited in CRAN or Bioconductor). R libraries can be shared immediately on GitHub without any code checking; this is advisable for code that is still in development. When the code is finalized, it can be submitted to Bioconductor as an R library, where approved libraries are vetted to ensure the code works well. Writing vignettes also helps new users get started with an R package. When collaborating across multiple sites, it is especially important to provide vignettes and sample source data to help users learn how to use the code, even if R is not their language of choice. Data formats, differences among formats, and programming languages are important to consider when sharing data across multiple platforms. Different platforms often have different required formats. 
While it may seem tedious to translate code, source data, and documentation across multiple formats and data schemas, doing so can be very helpful and will increase the number of users who find your data and results interesting.\nMany options exist for facilitating communication among members of a collaborative effort, including Google forums and wiki webpages. Some groups have built dedicated websites for the sole purpose of allowing users to browse and download the data directly; one such website is the ExAC Browser (http:\/\/exac.broadinstitute.org\/), which integrates data obtained from 17 different consortia (http:\/\/exac.broadinstitute.org\/about).[18]\n\nRule 4: Secure necessary permissions\/data use agreements a priori \nSome datasets have provisos that affect publication, and these need to be addressed a priori. For example, the ability of researchers to publish an algorithm that uses a government dataset can depend on the department that generated the data. Certain National Aeronautics and Space Administration (NASA) datasets, for instance, stipulate that users must add certain NASA employees to subsequent publications. This is an important stipulation. Some data use agreements may also prohibit depositing the data into an \"open\" platform (http:\/\/above.nasa.gov\/Documents\/NGA_Data_Access_Agreement_new.pdf). These stipulations can hinder researchers attempting to produce transparent science.\nOther datasets have data use agreements as an added layer of patient protection. For example, the Surveillance, Epidemiology, and End Results (SEER) dataset linked with Medicare (i.e., the SEER-Medicare dataset) requires that users submit the intended publication to their offices for pre-submission approval. This can seem burdensome to researchers; however, it is a condition of the data use agreement and, therefore, must be complied with. 
Researchers need to be aware of all provisos when including such data in their studies. Before publishing \u2014 or providing data on any type of platform, whether open, restricted, or closed \u2014 it is important to secure all necessary permissions and data use agreements.\n\nRule 5: Know the privacy rules for your data \nData come with many caveats. For this reason, it is important to understand what you can and cannot do (i.e., the privacy rules) with your data. Maintaining data privacy is a separate matter from complying with data use agreements (DUAs; see rule 4). For example, data that are not sensitive may still have restrictive DUAs for other reasons (e.g., data from a collaborator in industry). Also, privacy rules often involve your own source data, whereas DUAs become necessary when using data from collaborators or a government source.\nCertain datasets, e.g., genomic and EHR data, may be impossible to fully publish on an open platform due to HIPAA privacy rules and other privacy concerns related to patient re-identifiability (http:\/\/www.hhs.gov\/hipaa\/for-professionals\/privacy\/). Therefore, it is important to know the privacy stipulations of all data used in your collaborations and how they affect the ability to share results among members of the team (especially when team members are at different institutions). Methods that anonymize patient information while allowing patient-level data sharing may be the way of the future.[19] However, institution-specific policies and\/or country-specific laws can limit or prevent the use of such methods. This issue should be discussed with all collaborators at the outset of any collaboration. Rule 2 discusses some methods for providing certain forms of sensitive data in a shareable federated space.\n\nRule 6: Facilitate reproducibility \nAnother aspect of both data sharing and enabling multi-site collaborations is reproducibility. Sandve et al. 
provide Ten Simple Rules for facilitating research reproducibility in general.[20] Keeping track of research results and how data were generated is vital for reproducibility.[20] Such site-level record keeping becomes even more important when engaging in multi-site collaborations: if one aspect of a methodology is not conducted in the same way at one site, the overall results can be affected in drastic ways. In other words, reproducibility is a core requirement for successful collaborations.\nIn genetics and computational biology, standardizing results across different types of gene sequencing platforms is a major challenge.[21] Researchers who use a mixture of clinical and genetic data (for Phenome-Wide Association Studies, PheWAS[22]) often depend on local EHR terminology systems for identifying patient populations. Therefore, standard phenotype definitions are required and must be harmonized across multiple sites to ensure that the definitions are accurate at each site.[23] Several multi-site collaborations, including the eMERGE network, have developed platforms that provide links to all necessary documentation, code, and data schemas to help facilitate this process.[24] This step is integral to data sharing and enabling multi-site collaborations.\n\nRule 7: Think global \nThe importance of thinking globally cannot be overstated. Health care, genetics, climate, and all aspects of science affect the world as a whole. Therefore, it is important to think globally when performing scientific research. Most programming languages are designed to be agnostic to the local language of the country. However, understanding and using these languages requires adequate documentation and user manuals in the local languages of the programmers\/implementers. In practice, open-source languages often provide user manuals in only a few languages. 
For example, R is a popular open-source language, yet it has official documentation translations in only four languages: English, Russian, German, and Chinese (https:\/\/www.r-project.org\/other-docs.html). Problems can surface when collaborators in other regions have difficulty running R. This affects data sharing on a global scale and should be considered when collaborating internationally.\nTranslational mechanisms may also be necessary to understand and harmonize country-specific terminology. This is especially important because definitions of obesity and many psychiatric conditions vary widely across the globe.[25] Even seemingly simple biological features (e.g., tall versus short) can be difficult to translate in global terms. For example, a Norwegian of average height may appear tall in a different country. Translating biological features to common absolute metrics (e.g., height) helps to alleviate ambiguities that can arise from categorical variables. Certain diseases, especially psychiatric conditions, are extremely important to study at the multi-site level to increase the generalizability of the results.[14] However, psychiatric conditions are more difficult to translate without a thorough knowledge of how the condition is defined in the underlying country or region.[25] Solutions often involve using concrete measures, e.g., brain imaging analysis, rather than subjective measures such as the presence or absence of depression.[14]\nThere are many layers to thinking on a global scale. There are mechanical differences (i.e., the software language and documentation) as well as conceptual differences (i.e., country- or region-specific medical definitions). Organizations such as the World Health Organization work tirelessly to integrate different conceptual interpretations of diseases into standard guidelines. 
Using these guidelines and not a country-specific guideline helps your research reach the broader scientific community.\nSeveral groups have successfully integrated data across multiple countries and provided their data in an open form. The Max Planck Institute for Demographic Research (MPIDR) in Germany collaborated with two separate groups to produce two databases containing international data. Both datasets contain integrated results from over 30 countries. Additionally, all finished data (after cleaning) is made available to users in an open format via two specially designed databases: the Human Fertility Database (http:\/\/www.humanfertility.org\/cgi-bin\/main.php)[26] and the Human Mortality Database (http:\/\/www.mortality.org\/).[27] Only cleaned data are returned to users in a standardized format, allowing users to easily compare countries with one another. The MPIDR collaborated with the Vienna Institute of Demography (Austria) in creating the Human Fertility Database and the University of California, Berkeley for the Human Mortality Database. They provide a good example of a group that successfully harmonized definitions across countries by overcoming international barriers, and they provided data back to researchers in an easily usable and standardized format. The group provides detailed descriptions of how they harmonized various timescales across countries in a methods document (http:\/\/www.humanfertility.org\/Docs\/methods.pdf) that could easily be submitted as a research report (see Rule 6).\n\nRule 8: Publicize your work \nPublishing all aspects of your work in the appropriate venues is vital for maintaining a multi-site collaboration. This enables each aspect of your research to be assessed by appropriate peer reviewers. Publishing different aspects of your work in separate papers in separate journals allows your contributions to be seen by those most able to learn from your work. 
Remember, it is important to make your research work available to those who can benefit from your results. Depending on your findings, this can include methodologists, clinicians, epidemiologists, geneticists, and others.\nSeveral journals focused on particular aspects of research have recently been launched to facilitate open science. For instance, several journals, such as PLOS ONE, Scientific Reports, and Cell Reports, do not require novelty. These journals are good choices for research results that may be part of a larger research project or collaboration but are not inherently novel. Other journals, such as Scientific Data and Database, are good choices for publishing a resource containing your collected research source data. It is often advisable to publish in data-focused journals simultaneously with an algorithm- or results-focused paper that highlights the novel aspects of your research. In some cases, data can be published afterwards if they are part of a large collaboration and the database or user interface is still in production when the main contribution is published.\nPublishing in multiple venues is highly important for those engaged in multi-site collaborations, because these projects often involve a tremendous investment of time and resources from many different organizations. Therefore, it is vital to highlight each and every research contribution that the collaboration has generated to facilitate further engagement from the community. If you are able to provide all raw source data on an open platform, new journals designed specifically to facilitate open science, such as F1000 (https:\/\/f1000research.com\/), may be worth considering. F1000 is also a good venue for intermediate results such as posters, which collaborators may have presented at various conferences while working towards the final paper. 
After publication, some collaborative groups effectively use blogging (both macro and micro) to communicate with other researchers and the general public. However, it is also important not to overstate the claims in any paper submission\/publication, or in media coverage of that publication, but to stay focused on the individual contribution of that particular work.\n\nRule 9: Stay realistic, but aim high \nWhen performing quality research and collaborating with others, it is important not to overstate the claims of your research \u2014 either in publication or online. It is vitally important to resist this urge and to remain both humble and grounded. This is critical in collaborations because if a researcher overstates a claim in a paper or, worse, publicly shares data that he or she is not legally permitted to share (e.g., under the stipulations of a DUA), then the paper may be retracted. This could result in irreparable damage to the collaborative group.\nThis rule also links back to rule 2: making the source data available. Doing so allows others in the research community to check your work interactively, which can help prevent overstating research claims.[28] One site, retractionwatch.com, posts retracted journal articles on a public forum. The site includes not only instances of plagiarism and data fabrication but also papers retracted because of human error in the experiment (e.g., a protocol was not followed exactly as specified in the paper) or in the analysis (e.g., the wrong type of statistical test was performed, so the conclusions were not substantiated by the data).\nSo, stay realistic, but do not be afraid to challenge the status quo. Some of the most respected research today challenged the understanding of the leading scientists of its time; examples include the seminal work on Pangaea and even the proposal that DNA is a double helix. 
These concepts were earth-shattering at the time and could have been completely wrong, but the researchers behind them were not afraid to make their theories, data, and results public. Work like this changes science. So, remain humble and do not intentionally overstate the claims of your research, but at the same time, do not be afraid to challenge the prevailing mindset. You may be completely off, or you may be a groundbreaking innovator.\n\nRule 10: Be engaged \nEngage with those using your research, your data, and your code. Communicate with them through online platforms such as GitHub, figshare, and so forth. Respond promptly when users have questions and concerns. Try to follow the motto \"release early, release often.\" Engage with researchers in non-traditional ways. For example, several collaborative efforts have created their own merchandise, such as t-shirts, to engage the community. One such effort is the open-source statistical modeling language STAN (http:\/\/mc-stan.org\/), which has created its own line of STAN \"swag\" (http:\/\/mc-stan.org\/shop\/) to encourage user engagement. Communicate often with the research community to convince them that your research is worth caring about. The bottom line in collaboration is to care deeply about your research. If you care deeply about the problem and make that known, it becomes possible to convince others that your research is important.\n\nConcluding remarks \nCollaborations, especially large, multi-site collaborations, present many pitfalls that must be overcome. In this paper, we present 10 simple rules that will help researchers share their data and methods to facilitate successful and meaningful multi-site collaborations. 
We describe these rules and highlight several successful multi-site collaborations.\n\nFunding \nMRB was supported by NLM T15 LM00707 from Jul 2014\u2013Jun 2016 and by the NCATS, NIH, through TL1 TR000082, formerly the NCRR, TL1 RR024158 from Jul 2016\u2013Jun 2017. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.\n\nCompeting interests \nThe authors have declared that no competing interests exist.\n\nReferences \n\n\n1. Reichman, O.J.; Jones, M.B.; Schildhauer, M.P. (2011). \"Challenges and opportunities of open data in ecology\". Science 331 (6018): 703\u20135. doi:10.1126\/science.1197962. PMID 21311007.\n\n2. Bozeman, B.; Fay, D.; Slade, C.P. (2013). \"Research collaboration in universities and academic entrepreneurship: the-state-of-the-art\". The Journal of Technology Transfer 38 (1): 1\u201367. doi:10.1007\/s10961-012-9281-8.\n\n3. Brown, P.; Morello-Frosch, R.; Brody, J.G. (2008). \"IRB Challenges in Multi-Partner Community-Based Participatory Research\". Proceedings of The American Sociological Association Annual Meeting 2008: 1-31. https:\/\/www.brown.edu\/research\/research-ethics\/irb-challenges-multi-partner-community-based-participatory-research .\n\n4. Vicens, Q.; Bourne, P.E. (2007). \"Ten simple rules for a successful collaboration\". PLOS Computational Biology 3 (3): e44. doi:10.1371\/journal.pcbi.0030044. PMC1847992. PMID 17397252. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1847992 .\n\n5. Jones, B.F.; Wuchty, S.; Uzzi, B. (2008). \"Multi-university research teams: shifting impact, geography, and stratification in science\". Science 322 (5905): 1259-62. doi:10.1126\/science.1158357. PMID 18845711.\n\n6. B\u00f6rner, K.; Contractor, N.; Falk-Krzesinski, H.J. et al. (2010). \"A multi-level systems perspective for the science of team science\". 
Science Translational Medicine 2 (49): 49cm24. doi:10.1126\/scitranslmed.3001399. PMC3527819. PMID 20844283. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3527819 .\n\n7. Gottesman, O.; Kuivaniemi, H.; Tromp, G. et al. (2013). \"The Electronic Medical Records and Genomics (eMERGE) Network: Past, present, and future\". Genetics in Medicine 15 (10): 761-71. doi:10.1038\/gim.2013.72. PMC3795928. PMID 23743551. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3795928 .\n\n8. Feng, Q.; Wei, W.Q.; Chung, C.P. et al. (2016). \"The effect of genetic variation in PCSK9 on the LDL-cholesterol response to statin therapy\". The Pharmacogenomics Journal. doi:10.1038\/tpj.2016.3. PMC4995153. PMID 26902539. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4995153 .\n\n9. Karch, C.M.; Ezerskiy, L.A.; Bertelsen, S. et al. (2016). \"Alzheimer's Disease Risk Polymorphisms Regulate Gene Expression in the ZCWPW1 and the CELF1 Loci\". PLOS One 11 (2): e0148717. doi:10.1371\/journal.pone.0148717. PMC4769299. PMID 26919393. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4769299 .\n\n10. Malik, R.; Traylor, M.; Pulit, S.L. et al. (2016). \"Low-frequency and common genetic variation in ischemic stroke: The METASTROKE collaboration\". Neurology 86 (13): 1217-26. doi:10.1212\/WNL.0000000000002528. PMC4818561. PMID 26935894. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4818561 .\n\n11. Stang, P.E.; Ryan, P.B.; Racoosin, J.A. et al. (2010). \"Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership\". Annals of Internal Medicine 153 (9): 600\u20136. doi:10.7326\/0003-4819-153-9-201011020-00010. PMID 21041580.\n\n12. Prli\u0107, A.; Procter, J.B. (2012). 
\"Ten simple rules for the open development of scientific software\". PLOS Computational Biology 8 (12): e1002802. doi:10.1371\/journal.pcbi.1002802. PMC3516539. PMID 23236269. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539 .\n\n13. Masum, H.; Rao, A.; Good, B.M. et al. (2013). \"Ten simple rules for cultivating open science and collaborative R&D\". PLOS Computational Biology 9 (9): e1003244. doi:10.1371\/journal.pcbi.1003244. PMC3784487. PMID 24086123. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3784487 .\n\n14. Pearlson, G. (2009). \"Multisite collaborations and large databases in psychiatric neuroimaging: Advantages, problems, and challenges\". Schizophrenia Bulletin 35 (1): 1\u20132. doi:10.1093\/schbul\/sbn166. PMC2643967. PMID 19023121. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2643967 .\n\n15. Weber, G.M.; Murphy, S.N.; McMurry, A.J. et al. (2009). \"The Shared Health Research Information Network (SHRINE): A prototype federated query tool for clinical data repositories\". JAMIA 16 (5): 624-30. doi:10.1197\/jamia.M3191. PMC2744712. PMID 19567788. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2744712 .\n\n16. Merriel, R.B.; Gibbs, P.; O'Brien, T.J. et al. (2011). \"BioGrid Australia facilitates collaborative medical and bioinformatics research across hospitals and medical research institutes by linking data from diverse disease and data types\". Human Mutation 32 (5): 517-25. doi:10.1002\/humu.21437. PMID 21309032.\n\n17. Boyle, D.I.; Rafael, N. (2011). \"BioGrid Australia and GRHANITE: Privacy-protecting subject matching\". Studies in Health Technology and Informatics 168: 24-34. PMID 21893908.\n\n18. Lek, M.; Karczewski, K.J.; Minikel, E.V. et al. (2016). \"Analysis of protein-coding genetic variation in 60,706 humans\". 
Nature 536 (7616): 285\u201391. doi:10.1038\/nature19057. PMC5018207. PMID 27535533. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5018207 .\n\n19. El Emam, K.; Rodgers, S.; Malin, B. (2015). \"Anonymising and sharing individual patient data\". BMJ 350: h1139. doi:10.1136\/bmj.h1139. PMC4707567. PMID 25794882. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4707567 .\n\n20. Sandve, G.K.; Nekrutenko, A.; Taylor, J. et al. (2013). \"Ten simple rules for reproducible computational research\". PLOS Computational Biology 9 (10): e1003285. doi:10.1371\/journal.pcbi.1003285. PMC3812051. PMID 24204232. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3812051 .\n\n21. Bammler, T.; Beyer, R.P.; Bhattacharya, S. et al. (2005). \"Standardizing global gene expression analysis between laboratories and across platforms\". Nature Methods 2 (5): 351-6. doi:10.1038\/nmeth754. PMID 15846362.\n\n22. Denny, J.C.; Ritchie, M.D.; Basford, M.A. et al. (2010). \"PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations\". Bioinformatics 26 (9): 1205-10. doi:10.1093\/bioinformatics\/btq126. PMC2859132. PMID 20335276. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2859132 .\n\n23. Newton, K.M.; Peissig, P.L.; Kho, A.N. et al. (2013). \"Validation of electronic medical record-based phenotyping algorithms: Results and lessons learned from the eMERGE network\". JAMIA 20 (e1): e147-54. doi:10.1136\/amiajnl-2012-000896. PMC3715338. PMID 23531748. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3715338 .\n\n24. Pathak, J.; Wang, J.; Kashyap, S. et al. (2011). \"Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: The eMERGE Network experience\". JAMIA 18 (4): 376-86. 
doi:10.1136\/amiajnl-2010-000061. PMC3128396. PMID 21597104. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3128396 .\n\n25. Diener, E.; Oishi, S.; Lucas, R.E. (2003). \"Personality, culture, and subjective well-being: Emotional and cognitive evaluations of life\". Annual Review of Psychology 54: 403-25. doi:10.1146\/annurev.psych.54.101601.145056. PMID 12172000.\n\n26. \"The Human Fertility Database\". Max Planck Institute for Demographic Research and Vienna Institute of Demography. http:\/\/www.humanfertility.org\/cgi-bin\/main.php . Retrieved 06 October 2016.\n\n27. \"The Human Mortality Database\". University of California, Berkeley and Max Planck Institute for Demographic Research. http:\/\/www.mortality.org\/ . Retrieved 06 October 2016.\n\n28. Senn, S.J. (2009). \"Overstating the evidence: Double counting in meta-analysis and related problems\". BMC Medical Research Methodology 9: 10. doi:10.1186\/1471-2288-9-10. PMC2653069. PMID 19216779. http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2653069 .\n\n\nNotes \nThis presentation is faithful to the original, with only a few minor changes to presentation. 
In some cases, important information was missing from the references, and that information was added.\n\nSource: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\">https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing<\/a>\nCategories: LIMSwiki journal articles (added in 2017), LIMSwiki journal articles (all), LIMSwiki journal articles on informatics, LIMSwiki journal articles on research\n
This page was last modified on 14 March 2017, at 17:06.\nContent is available under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise noted.\n\n","0efba51aeff20a2591887ad29fac5866_html":"<body class=\"mediawiki ltr sitedir-ltr ns-206 ns-subject page-Journal_Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing skin-monobook action-view\">\n<div id=\"rdp-ebb-globalWrapper\">\n\t\t<div id=\"rdp-ebb-column-content\">\n\t\t\t<div id=\"rdp-ebb-content\" class=\"mw-body\" role=\"main\">\n\t\t\t\t<a id=\"rdp-ebb-top\"><\/a>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t<h1 id=\"rdp-ebb-firstHeading\" class=\"firstHeading\" lang=\"en\">Journal:Ten simple rules to enable multi-site collaborations through data sharing<\/h1>\n\t\t\t\t\n\t\t\t\t<div id=\"rdp-ebb-bodyContent\" class=\"mw-body-content\">\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\n\t\t\t\t\t<!-- start content -->\n\t\t\t\t\t<div id=\"rdp-ebb-mw-content-text\" lang=\"en\" dir=\"ltr\" class=\"mw-content-ltr\">\n\n\n<h2><span class=\"mw-headline\" id=\"Introduction\">Introduction<\/span><\/h2>\n<p>Open access, open data, and software are critical for advancing science and enabling collaboration across multiple institutions and throughout the world. Despite near-universal recognition of their importance, major barriers still exist to sharing raw data, software, and research products throughout the scientific community. 
Many of these barriers vary by specialty<sup id=\"rdp-ebb-cite_ref-ReichmanChallenges11_1-0\" class=\"reference\"><a href=\"#cite_note-ReichmanChallenges11-1\" rel=\"external_link\">[1]<\/a><\/sup>, making it more difficult for interdisciplinary and\/or translational researchers to engage in collaborative research. Multi-site collaborations are vital for increasing both the impact and the generalizability of research results. However, they often present unique data sharing challenges. In this set of <i>Ten Simple Rules<\/i>, we discuss how enhanced data sharing can enable multi-site collaborations.\n<\/p><p>Collaboration is an essential component of research<sup id=\"rdp-ebb-cite_ref-BozemanResearch13_2-0\" class=\"reference\"><a href=\"#cite_note-BozemanResearch13-2\" rel=\"external_link\">[2]<\/a><\/sup> and takes many forms, including internal collaborations (across departments within a single institution) and external collaborations (across institutions). However, multi-site collaborations involving more than two institutions encounter more complex challenges because of institution-specific restrictions and guidelines.<sup id=\"rdp-ebb-cite_ref-BrownIRB08_3-0\" class=\"reference\"><a href=\"#cite_note-BrownIRB08-3\" rel=\"external_link\">[3]<\/a><\/sup> Vicens and Bourne focus on collaborators working together on a shared research grant.<sup id=\"rdp-ebb-cite_ref-VicensTenSimple07_4-0\" class=\"reference\"><a href=\"#cite_note-VicensTenSimple07-4\" rel=\"external_link\">[4]<\/a><\/sup> They do not discuss the specific complexities of multi-site collaborations or the vital need for enhanced data sharing in multi-site and large-scale collaborations, in which participants may or may not share the same funding source and\/or research grant.\n<\/p><p>While challenging, multi-site collaborations are equally rewarding and result in increased research productivity.<sup id=\"rdp-ebb-cite_ref-JonesMulti08_5-0\" class=\"reference\"><a href=\"#cite_note-JonesMulti08-5\" 
rel=\"external_link\">[5]<\/a><\/sup><sup id=\"rdp-ebb-cite_ref-B.C3.B6rnerAMulti10_6-0\" class=\"reference\"><a href=\"#cite_note-B.C3.B6rnerAMulti10-6\" rel=\"external_link\">[6]<\/a><\/sup> One highly successful multi-site and translational collaboration is the Electronic Medical Records and Genomics (eMERGE) network (URL: <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/emerge.mc.vanderbilt.edu\/\" target=\"_blank\">https:\/\/emerge.mc.vanderbilt.edu\/<\/a>), initiated in 2007.<sup id=\"rdp-ebb-cite_ref-GottesmanTheElectronic13_7-0\" class=\"reference\"><a href=\"#cite_note-GottesmanTheElectronic13-7\" rel=\"external_link\">[7]<\/a><\/sup> The eMERGE network links biorepository data with clinical information from <a href=\"https:\/\/www.limswiki.org\/index.php\/Electronic_health_record\" title=\"Electronic health record\" target=\"_blank\" class=\"wiki-link\" data-key=\"f2e31a73217185bb01389404c1fd5255\">electronic health records<\/a> (EHRs). Network members have found novel associations and replicated many known associations between genetic variants and clinical phenotypes that would have been more difficult to detect without the collaboration.<sup id=\"rdp-ebb-cite_ref-FengTheEffect16_8-0\" class=\"reference\"><a href=\"#cite_note-FengTheEffect16-8\" rel=\"external_link\">[8]<\/a><\/sup> eMERGE members have also collaborated with other consortia and networks, including the Alzheimer\u2019s Disease Genetics Consortium<sup id=\"rdp-ebb-cite_ref-KarchAlzh16_9-0\" class=\"reference\"><a href=\"#cite_note-KarchAlzh16-9\" rel=\"external_link\">[9]<\/a><\/sup> and the NINDS Stroke Genetics Network<sup id=\"rdp-ebb-cite_ref-MalikLow16_10-0\" class=\"reference\"><a href=\"#cite_note-MalikLow16-10\" rel=\"external_link\">[10]<\/a><\/sup>, to name a few. 
Other successful collaborations include OHDSI: Observational Health Data Sciences and Informatics (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.ohdsi.org\/\" target=\"_blank\">http:\/\/www.ohdsi.org\/<\/a>), which builds on the methodology of the Observational Medical Outcomes Partnership (OMOP)<sup id=\"rdp-ebb-cite_ref-StangAdvancing10_11-0\" class=\"reference\"><a href=\"#cite_note-StangAdvancing10-11\" rel=\"external_link\">[11]<\/a><\/sup>, and CIRCLE: Clinical Informatics Research Collaborative (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/circleinformatics.org\/\" target=\"_blank\">http:\/\/circleinformatics.org\/<\/a>). In genetics, there are many consortia, including ExAC: The Exome Aggregation Consortium (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/exac.broadinstitute.org\/\" target=\"_blank\">http:\/\/exac.broadinstitute.org\/<\/a>), the 1000 Genomes Project Consortium (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.1000genomes.org\/\" target=\"_blank\">http:\/\/www.1000genomes.org\/<\/a>), the Australian BioGRID (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.biogrid.org.au\/\" target=\"_blank\">https:\/\/www.biogrid.org.au\/<\/a>), The Cancer Genome Atlas (TCGA) (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/cancergenome.nih.gov\/\" target=\"_blank\">http:\/\/cancergenome.nih.gov\/<\/a>), Genotype-Tissue Expression Portal (GTEx: <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.gtexportal.org\/home\/\" target=\"_blank\">http:\/\/www.gtexportal.org\/home\/<\/a>), and Encyclopedia of DNA Elements at UCSC (ENCODE: <a rel=\"external_link\" class=\"external free\" href=\"https:\/\/genome.ucsc.edu\/ENCODE\/\" target=\"_blank\">https:\/\/genome.ucsc.edu\/ENCODE\/<\/a>) among others.\n<\/p><p>Based on our experiences as both users and participants in collaborations, we present 10 simple rules on how to 
enable multi-site collaborations within the scientific community through enhanced data sharing. The rules focus on understanding privacy constraints, using appropriate platforms to facilitate data sharing, thinking in global terms, and encouraging researcher engagement through incentives. We present these 10 rules in the form of a pictograph of modern life (Fig. 1), and we provide a table of example sources and sites for each of the ten rules (Table 1). This table is not meant to be exhaustive; it only provides sample resources that may be useful to the research community.\n<\/p><p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Fig1_Boland_PLOSCompBio2017_13-1.png\" class=\"image wiki-link\" target=\"_blank\" data-key=\"a724833294d0b12f4a10600ad72e1401\"><img alt=\"Fig1 Boland PLOSCompBio2017 13-1.png\" src=\"https:\/\/www.limswiki.org\/images\/d\/d1\/Fig1_Boland_PLOSCompBio2017_13-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Figure 1.<\/b> Modern life context for the ten simple rules: This figure provides a framework for understanding how the \u201cTen Simple Rules to Enable Multi-site Collaborations through Data Sharing\u201d can be translated into easily understood modern life concepts. <b>Rule 1<\/b> is to make software open-source. The openness is signified by a window to a room filled with algorithms that are represented by gears. <b>Rule 2<\/b> involves making the source data available whenever possible. Source data can be very useful for researchers. However, data are often housed in institutions and are not publicly accessible. 
These files are often stored externally; therefore, we depict this as a shed or storehouse of data, which, if possible, should be provided to research collaborators. <b>Rule 3<\/b> is to \u201cuse multiple platforms to share research products.\u201d This increases the chances that other researchers will find and be able to use your research product; this is represented by multiple locations (i.e., shed and house). <b>Rule 4<\/b> involves the need to secure all necessary permissions a priori. Many datasets have data use agreements that restrict usage. These restrictions can sometimes prevent researchers from performing certain types of analyses or publishing in certain journals (e.g., journals that require all data to be openly accessible); therefore, we represent this rule as a key that can lock or unlock the door of your research. <b>Rule 5<\/b> discusses the privacy issues that surround source data. Researchers need to understand what they can and cannot do with their data (i.e., the privacy rules). Privacy often requires allowing certain users to access some sections of data while restricting access to other sections. Researchers need to understand what can and cannot be revealed about their data (i.e., when to open and close the curtains). <b>Rule 6<\/b> is to facilitate reproducibility whenever possible. Because communication is central to reproducibility, we depict this rule as two researchers sharing a giant scroll, since data documentation is required and is often substantial. <b>Rule 7<\/b> is to \u201cthink global.\u201d We conceptualize this as a cloud. This cloud allows the research property (i.e., the house and shed) to be accessed across large distances. <b>Rule 8<\/b> is to publicize your work. Think of it as \u201cshouting from the rooftops.\u201d Publicizing is critical for enabling other researchers to access your research product. 
<b>Rule 9<\/b> is to \u201cstay realistic.\u201d It is important for researchers to \u201cstay grounded\u201d and resist the urge to overstate the claims made by their research. <b>Rule 10<\/b> is to be engaged, and this is depicted as a person waving an \u201cI heart research\u201d sign. It is vitally important to stay engaged and enthusiastic about one\u2019s research. This helps draw others to care about your research.<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<p><br \/>\n<a href=\"https:\/\/www.limswiki.org\/index.php\/File:Tab1_Boland_PLOSCompBio2017_13-1.png\" class=\"image wiki-link\" target=\"_blank\" data-key=\"c36be57f1e57f80b1ec304e3a2497926\"><img alt=\"Tab1 Boland PLOSCompBio2017 13-1.png\" src=\"https:\/\/www.limswiki.org\/images\/0\/07\/Tab1_Boland_PLOSCompBio2017_13-1.png\" style=\"width: 100%;max-width: 400px;height: auto;\" \/><\/a>\n<\/p>\n<div style=\"clear:both;\"><\/div>\n<table style=\"\">\n<tr>\n<td style=\"vertical-align:top;\">\n<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\" style=\"\">\n\n<tr>\n<td style=\"background-color:white; padding-left:10px; padding-right:10px;\"> <blockquote><b>Table 1.<\/b> Example sources and sites for each of the ten simple rules<\/blockquote>\n<\/td><\/tr>\n<\/table>\n<\/td><\/tr><\/table>\n<h2><span class=\"mw-headline\" id=\"Definitions\">Definitions<\/span><\/h2>\n<p>In this paper, we use the term \"research product\" to include all results from research. This includes algorithms, developed software tools, databases, raw source data, cleaned data, and various metadata generated as a result of the research activity. We differentiate this from \"data,\" which comprises the primary \"facts and statistics collected together for analysis\" for that particular collaboration. Therefore, data could include genetic or clinical data. 
By these definitions, developed software tools are not \"data\" but \"research products.\" Novel genetic sequences collected for analysis would be considered \"raw source data,\" which is a type of \"research product.\"\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_1:_Make_software_open-source\">Rule 1: Make software open-source<\/span><\/h2>\n<p>The cornerstone of facilitating multi-site collaborations is to enhance data sharing and make software open-source.<sup id=\"rdp-ebb-cite_ref-Prli.C4.87Ten12_12-0\" class=\"reference\"><a href=\"#cite_note-Prli.C4.87Ten12-12\" rel=\"external_link\">[12]<\/a><\/sup> By making source code open, researchers allow others both to reproduce their work and to build upon it in novel ways. To engage in multi-site collaborations, collaborators need access to code in a repository shared among them (although this repository could be private rather than open to the general public). When the study is complete and the paper is under review and\/or published, a stable copy of the code should be made available to the general public. Internal sharing allows the code to be developed, while public sharing of a stable version allows the code to be refined and built upon by others.\n<\/p><p>Many researchers still limit access to their work despite the known advantages of making software open-source upon publication (e.g., higher-impact publications<sup id=\"rdp-ebb-cite_ref-JonesMulti08_5-1\" class=\"reference\"><a href=\"#cite_note-JonesMulti08-5\" rel=\"external_link\">[5]<\/a><\/sup>). For example, some allow users to interact with their algorithm by submitting data and receiving results through a web platform, while the back-end algorithm itself remains inaccessible. 
Masum <i>et al.<\/i> advocate the reuse of existing code in their <i>Ten Simple Rules<\/i> for cultivating open science.<sup id=\"rdp-ebb-cite_ref-MasumTen13_13-0\" class=\"reference\"><a href=\"#cite_note-MasumTen13-13\" rel=\"external_link\">[13]<\/a><\/sup> However, this is often easier said than done: as long as back-end algorithms remain hidden, open science is not possible. Therefore, it is essential for researchers interested in participating in multi-site collaborations to make their software code and algorithms open. Because making software truly \"open\" can be complex, Prli\u0107 and Procter provide <i>Ten Simple Rules<\/i> to assist researchers in making their software open-source.<sup id=\"rdp-ebb-cite_ref-Prli.C4.87Ten12_12-1\" class=\"reference\"><a href=\"#cite_note-Prli.C4.87Ten12-12\" rel=\"external_link\">[12]<\/a><\/sup> Truly open-source software is an essential component of collaborations.<sup id=\"rdp-ebb-cite_ref-MasumTen13_13-1\" class=\"reference\"><a href=\"#cite_note-MasumTen13-13\" rel=\"external_link\">[13]<\/a><\/sup> Openness also benefits the researchers themselves: with more eyes on the source code, others in the community can refine it, so that more errors are identified and corrected. There are several ways to share software code. 
If you use the <a href=\"https:\/\/www.limswiki.org\/index.php\/R_(programming_language)\" title=\"R (programming language)\" target=\"_blank\" class=\"wiki-link\" data-key=\"1b0aa598f071aca4c5b4ee08d8bb2bde\">R platform<\/a>, packages can be shared with the entire open-source community via CRAN (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/cran.r-project.org\/\" target=\"_blank\">https:\/\/cran.r-project.org\/<\/a>) and Bioconductor, which is specifically for biology-related algorithms (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.bioconductor.org\/\" target=\"_blank\">https:\/\/www.bioconductor.org\/<\/a>). Code can also be shared on GitHub, whose issue tracker helps users report errors.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_2:_Provide_open-source_data\">Rule 2: Provide open-source data<\/span><\/h2>\n<h3><span class=\"mw-headline\" id=\"Deposit_source_data_in_appropriate_repositories\">Deposit source data in appropriate repositories<\/span><\/h3>\n<p>Whenever possible, make source data available. Openness benefits your collaborators by allowing them to perform additional analyses easily. Source data could include not only the processed or cleaned data used in algorithms but also raw data files. These files can be very large, so they are often stored on an external site or in a data warehouse. 
The National Center for Biotechnology Information (NCBI) maintains the Sequence Read Archive (SRA) (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.ncbi.nlm.nih.gov\/sra\" target=\"_blank\">https:\/\/www.ncbi.nlm.nih.gov\/sra<\/a>) and the Gene Expression Omnibus (GEO) (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.ncbi.nlm.nih.gov\/geo\/\" target=\"_blank\">https:\/\/www.ncbi.nlm.nih.gov\/geo\/<\/a>); both are good places to deposit source data, where appropriate.\n<\/p><p>In addition to raw data files, it is also helpful to provide intermediate data files at various stages of processing. If you compare your results to those in the literature, it can also be useful to provide a meta-analysis listing the publications (with PubMed IDs) that support or refute your results.\n<\/p><p>Data sharing is vitally important for multi-site collaborations because it allows researchers to compare results across vastly different study populations, which increases the generalizability of the findings.<sup id=\"rdp-ebb-cite_ref-PearlsonMultisite09_14-0\" class=\"reference\"><a href=\"#cite_note-PearlsonMultisite09-14\" rel=\"external_link\">[14]<\/a><\/sup> While a multi-site research project is ongoing, data can be shared in a private shared space until all necessary data quality checks have been conducted and the findings have been published. 
After publication, data can be deposited in GEO, SRA, ClinVar (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.ncbi.nlm.nih.gov\/clinvar\/\" target=\"_blank\">https:\/\/www.ncbi.nlm.nih.gov\/clinvar\/<\/a>), and any other appropriate domain-specific repositories.\n<\/p>\n<h3><span class=\"mw-headline\" id=\"Consider_middle-ground_data_sharing_approaches_for_sensitive_data\">Consider middle-ground data sharing approaches for sensitive data<\/span><\/h3>\n<p>Raw source data are not always fully shareable with the public, because of data use restrictions (see rule 4) or privacy concerns (see rule 5). Alternative mechanisms exist for sharing portions of data with the research community. For example, the database of Genotypes and Phenotypes (dbGaP) (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.ncbi.nlm.nih.gov\/gap\" target=\"_blank\">https:\/\/www.ncbi.nlm.nih.gov\/gap<\/a>) provides data holders with two levels of access: open and controlled. Open access allows broad online release of nonsensitive data, whereas controlled access allows sensitive datasets to be shared with other investigators, provided certain restrictions are met. This lets researchers share portions of their data that would otherwise not be shareable.\n<\/p><p>In addition to the controlled-access option provided by dbGaP, others have developed middle-ground approaches for sharing sensitive raw data or metadata. Several of these approaches use federated access systems that allow researchers to query databases containing sensitive data while preventing direct access to the data itself.
An example within the United States is the Shared Health Research Information Network (SHRINE), which provides a federated system that is <a href=\"https:\/\/www.limswiki.org\/index.php\/Health_Insurance_Portability_and_Accountability_Act\" title=\"Health Insurance Portability and Accountability Act\" target=\"_blank\" class=\"wiki-link\" data-key=\"b70673a0117c21576016cb7498867153\">Health Insurance Portability and Accountability Act<\/a> (HIPAA) compliant.<sup id=\"rdp-ebb-cite_ref-WeberTheShared09_15-0\" class=\"reference\"><a href=\"#cite_note-WeberTheShared09-15\" rel=\"external_link\">[15]<\/a><\/sup> International groups have also seen success in this area. BioGrid Australia (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.biogrid.org.au\/\" target=\"_blank\">https:\/\/www.biogrid.org.au\/<\/a>) allows researchers to access hundreds of thousands of health records through a linked data platform in which individual data holders retain control of their data.<sup id=\"rdp-ebb-cite_ref-MerrielBioGrid11_16-0\" class=\"reference\"><a href=\"#cite_note-MerrielBioGrid11-16\" rel=\"external_link\">[16]<\/a><\/sup> Researchers can then be granted authorized access to specific data elements while private sections of the medical records remain restricted. These mid-level approaches facilitate collaboration both within an institution (i.e., across departments) and across institutions by allowing researchers to access sensitive data indirectly.
They can even match patients with similar patients (for association analyses) while maintaining stringent privacy constraints.<sup id=\"rdp-ebb-cite_ref-BoyleBioGrid11_17-0\" class=\"reference\"><a href=\"#cite_note-BoyleBioGrid11-17\" rel=\"external_link\">[17]<\/a><\/sup> Other resources provide summary statistics computed over large cohorts (e.g., the ExAC browser\/database), which preserves privacy while giving other researchers important population-level information for subsequent analyses and comparisons.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_3:_Use_multiple_platforms_to_share_research_products\">Rule 3: Use multiple platforms to share research products<\/span><\/h2>\n<p>To collaborate with researchers from different backgrounds, it is often necessary to use multiple platforms when sharing data, because different disciplines tend to have different policies. Using multiple platforms gives individuals from diverse backgrounds access to your research products. General phrases like \"open data\" and \"open science\" are commonly used in the research community but provide little direction.<sup id=\"rdp-ebb-cite_ref-MasumTen13_13-2\" class=\"reference\"><a href=\"#cite_note-MasumTen13-13\" rel=\"external_link\">[13]<\/a><\/sup> Research products take many forms, including 1) raw source data regardless of collection type (e.g., health data, genomic data, survey data, and epidemiological data), 2) software code (mentioned in rule 1), and 3) metadata elements and the results of computations used to generate figures published in scientific research. Some data types cannot be fully shared (e.g., EHR data; see rule 5), but most algorithms and summary results\/statistics are shareable.\n<\/p><p>Each of these types of open data requires a different platform for sharing.
Figshare (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/figshare.com\/\" target=\"_blank\">https:\/\/figshare.com\/<\/a>) allows users to share the data underlying published figures. GitHub (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/github.com\/\" target=\"_blank\">https:\/\/github.com\/<\/a>) allows users to share code that is in development or published. Well-developed code can be turned into an open-source package, for example, an R library, which can be deposited in CRAN or Bioconductor. R libraries can be shared immediately on GitHub without any formal code review, which is advisable for code that is still in development. When the code is finalized, it can be submitted to Bioconductor as an R library; approved libraries are vetted to ensure that the code works well. Writing vignettes also helps new users become familiar with an R package. When collaborating across multiple sites, it is also important to provide vignettes and sample source data so that users can learn how to use the code even if R is not their language of choice. Data formats, differences among formats, and programming languages are important to consider when sharing data across multiple platforms, because different platforms often require different formats. While it may seem tedious to translate code, source data, and documentation across multiple formats and data schemas, doing so can be very helpful and will increase the number of users who find your data and results interesting.\n<\/p><p>Many options are available to facilitate communication among members of a collaborative effort, including Google forums and wiki webpages.
Other groups have built dedicated websites that allow users to browse and download the data directly; one such website is the ExAC Browser (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/exac.broadinstitute.org\/\" target=\"_blank\">http:\/\/exac.broadinstitute.org\/<\/a>), which integrates data obtained from 17 different consortia (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/exac.broadinstitute.org\/about\" target=\"_blank\">http:\/\/exac.broadinstitute.org\/about<\/a>).<sup id=\"rdp-ebb-cite_ref-LekAnalysis16_18-0\" class=\"reference\"><a href=\"#cite_note-LekAnalysis16-18\" rel=\"external_link\">[18]<\/a><\/sup>\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_4:_Secure_necessary_permissions.2Fdata_use_agreements_a_priori\">Rule 4: Secure necessary permissions\/data use agreements <i>a priori<\/i><\/span><\/h2>\n<p>Some datasets have provisos that affect publication, and these need to be addressed <i>a priori<\/i>. For example, whether researchers can publish an algorithm that uses a government dataset can depend on the department that generated the data. Certain National Aeronautics and Space Administration (NASA) datasets, for instance, require users to add specific NASA employees to subsequent publications. This is an important stipulation. Other data use agreements may prohibit depositing the data on an \"open\" platform (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/above.nasa.gov\/Documents\/NGA_Data_Access_Agreement_new.pdf\" target=\"_blank\">http:\/\/above.nasa.gov\/Documents\/NGA_Data_Access_Agreement_new.pdf<\/a>). Such stipulations can hinder researchers attempting to produce transparent science.\n<\/p><p>Other datasets have data use agreements as an added layer of protection for patients.
For example, the Surveillance, Epidemiology, and End Results (SEER) dataset linked with Medicare (i.e., the SEER-Medicare dataset) requires users to submit intended publications to its offices for approval before journal submission. This can seem burdensome to researchers; however, it is a condition of the data use agreement and, therefore, must be complied with. Researchers need to be aware of all provisos when including such data in their studies. Before publishing, or providing data on any type of platform (open, restricted, or closed), it is important to secure all necessary permissions and data use agreements.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_5:_Know_the_privacy_rules_for_your_data\">Rule 5: Know the privacy rules for your data<\/span><\/h2>\n<p>Data come with many caveats. For this reason, it is important to understand what you can and cannot do with your data (i.e., the privacy rules). Maintaining data privacy is distinct from complying with data use agreements (DUAs; see rule 4). For example, data that are not sensitive may still carry restrictive DUAs for other reasons (e.g., data from an industry collaborator). Also, privacy rules often apply to your own source data, whereas DUAs become necessary when using data from collaborators or a government source.\n<\/p><p>Certain datasets, e.g., genomic and EHR data, may be impossible to publish fully on an open platform due to HIPAA privacy rules and other privacy concerns related to patient re-identifiability (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.hhs.gov\/hipaa\/for-professionals\/privacy\/\" target=\"_blank\">http:\/\/www.hhs.gov\/hipaa\/for-professionals\/privacy\/<\/a>). Therefore, it is important to know the privacy stipulations of all data used in your collaborations and how they affect the ability to share results among members of the team (especially when team members are at different institutions).
Methods that anonymize patient information while allowing patient-level data sharing may be the way of the future.<sup id=\"rdp-ebb-cite_ref-ElEmamAnon15_19-0\" class=\"reference\"><a href=\"#cite_note-ElEmamAnon15-19\" rel=\"external_link\">[19]<\/a><\/sup> However, institution-specific policies and\/or country-specific laws can limit or prevent the use of such methods. This should be discussed with all collaborators at the outset of any collaboration. Rule 2 describes some methods for providing certain forms of sensitive data in a shareable federated space.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_6:_Facilitate_reproducibility\">Rule 6: Facilitate reproducibility<\/span><\/h2>\n<p>Reproducibility is another aspect of both data sharing and enabling multi-site collaborations. Sandve <i>et al.<\/i> provide <i>Ten Simple Rules<\/i> for facilitating research reproducibility in general.<sup id=\"rdp-ebb-cite_ref-SandveTen13_20-0\" class=\"reference\"><a href=\"#cite_note-SandveTen13-20\" rel=\"external_link\">[20]<\/a><\/sup> Keeping track of research results and how data were generated is vital for reproducibility.<sup id=\"rdp-ebb-cite_ref-SandveTen13_20-1\" class=\"reference\"><a href=\"#cite_note-SandveTen13-20\" rel=\"external_link\">[20]<\/a><\/sup> This site-level record keeping becomes even more important in multi-site collaborations: if one step of a methodology is not conducted the same way at one site, the overall results can be affected drastically.
In other words, reproducibility is a core requirement for successful collaborations.\n<\/p><p>In genetics and computational biology, standardizing results across different gene sequencing platforms is a major challenge.<sup id=\"rdp-ebb-cite_ref-BammlerStandard05_21-0\" class=\"reference\"><a href=\"#cite_note-BammlerStandard05-21\" rel=\"external_link\">[21]<\/a><\/sup> Researchers who use a mixture of clinical and genetic data (for Phenome-Wide Association Studies, PheWAS<sup id=\"rdp-ebb-cite_ref-DennyPheWAS10_22-0\" class=\"reference\"><a href=\"#cite_note-DennyPheWAS10-22\" rel=\"external_link\">[22]<\/a><\/sup>) often depend on local EHR terminology systems to identify patient populations. Therefore, standard phenotype definitions are required and must be harmonized across sites to ensure that the definitions are accurate at each site.<sup id=\"rdp-ebb-cite_ref-NewtonValid13_23-0\" class=\"reference\"><a href=\"#cite_note-NewtonValid13-23\" rel=\"external_link\">[23]<\/a><\/sup> Several multi-site collaborations, including the eMERGE network, have developed platforms that link to all necessary documentation, code, and data schemas to facilitate this process.<sup id=\"rdp-ebb-cite_ref-PathakMapping11_24-0\" class=\"reference\"><a href=\"#cite_note-PathakMapping11-24\" rel=\"external_link\">[24]<\/a><\/sup> This step is integral to data sharing and enabling multi-site collaborations.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_7:_Think_global\">Rule 7: Think global<\/span><\/h2>\n<p>The importance of thinking globally cannot be overstated. Health care, genetics, climate, and all other areas of science affect the world as a whole. Therefore, it is important to think globally when performing scientific research. Most programming languages are designed to be agnostic to the spoken language of the country in which they are used.
However, understanding and using these languages requires adequate documentation and user manuals in the local languages of the programmers\/implementers. Yet open-source languages often provide user manuals in only a few languages. For example, R is a popular open-source language yet has official documented translations in only four languages: English, Russian, German, and Chinese (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.r-project.org\/other-docs.html\" target=\"_blank\">https:\/\/www.r-project.org\/other-docs.html<\/a>). Problems can arise when collaborators in different regions have difficulty running R. This affects data sharing on a global scale and should be considered when collaborating internationally.\n<\/p><p>Translation mechanisms may also be necessary to understand and harmonize country-specific terminology. This is especially important because definitions of obesity and many psychiatric conditions vary widely across the globe.<sup id=\"rdp-ebb-cite_ref-DienerPersonality03_25-0\" class=\"reference\"><a href=\"#cite_note-DienerPersonality03-25\" rel=\"external_link\">[25]<\/a><\/sup> Even seemingly simple biological features (e.g., tall versus short) can be difficult to translate in global terms; for example, a Norwegian of average height may appear tall in another country. Translating biological features into common absolute metrics (e.g., height) helps alleviate the ambiguities that can arise from categorical variables.
Certain diseases, especially psychiatric conditions, are extremely important to study at the multi-site level to increase the generalizability of the results.<sup id=\"rdp-ebb-cite_ref-PearlsonMultisite09_14-1\" class=\"reference\"><a href=\"#cite_note-PearlsonMultisite09-14\" rel=\"external_link\">[14]<\/a><\/sup> However, psychiatric conditions are more difficult to translate without a thorough knowledge of how each condition is defined in the relevant country or region.<sup id=\"rdp-ebb-cite_ref-DienerPersonality03_25-1\" class=\"reference\"><a href=\"#cite_note-DienerPersonality03-25\" rel=\"external_link\">[25]<\/a><\/sup> Solutions often involve using concrete measures, e.g., brain imaging analysis, rather than subjective measures such as the presence or absence of depression.<sup id=\"rdp-ebb-cite_ref-PearlsonMultisite09_14-2\" class=\"reference\"><a href=\"#cite_note-PearlsonMultisite09-14\" rel=\"external_link\">[14]<\/a><\/sup>\n<\/p><p>There are many layers to thinking on a global scale: mechanical differences (i.e., the software language and documentation) and conceptual differences (i.e., country- or region-specific medical definitions). Organizations such as the World Health Organization work to integrate different conceptual interpretations of diseases into standard guidelines. Using these guidelines rather than country-specific ones helps your research reach the broader scientific community.\n<\/p><p>Several groups have successfully integrated data across multiple countries and provided their data in an open form. The Max Planck Institute for Demographic Research (MPIDR) in Germany collaborated with two separate groups to produce two databases containing international data. Both datasets contain integrated results from over 30 countries.
Additionally, all finished (cleaned) data are made available to users in an open format via two specially designed databases: the Human Fertility Database (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.humanfertility.org\/cgi-bin\/main.php\" target=\"_blank\">http:\/\/www.humanfertility.org\/cgi-bin\/main.php<\/a>)<sup id=\"rdp-ebb-cite_ref-HFD_26-0\" class=\"reference\"><a href=\"#cite_note-HFD-26\" rel=\"external_link\">[26]<\/a><\/sup> and the Human Mortality Database (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.mortality.org\/\" target=\"_blank\">http:\/\/www.mortality.org\/<\/a>).<sup id=\"rdp-ebb-cite_ref-HMD_27-0\" class=\"reference\"><a href=\"#cite_note-HMD-27\" rel=\"external_link\">[27]<\/a><\/sup> Only cleaned data are returned to users, in a standardized format that makes it easy to compare countries with one another. The MPIDR collaborated with the Vienna Institute of Demography (Austria) to create the Human Fertility Database and with the University of California, Berkeley to create the Human Mortality Database. These projects are a good example of harmonizing definitions across countries despite international barriers and of returning data to researchers in an easily usable, standardized format. The group provides detailed descriptions of how it harmonized various timescales across countries in a methods document (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.humanfertility.org\/Docs\/methods.pdf\" target=\"_blank\">http:\/\/www.humanfertility.org\/Docs\/methods.pdf<\/a>) that could easily be submitted as a research report (see rule 6).\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_8:_Publicize_your_work\">Rule 8: Publicize your work<\/span><\/h2>\n<p>Publishing all aspects of your work in the appropriate venues is vital for maintaining a multi-site collaboration.
This enables each aspect of your research to be assessed by appropriate peer reviewers. Publishing different aspects of your work in separate papers in separate journals allows your contributions to be seen by those most able to learn from them. Remember, it is important to make your research available to those who can benefit from your results. Depending on your findings, this can include methodologists, clinicians, epidemiologists, geneticists, and others.\n<\/p><p>Several recently launched journals that facilitate open science focus on particular aspects of research. For instance, several journals, such as <i>PLOS ONE<\/i>, <i>Scientific Reports<\/i>, and <i>Cell Reports<\/i>, do not require novelty. These journals are good choices for research results that may be part of a larger research project or collaboration but are not inherently novel. Other journals, such as <i>Scientific Data<\/i> and <i>Database<\/i>, are good choices for publishing a resource containing your collected research source data. It is often advisable to publish in a data-focused journal simultaneously with an algorithm- or results-focused paper that highlights the novel aspects of your research. In some cases, data can be published afterwards if they are part of a large collaboration and the database or user interface is in production when the main contribution is published.\n<\/p><p>Publishing in multiple venues is especially important for those engaged in multi-site collaborations, because these projects often involve a tremendous investment of time and resources from many different organizations. Therefore, it is vital to highlight every research contribution the collaboration has generated to encourage further engagement from the community.
If you are able to provide all raw source data on an open platform, journals designed specifically to facilitate open science, such as <i>F1000<\/i> (<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/f1000research.com\/\" target=\"_blank\">https:\/\/f1000research.com\/<\/a>), may be worth considering. <i>F1000<\/i> is also a good venue for intermediate results such as posters that collaborators may have presented at conferences while working toward the final paper. After publication, some collaborative groups make effective use of blogging (both macro and micro) to communicate with other researchers and the general public. However, it is important not to overstate claims in any submission, publication, or related media coverage, and to stay focused on the specific contribution of each work.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_9:_Stay_realistic.2C_but_aim_high\">Rule 9: Stay realistic, but aim high<\/span><\/h2>\n<p>When performing quality research and collaborating with others, it is important not to overstate the claims of your research, whether in publication or online. Resist the urge to overstate, and remain both humble and grounded. This is critical in collaborations: if a researcher overstates a claim in a paper or, worse, publicly shares data that he or she is not legally permitted to share (e.g., under the stipulations of a DUA), the paper may be retracted. This could cause irreparable damage to the collaborative group.\n<\/p><p>This rule also links back to rule 2: making the source data available.
This allows others in the research community to check your work interactively, which can help prevent overstated research claims.<sup id=\"rdp-ebb-cite_ref-SennOverstating09_28-0\" class=\"reference\"><a href=\"#cite_note-SennOverstating09-28\" rel=\"external_link\">[28]<\/a><\/sup> One site, <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/retractionwatch.com\/\" target=\"_blank\">retractionwatch.com<\/a>, publicly reports on retracted journal articles. It covers not only instances of plagiarism and data fabrication but also papers retracted because of human error in an experiment (e.g., a protocol was not followed exactly as specified in the paper) or in the analysis (e.g., the wrong type of statistical test was performed, leaving the conclusions unsupported by the data).\n<\/p><p>So, stay realistic, but do not be afraid to challenge the status quo. Some of the most respected research today challenged the prevailing understanding of the leading scientists of its time; examples include the seminal work on Pangaea and even the finding that DNA is a double helix. These concepts were earth-shattering at the time and could have been completely wrong, but the researchers behind them were not afraid to make their theories, data, and results public. These are the things that change science. So, remain humble and do not intentionally overstate the claims of your research, but at the same time do not be afraid to challenge the current way of thinking. You may be completely off, or you may be a groundbreaking innovator.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Rule_10:_Be_engaged\">Rule 10: Be engaged<\/span><\/h2>\n<p>Be engaged with those using your research, your data, and your code. Communicate with them through sharing platforms such as GitHub, Figshare, and so forth. Respond readily when users have questions and concerns.
Attempt to follow the motto \"release early, release often.\" Engage with researchers in non-traditional ways. For example, several collaborative efforts have created their own gear, e.g., t-shirts, to engage the community. One such effort is the open-source statistical modeling language Stan (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/mc-stan.org\/\" target=\"_blank\">http:\/\/mc-stan.org\/<\/a>). Its developers have created their own line of Stan \"swag\" (<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/mc-stan.org\/shop\/\" target=\"_blank\">http:\/\/mc-stan.org\/shop\/<\/a>) to encourage user engagement. Communicate often with the research community to convince them that your research is worth caring about. The bottom line in collaboration is to care deeply about your research. If you make it known that you care deeply about the problem, it becomes possible to convince others that your research is important.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Concluding_remarks\">Concluding remarks<\/span><\/h2>\n<p>Collaborations, especially large, multi-site collaborations, contain many pitfalls that must be overcome. In this paper, we present 10 simple rules that will help researchers share their data and methods to facilitate successful and meaningful multi-site collaborations. We describe these rules and highlight several successful multi-site collaborations.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Funding\">Funding<\/span><\/h2>\n<p>MRB was supported by NLM T15 LM00707 from Jul 2014\u2013Jun 2016 and by the NCATS, NIH, through TL1 TR000082, formerly the NCRR, TL1 RR024158 from Jul 2016\u2013Jun 2017.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"Competing_interests\">Competing interests<\/span><\/h2>\n<p>The authors have declared that no competing interests exist.\n<\/p>\n<h2><span class=\"mw-headline\" id=\"References\">References<\/span><\/h2>\n<div class=\"reflist references-column-width\" style=\"-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em; list-style-type: decimal;\">\n<ol class=\"references\">\n<li id=\"cite_note-ReichmanChallenges11-1\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ReichmanChallenges11_1-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Reichman, O.J.; Jones, M.B.; Schildhauer, M.P. (2011). \"Challenges and opportunities of open data in ecology\". <i>Science<\/i> <b>331<\/b> (6018): 703\u20135. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1126%2Fscience.1197962\" target=\"_blank\">10.1126\/science.1197962<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21311007\" target=\"_blank\">21311007<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Challenges+and+opportunities+of+open+data+in+ecology&rft.jtitle=Science&rft.aulast=Reichman%2C+O.J.%3B+Jones%2C+M.B.%3B+Schildhauer%2C+M.P.&rft.au=Reichman%2C+O.J.%3B+Jones%2C+M.B.%3B+Schildhauer%2C+M.P.&rft.date=2011&rft.volume=331&rft.issue=6018&rft.pages=703%E2%80%935&rft_id=info:doi\/10.1126%2Fscience.1197962&rft_id=info:pmid\/21311007&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BozemanResearch13-2\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BozemanResearch13_2-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bozeman, B.; Fay, D.; Slade, C.P. (2013). \"Research collaboration in universities and academic entrepreneurship: the-state-of-the-art\". <i>The Journal of Technology Transfer<\/i> <b>38<\/b> (1): 1\u201367. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1007%2Fs10961-012-9281-8\" target=\"_blank\">10.1007\/s10961-012-9281-8<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Research+collaboration+in+universities+and+academic+entrepreneurship%3A+the-state-of-the-art&rft.jtitle=The+Journal+of+Technology+Transfer&rft.aulast=Bozeman%2C+B.%3B+Fay%2C+D.%3B+Slade%2C+C.P.&rft.au=Bozeman%2C+B.%3B+Fay%2C+D.%3B+Slade%2C+C.P.&rft.date=2013&rft.volume=38&rft.issue=1&rft.pages=1%E2%80%9367&rft_id=info:doi\/10.1007%2Fs10961-012-9281-8&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BrownIRB08-3\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BrownIRB08_3-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Brown, P.; Morello-Frosch, R.; Brody, J.G. (2008). <a rel=\"external_link\" class=\"external text\" href=\"https:\/\/www.brown.edu\/research\/research-ethics\/irb-challenges-multi-partner-community-based-participatory-research\" target=\"_blank\">\"IRB Challenges in Multi-Partner Community-Based Participatory Research\"<\/a>. <i>Proceedings of The American Sociological Association Annual Meeting<\/i> <b>2008<\/b>: 1-31<span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"https:\/\/www.brown.edu\/research\/research-ethics\/irb-challenges-multi-partner-community-based-participatory-research\" target=\"_blank\">https:\/\/www.brown.edu\/research\/research-ethics\/irb-challenges-multi-partner-community-based-participatory-research<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=IRB+Challenges+in+Multi-Partner+Community-Based+Participatory+Research&rft.jtitle=Proceedings+of+The+American+Sociological+Association+Annual+Meeting&rft.aulast=Brown%2C+P.%3B+Morello-Frosch%2C+R.%3B+Brody%2C+J.G.&rft.au=Brown%2C+P.%3B+Morello-Frosch%2C+R.%3B+Brody%2C+J.G.&rft.date=2008&rft.volume=2008&rft.pages=1-31&rft_id=https%3A%2F%2Fwww.brown.edu%2Fresearch%2Fresearch-ethics%2Firb-challenges-multi-partner-community-based-participatory-research&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-VicensTenSimple07-4\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-VicensTenSimple07_4-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Vicens, Q.; Bourne, P.E. (2007). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1847992\" target=\"_blank\">\"Ten simple rules for a successful collaboration\"<\/a>. <i>PLOS Computational Biology<\/i> <b>3<\/b> (3): e44. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.0030044\" target=\"_blank\">10.1371\/journal.pcbi.0030044<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC1847992\/\" target=\"_blank\">PMC1847992<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/17397252\" target=\"_blank\">17397252<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1847992\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC1847992<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ten+simple+rules+for+a+successful+collaboration&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Vicens%2C+Q.%3B+Bourne%2C+P.E.&rft.au=Vicens%2C+Q.%3B+Bourne%2C+P.E.&rft.date=2007&rft.volume=3&rft.issue=3&rft.pages=e44&rft_id=info:doi\/10.1371%2Fjournal.pcbi.0030044&rft_id=info:pmc\/PMC1847992&rft_id=info:pmid\/17397252&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC1847992&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-JonesMulti08-5\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-JonesMulti08_5-0\" rel=\"external_link\">5.0<\/a><\/sup> <sup><a href=\"#cite_ref-JonesMulti08_5-1\" rel=\"external_link\">5.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Jones, B.F.; Wuchty, S.; Uzzi, B. (2008). 
\"Multi-university research teams: shifting impact, geography, and stratification in science\". <i>Science<\/i> <b>322<\/b> (5905): 1259-62. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1126%2Fscience.1158357\" target=\"_blank\">10.1126\/science.1158357<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/18845711\" target=\"_blank\">18845711<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Multi-university+research+teams%3A+shifting+impact%2C+geography%2C+and+stratification+in+science&rft.jtitle=Science&rft.aulast=Jones%2C+B.F.%3B+Wuchty%2C+S.%3B+Uzzi%2C+B.&rft.au=Jones%2C+B.F.%3B+Wuchty%2C+S.%3B+Uzzi%2C+B.&rft.date=2008&rft.volume=322&rft.issue=5905&rft.pages=1259-62&rft_id=info:doi\/10.1126%2Fscience.1158357&rft_id=info:pmid\/18845711&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-B.C3.B6rnerAMulti10-6\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-B.C3.B6rnerAMulti10_6-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">B\u00f6rner, K.; Contractor, N.; Falk-Krzesinski, H.J. et al. (2010). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3527819\" target=\"_blank\">\"A multi-level systems perspective for the science of team science\"<\/a>. <i>Science Translational Medicine<\/i> <b>2<\/b> (49): 49cm24. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1126%2Fscitranslmed.3001399\" target=\"_blank\">10.1126\/scitranslmed.3001399<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3527819\/\" target=\"_blank\">PMC3527819<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20844283\" target=\"_blank\">20844283<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3527819\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3527819<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+multi-level+systems+perspective+for+the+science+of+team+science&rft.jtitle=Science+Translational+Medicine&rft.aulast=B%C3%B6rner%2C+K.%3B+Contractor%2C+N.%3B+Falk-Krzesinski%2C+H.J.+et+al.&rft.au=B%C3%B6rner%2C+K.%3B+Contractor%2C+N.%3B+Falk-Krzesinski%2C+H.J.+et+al.&rft.date=2010&rft.volume=2&rft.issue=49&rft.pages=49cm24&rft_id=info:doi\/10.1126%2Fscitranslmed.3001399&rft_id=info:pmc\/PMC3527819&rft_id=info:pmid\/20844283&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3527819&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> 
<\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-GottesmanTheElectronic13-7\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-GottesmanTheElectronic13_7-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Gottesman, O.; Kuivaniemi, H.; Tromp, G. et al. (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3795928\" target=\"_blank\">\"The Electronic Medical Records and Genomics (eMERGE) Network: Past, present, and future\"<\/a>. <i>Genetics in Medicine<\/i> <b>15<\/b> (10): 761-71. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fgim.2013.72\" target=\"_blank\">10.1038\/gim.2013.72<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3795928\/\" target=\"_blank\">PMC3795928<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23743551\" target=\"_blank\">23743551<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3795928\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3795928<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Electronic+Medical+Records+and+Genomics+%28eMERGE%29+Network%3A+Past%2C+present%2C+and+future&rft.jtitle=Genetics+in+Medicine&rft.aulast=Gottesman%2C+O.%3B+Kuivaniemi%2C+H.%3B+Tromp%2C+G.+et+al.&rft.au=Gottesman%2C+O.%3B+Kuivaniemi%2C+H.%3B+Tromp%2C+G.+et+al.&rft.date=2013&rft.volume=15&rft.issue=10&rft.pages=761-71&rft_id=info:doi\/10.1038%2Fgim.2013.72&rft_id=info:pmc\/PMC3795928&rft_id=info:pmid\/23743551&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3795928&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-FengTheEffect16-8\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-FengTheEffect16_8-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Feng, Q.; Wei, W.Q.; Chung, C.P. et al. (2016). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4995153\" target=\"_blank\">\"The effect of genetic variation in PCSK9 on the LDL-cholesterol response to statin therapy\"<\/a>. <i>The Pharmacogenomics Journal<\/i>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Ftpj.2016.3\" target=\"_blank\">10.1038\/tpj.2016.3<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4995153\/\" target=\"_blank\">PMC4995153<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26902539\" target=\"_blank\">26902539<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4995153\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4995153<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+effect+of+genetic+variation+in+PCSK9+on+the+LDL-cholesterol+response+to+statin+therapy&rft.jtitle=The+Pharmacogenomics+Journal&rft.aulast=Feng%2C+Q.%3B+Wei%2C+W.Q.%3B+Chung%2C+C.P.+et+al.&rft.au=Feng%2C+Q.%3B+Wei%2C+W.Q.%3B+Chung%2C+C.P.+et+al.&rft.date=2016&rft_id=info:doi\/10.1038%2Ftpj.2016.3&rft_id=info:pmc\/PMC4995153&rft_id=info:pmid\/26902539&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4995153&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-KarchAlzh16-9\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-KarchAlzh16_9-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Karch, C.M.; Ezerskiy, L.A.; Bertelsen, S. et al. (2016). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4769299\" target=\"_blank\">\"Alzheimer's Disease Risk Polymorphisms Regulate Gene Expression in the ZCWPW1 and the CELF1 Loci\"<\/a>. <i>PLOS One<\/i> <b>11<\/b> (2): e0148717. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pone.0148717\" target=\"_blank\">10.1371\/journal.pone.0148717<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4769299\/\" target=\"_blank\">PMC4769299<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26919393\" target=\"_blank\">26919393<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4769299\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4769299<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Alzheimer%27s+Disease+Risk+Polymorphisms+Regulate+Gene+Expression+in+the+ZCWPW1+and+the+CELF1+Loci&rft.jtitle=PLOS+One&rft.aulast=Karch%2C+C.M.%3B+Ezerskiy%2C+L.A.%3B+Bertelsen%2C+S.+et+al.&rft.au=Karch%2C+C.M.%3B+Ezerskiy%2C+L.A.%3B+Bertelsen%2C+S.+et+al.&rft.date=2016&rft.volume=11&rft.issue=2&rft.pages=e0148717&rft_id=info:doi\/10.1371%2Fjournal.pone.0148717&rft_id=info:pmc\/PMC4769299&rft_id=info:pmid\/26919393&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4769299&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MalikLow16-10\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MalikLow16_10-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Malik, R.; Traylor, M.; Pulit, S.L. et al. (2016). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4818561\" target=\"_blank\">\"Low-frequency and common genetic variation in ischemic stroke: The METASTROKE collaboration\"<\/a>. <i>Neurology<\/i> <b>86<\/b> (13): 1217-26. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1212%2FWNL.0000000000002528\" target=\"_blank\">10.1212\/WNL.0000000000002528<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4818561\/\" target=\"_blank\">PMC4818561<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26935894\" target=\"_blank\">26935894<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4818561\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4818561<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Low-frequency+and+common+genetic+variation+in+ischemic+stroke%3A+The+METASTROKE+collaboration&rft.jtitle=Neurology&rft.aulast=Malik%2C+R.%3B+Traylor%2C+M.%3B+Pulit%2C+S.L.+et+al.&rft.au=Malik%2C+R.%3B+Traylor%2C+M.%3B+Pulit%2C+S.L.+et+al.&rft.date=2016&rft.volume=86&rft.issue=13&rft.pages=1217-26&rft_id=info:doi\/10.1212%2FWNL.0000000000002528&rft_id=info:pmc\/PMC4818561&rft_id=info:pmid\/26935894&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4818561&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-StangAdvancing10-11\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-StangAdvancing10_11-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Stang, P.E.; Ryan, P.B.; Racoosin, J.A. et al. (2010). 
\"Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership\". <i>Annals of Internal Medicine<\/i> <b>153<\/b> (9): 600\u20136. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.7326%2F0003-4819-153-9-201011020-00010\" target=\"_blank\">10.7326\/0003-4819-153-9-201011020-00010<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21041580\" target=\"_blank\">21041580<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Advancing+the+science+for+active+surveillance%3A+rationale+and+design+for+the+Observational+Medical+Outcomes+Partnership&rft.jtitle=Annals+of+Internal+Medicine&rft.aulast=Stang%2C+P.E.%3B+Ryan%2C+P.B.%3B+Racoosin%2C+J.A.+et+al.&rft.au=Stang%2C+P.E.%3B+Ryan%2C+P.B.%3B+Racoosin%2C+J.A.+et+al.&rft.date=2010&rft.volume=153&rft.issue=9&rft.pages=600%E2%80%936&rft_id=info:doi\/10.7326%2F0003-4819-153-9-201011020-00010&rft_id=info:pmid\/21041580&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-Prli.C4.87Ten12-12\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-Prli.C4.87Ten12_12-0\" rel=\"external_link\">12.0<\/a><\/sup> <sup><a href=\"#cite_ref-Prli.C4.87Ten12_12-1\" rel=\"external_link\">12.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Prli\u0107, A.; Procter, J.B. (2012). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539\" target=\"_blank\">\"Ten simple rules for the open development of scientific software\"<\/a>. <i>PLOS Computational Biology<\/i> <b>8<\/b> (12): e1002802. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.1002802\" target=\"_blank\">10.1371\/journal.pcbi.1002802<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3516539\/\" target=\"_blank\">PMC3516539<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23236269\" target=\"_blank\">23236269<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3516539<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ten+simple+rules+for+the+open+development+of+scientific+software&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Prli%C4%87%2C+A.%3B+Procter%2C+J.B.&rft.au=Prli%C4%87%2C+A.%3B+Procter%2C+J.B.&rft.date=2012&rft.volume=8&rft.issue=12&rft.pages=e1002802&rft_id=info:doi\/10.1371%2Fjournal.pcbi.1002802&rft_id=info:pmc\/PMC3516539&rft_id=info:pmid\/23236269&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3516539&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MasumTen13-13\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-MasumTen13_13-0\" rel=\"external_link\">13.0<\/a><\/sup> <sup><a href=\"#cite_ref-MasumTen13_13-1\" rel=\"external_link\">13.1<\/a><\/sup> <sup><a href=\"#cite_ref-MasumTen13_13-2\" rel=\"external_link\">13.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Masum, H.; Rao, A.; Good, B.M. et al. (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3784487\" target=\"_blank\">\"Ten simple rules for cultivating open science and collaborative R&D\"<\/a>. <i>PLOS Computational Biology<\/i> <b>9<\/b> (9): e1003244. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.1003244\" target=\"_blank\">10.1371\/journal.pcbi.1003244<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3784487\/\" target=\"_blank\">PMC3784487<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24086123\" target=\"_blank\">24086123<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3784487\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3784487<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ten+simple+rules+for+cultivating+open+science+and+collaborative+R%26D&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Masum%2C+H.%3B+Rao%2C+A.%3B+Good%2C+B.M.+et+al.&rft.au=Masum%2C+H.%3B+Rao%2C+A.%3B+Good%2C+B.M.+et+al.&rft.date=2013&rft.volume=9&rft.issue=9&rft.pages=e1003244&rft_id=info:doi\/10.1371%2Fjournal.pcbi.1003244&rft_id=info:pmc\/PMC3784487&rft_id=info:pmid\/24086123&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3784487&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li 
id=\"cite_note-PearlsonMultisite09-14\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-PearlsonMultisite09_14-0\" rel=\"external_link\">14.0<\/a><\/sup> <sup><a href=\"#cite_ref-PearlsonMultisite09_14-1\" rel=\"external_link\">14.1<\/a><\/sup> <sup><a href=\"#cite_ref-PearlsonMultisite09_14-2\" rel=\"external_link\">14.2<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pearlson, G. (2009). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2643967\" target=\"_blank\">\"Multisite collaborations and large databases in psychiatric neuroimaging: Advantages, problems, and challenges\"<\/a>. <i>Schizophrenia Bulletin<\/i> <b>35<\/b> (1): 1\u20132. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fschbul%2Fsbn166\" target=\"_blank\">10.1093\/schbul\/sbn166<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2643967\/\" target=\"_blank\">PMC2643967<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/19023121\" target=\"_blank\">19023121<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2643967\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2643967<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Multisite+collaborations+and+large+databases+in+psychiatric+neuroimaging%3A+Advantages%2C+problems%2C+and+challenges&rft.jtitle=Schizophrenia+Bulletin&rft.aulast=Pearlson%2C+G.&rft.au=Pearlson%2C+G.&rft.date=2009&rft.volume=35&rft.issue=1&rft.pages=1%E2%80%932&rft_id=info:doi\/10.1093%2Fschbul%2Fsbn166&rft_id=info:pmc\/PMC2643967&rft_id=info:pmid\/19023121&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC2643967&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-WeberTheShared09-15\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-WeberTheShared09_15-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Weber, G.M.; Murphy, S.N.; McMurry, A.J. et al. (2009). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2744712\" target=\"_blank\">\"The Shared Health Research Information Network (SHRINE): A prototype federated query tool for clinical data repositories\"<\/a>. <i>JAMIA<\/i> <b>16<\/b> (5): 624-30. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1197%2Fjamia.M3191\" target=\"_blank\">10.1197\/jamia.M3191<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2744712\/\" target=\"_blank\">PMC2744712<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/19567788\" target=\"_blank\">19567788<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2744712\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2744712<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Shared+Health+Research+Information+Network+%28SHRINE%29%3A+A+prototype+federated+query+tool+for+clinical+data+repositories&rft.jtitle=JAMIA&rft.aulast=Weber%2C+G.M.%3B+Murphy%2C+S.N.%3B+McMurry%2C+A.J.+et+al.&rft.au=Weber%2C+G.M.%3B+Murphy%2C+S.N.%3B+McMurry%2C+A.J.+et+al.&rft.date=2009&rft.volume=16&rft.issue=5&rft.pages=624-30&rft_id=info:doi\/10.1197%2Fjamia.M3191&rft_id=info:pmc\/PMC2744712&rft_id=info:pmid\/19567788&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC2744712&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-MerrielBioGrid11-16\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-MerrielBioGrid11_16-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Merriel, R.B.; Gibbs, P.; O'Brien, T.J. et al. (2011). 
\"BioGrid Australia facilitates collaborative medical and bioinformatics research across hospitals and medical research institutes by linking data from diverse disease and data types\". <i>Human Mutation<\/i> <b>32<\/b> (5): 517-25. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1002%2Fhumu.21437\" target=\"_blank\">10.1002\/humu.21437<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21309032\" target=\"_blank\">21309032<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=BioGrid+Australia+facilitates+collaborative+medical+and+bioinformatics+research+across+hospitals+and+medical+research+institutes+by+linking+data+from+diverse+disease+and+data+types&rft.jtitle=Human+Mutation&rft.aulast=Merriel%2C+R.B.%3B+Gibbs%2C+P.%3B+O%27Brien%2C+T.J.+et+al.&rft.au=Merriel%2C+R.B.%3B+Gibbs%2C+P.%3B+O%27Brien%2C+T.J.+et+al.&rft.date=2011&rft.volume=32&rft.issue=5&rft.pages=517-25&rft_id=info:doi\/10.1002%2Fhumu.21437&rft_id=info:pmid\/21309032&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BoyleBioGrid11-17\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BoyleBioGrid11_17-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Boyle, D.I.; Rafael, N. (2011). \"BioGrid Australia and GRHANITE: Privacy-protecting subject matching\". <i>Studies in Health Technology and Informatics<\/i> <b>168<\/b>: 24-34. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21893908\" target=\"_blank\">21893908<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=BioGrid+Australia+and+GRHANITE%3A+Privacy-protecting+subject+matching&rft.jtitle=Studies+in+Health+Technology+and+Informatics&rft.aulast=Boyle%2C+D.I.%3B+Rafael%2C+N.&rft.au=Boyle%2C+D.I.%3B+Rafael%2C+N.&rft.date=2011&rft.volume=168&rft.pages=24-34&rft_id=info:pmid\/21893908&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-LekAnalysis16-18\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-LekAnalysis16_18-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Lek, M.; Karczewski, K.J.; Minikel, E.V. et al. (2016). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5018207\" target=\"_blank\">\"Analysis of protein-coding genetic variation in 60,706 humans\"<\/a>. <i>Nature<\/i> <b>536<\/b> (7616): 285\u201391. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnature19057\" target=\"_blank\">10.1038\/nature19057<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5018207\/\" target=\"_blank\">PMC5018207<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/27535533\" target=\"_blank\">27535533<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5018207\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC5018207<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Analysis+of+protein-coding+genetic+variation+in+60%2C706+humans&rft.jtitle=Nature&rft.aulast=Lek%2C+M.%3B+Karczewski%2C+K.J.%3B+Minikel%2C+E.V.+et+al.&rft.au=Lek%2C+M.%3B+Karczewski%2C+K.J.%3B+Minikel%2C+E.V.+et+al.&rft.date=2016&rft.volume=536&rft.issue=7616&rft.pages=285%E2%80%9391&rft_id=info:doi\/10.1038%2Fnature19057&rft_id=info:pmc\/PMC5018207&rft_id=info:pmid\/27535533&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC5018207&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-ElEmamAnon15-19\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-ElEmamAnon15_19-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">El Emam, K.; Rodgers, S.; Malin, B. (2015). 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4707567\" target=\"_blank\">\"Anonymising and sharing individual patient data\"<\/a>. <i>BMJ<\/i> <b>350<\/b>: h1139. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1136%2Fbmj.h1139\" target=\"_blank\">10.1136\/bmj.h1139<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4707567\/\" target=\"_blank\">PMC4707567<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/25794882\" target=\"_blank\">25794882<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4707567\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC4707567<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Anonymising+and+sharing+individual+patient+data&rft.jtitle=BMJ&rft.aulast=El+Emam%2C+K.%3B+Rodgers%2C+S.%3B+Malin%2C+B.&rft.au=El+Emam%2C+K.%3B+Rodgers%2C+S.%3B+Malin%2C+B.&rft.date=2015&rft.volume=350&rft.pages=h1139&rft_id=info:doi\/10.1136%2Fbmj.h1139&rft_id=info:pmc\/PMC4707567&rft_id=info:pmid\/25794882&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC4707567&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SandveTen13-20\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-SandveTen13_20-0\" rel=\"external_link\">20.0<\/a><\/sup> <sup><a href=\"#cite_ref-SandveTen13_20-1\" rel=\"external_link\">20.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Sandve, G.K.; Nekrutenko, A.; Taylor, J. et al. (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3812051\" target=\"_blank\">\"Ten simple rules for reproducible computational research\"<\/a>. <i>PLOS Computational Biology<\/i> <b>9<\/b> (10): e1003285. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1371%2Fjournal.pcbi.1003285\" target=\"_blank\">10.1371\/journal.pcbi.1003285<\/a>. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3812051\/\" target=\"_blank\">PMC3812051<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24204232\" target=\"_blank\">24204232<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3812051\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3812051<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ten+simple+rules+for+reproducible+computational+research&rft.jtitle=PLOS+Computational+Biology&rft.aulast=Sandve%2C+G.K.%3B+Nekrutenko%2C+A.%3B+Taylor%2C+J.+et+al.&rft.au=Sandve%2C+G.K.%3B+Nekrutenko%2C+A.%3B+Taylor%2C+J.+et+al.&rft.date=2013&rft.volume=9&rft.issue=10&rft.pages=e1003285&rft_id=info:doi\/10.1371%2Fjournal.pcbi.1003285&rft_id=info:pmc\/PMC3812051&rft_id=info:pmid\/24204232&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3812051&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-BammlerStandard05-21\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-BammlerStandard05_21-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Bammler, T.; Beyer, R.P.; Bhattacharya, S. et al. (2005). 
\"Standardizing global gene expression analysis between laboratories and across platforms\". <i>Nature Methods<\/i> <b>2<\/b> (5): 351-6. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1038%2Fnmeth754\" target=\"_blank\">10.1038\/nmeth754<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/15846362\" target=\"_blank\">15846362<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Standardizing+global+gene+expression+analysis+between+laboratories+and+across+platforms&rft.jtitle=Nature+Methods&rft.aulast=Bammler%2C+T.%3B+Beyer%2C+R.P.%3B+Bhattacharya%2C+S.+et+al.&rft.au=Bammler%2C+T.%3B+Beyer%2C+R.P.%3B+Bhattacharya%2C+S.+et+al.&rft.date=2005&rft.volume=2&rft.issue=5&rft.pages=351-6&rft_id=info:doi\/10.1038%2Fnmeth754&rft_id=info:pmid\/15846362&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DennyPheWAS10-22\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-DennyPheWAS10_22-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Denny, J.C.; Ritchie, M.D.; Basford, M.A. et al. (2010). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2859132\" target=\"_blank\">\"PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations\"<\/a>. <i>Bioinformatics<\/i> <b>26<\/b> (9): 1205-10. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1093%2Fbioinformatics%2Fbtq126\" target=\"_blank\">10.1093\/bioinformatics\/btq126<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2859132\/\" target=\"_blank\">PMC2859132<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/20335276\" target=\"_blank\">20335276<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2859132\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2859132<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=PheWAS%3A+demonstrating+the+feasibility+of+a+phenome-wide+scan+to+discover+gene-disease+associations&rft.jtitle=Bioinformatics&rft.aulast=Denny%2C+J.C.%3B+Ritchie%2C+M.D.%3B+Basford%2C+M.A.+et+al.&rft.au=Denny%2C+J.C.%3B+Ritchie%2C+M.D.%3B+Basford%2C+M.A.+et+al.&rft.date=2010&rft.volume=26&rft.issue=9&rft.pages=1205-10&rft_id=info:doi\/10.1093%2Fbioinformatics%2Fbtq126&rft_id=info:pmc\/PMC2859132&rft_id=info:pmid\/20335276&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC2859132&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> 
<\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-NewtonValid13-23\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-NewtonValid13_23-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Newton, K.M.; Peissig, P.L.; Kho, A.N. et al. (2013). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3715338\" target=\"_blank\">\"Validation of electronic medical record-based phenotyping algorithms: Results and lessons learned from the eMERGE network\"<\/a>. <i>JAMIA<\/i> <b>20<\/b> (e1): e147-54. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1136%2Famiajnl-2012-000896\" target=\"_blank\">10.1136\/amiajnl-2012-000896<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3715338\/\" target=\"_blank\">PMC3715338<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/23531748\" target=\"_blank\">23531748<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3715338\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3715338<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Validation+of+electronic+medical+record-based+phenotyping+algorithms%3A+Results+and+lessons+learned+from+the+eMERGE+network&rft.jtitle=JAMIA&rft.aulast=Newton%2C+K.M.%3B+Peissig%2C+P.L.%3B+Kho%2C+A.N.+et+al.&rft.au=Newton%2C+K.M.%3B+Peissig%2C+P.L.%3B+Kho%2C+A.N.+et+al.&rft.date=2013&rft.volume=20&rft.issue=e1&rft.pages=e147-54&rft_id=info:doi\/10.1136%2Famiajnl-2012-000896&rft_id=info:pmc\/PMC3715338&rft_id=info:pmid\/23531748&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3715338&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-PathakMapping11-24\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-PathakMapping11_24-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Pathak, J.; Wang, J.; Kashyap, S. et al. (2011). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3128396\" target=\"_blank\">\"Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: The eMERGE Network experience\"<\/a>. <i>JAMIA<\/i> <b>18<\/b> (4): 376-86. 
<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1136%2Famiajnl-2010-000061\" target=\"_blank\">10.1136\/amiajnl-2010-000061<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3128396\/\" target=\"_blank\">PMC3128396<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21597104\" target=\"_blank\">21597104<\/a><span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3128396\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC3128396<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Mapping+clinical+phenotype+data+elements+to+standardized+metadata+repositories+and+controlled+terminologies%3A+The+eMERGE+Network+experience&rft.jtitle=JAMIA&rft.aulast=Pathak%2C+J.%3B+Wang%2C+J.%3B+Kashyap%2C+S.+et+al.&rft.au=Pathak%2C+J.%3B+Wang%2C+J.%3B+Kashyap%2C+S.+et+al.&rft.date=2011&rft.volume=18&rft.issue=4&rft.pages=376-86&rft_id=info:doi\/10.1136%2Famiajnl-2010-000061&rft_id=info:pmc\/PMC3128396&rft_id=info:pmid\/21597104&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC3128396&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> 
<\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-DienerPersonality03-25\"><span class=\"mw-cite-backlink\">\u2191 <sup><a href=\"#cite_ref-DienerPersonality03_25-0\" rel=\"external_link\">25.0<\/a><\/sup> <sup><a href=\"#cite_ref-DienerPersonality03_25-1\" rel=\"external_link\">25.1<\/a><\/sup><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Diener, E.; Oishi, S.; Lucas, R.E. (2003). \"Personality, culture, and subjective well-being: Emotional and cognitive evaluations of life\". <i>Annual Review of Psychology<\/i> <b>54<\/b>: 403-25. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1146%2Fannurev.psych.54.101601.145056\" target=\"_blank\">10.1146\/annurev.psych.54.101601.145056<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/12172000\" target=\"_blank\">12172000<\/a>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Personality%2C+culture%2C+and+subjective+well-being%3A+Emotional+and+cognitive+evaluations+of+life&rft.jtitle=Annual+Review+of+Psychology&rft.aulast=Diener%2C+E.%3B+Oishi%2C+S.%3B+Lucas%2C+R.E.&rft.au=Diener%2C+E.%3B+Oishi%2C+S.%3B+Lucas%2C+R.E.&rft.date=2003&rft.volume=54&rft.pages=403-25&rft_id=info:doi\/10.1146%2Fannurev.psych.54.101601.145056&rft_id=info:pmid\/12172000&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HFD-26\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HFD_26-0\" 
rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.humanfertility.org\/cgi-bin\/main.php\" target=\"_blank\">\"The Human Fertility Database\"<\/a>. Max Planck Institute for Demographic Research and Vienna Institute of Demography<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.humanfertility.org\/cgi-bin\/main.php\" target=\"_blank\">http:\/\/www.humanfertility.org\/cgi-bin\/main.php<\/a><\/span><span class=\"reference-accessdate\">. Retrieved 06 October 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+Human+Fertility+Database&rft.atitle=&rft.pub=Max+Planck+Institute+for+Demographic+Research+and+Vienna+Institute+of+Demography&rft_id=http%3A%2F%2Fwww.humanfertility.org%2Fcgi-bin%2Fmain.php&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-HMD-27\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-HMD_27-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation web\"><a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.mortality.org\/\" target=\"_blank\">\"The Human Mortality Database\"<\/a>. University of California, Berkeley and Max Planck Institute for Demographic Research<span class=\"printonly\">. <a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.mortality.org\/\" target=\"_blank\">http:\/\/www.mortality.org\/<\/a><\/span><span class=\"reference-accessdate\">. 
Retrieved 06 October 2016<\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=The+Human+Mortality+Database&rft.atitle=&rft.pub=University+of+California%2C+Berkeley+and+Max+Planck+Institute+for+Demographic+Research&rft_id=http%3A%2F%2Fwww.mortality.org%2F&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<li id=\"cite_note-SennOverstating09-28\"><span class=\"mw-cite-backlink\"><a href=\"#cite_ref-SennOverstating09_28-0\" rel=\"external_link\">\u2191<\/a><\/span> <span class=\"reference-text\"><span class=\"citation Journal\">Senn, S.J. (2009). <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2653069\" target=\"_blank\">\"Overstating the evidence: Double counting in meta-analysis and related problems\"<\/a>. <i>BMC Medical Research Methodology<\/i> <b>9<\/b>: 10. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/Digital_object_identifier\" target=\"_blank\">doi<\/a>:<a rel=\"external_link\" class=\"external text\" href=\"http:\/\/dx.doi.org\/10.1186%2F1471-2288-9-10\" target=\"_blank\">10.1186\/1471-2288-9-10<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Central\" target=\"_blank\">PMC<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC2653069\/\" target=\"_blank\">PMC2653069<\/a>. <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/en.wikipedia.org\/wiki\/PubMed_Identifier\" target=\"_blank\">PMID<\/a> <a rel=\"external_link\" class=\"external text\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/19216779\" target=\"_blank\">19216779<\/a><span class=\"printonly\">. 
<a rel=\"external_link\" class=\"external free\" href=\"http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2653069\" target=\"_blank\">http:\/\/www.pubmedcentral.nih.gov\/articlerender.fcgi?tool=pmcentrez&artid=PMC2653069<\/a><\/span>.<\/span><span class=\"Z3988\" title=\"ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Overstating+the+evidence%3A+Double+counting+in+meta-analysis+and+related+problems&rft.jtitle=BMC+Medical+Research+Methodology&rft.aulast=Senn%2C+S.J.&rft.au=Senn%2C+S.J.&rft.date=2009&rft.volume=9&rft.pages=10&rft_id=info:doi\/10.1186%2F1471-2288-9-10&rft_id=info:pmc\/PMC2653069&rft_id=info:pmid\/19216779&rft_id=http%3A%2F%2Fwww.pubmedcentral.nih.gov%2Farticlerender.fcgi%3Ftool%3Dpmcentrez%26artid%3DPMC2653069&rfr_id=info:sid\/en.wikipedia.org:Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\"><span style=\"display: none;\"> <\/span><\/span><\/span>\n<\/li>\n<\/ol><\/div>\n<h2><span class=\"mw-headline\" id=\"Notes\">Notes<\/span><\/h2>\n<p>This presentation is faithful to the original, with only a few minor changes to presentation. 
In some cases important information was missing from the references, and that information was added.\n<\/p>\n<\/div><div class=\"printfooter\">Source: <a rel=\"external_link\" class=\"external\" href=\"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing\">https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing<\/a><\/div>\n\t\t\t\t\t\t\t\t\t\t<!-- end content -->\n\t\t\t\t\t\t\t\t\t\t<div class=\"visualClear\"><\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t\t<!-- end of the left (by default at least) column -->\n\t\t<div 
class=\"visualClear\"><\/div>\n\t\t\t\t\t\n\t\t<\/div>\n\t\t\n\n<\/body>","0efba51aeff20a2591887ad29fac5866_images":["https:\/\/www.limswiki.org\/images\/d\/d1\/Fig1_Boland_PLOSCompBio2017_13-1.png","https:\/\/www.limswiki.org\/images\/0\/07\/Tab1_Boland_PLOSCompBio2017_13-1.png"],"0efba51aeff20a2591887ad29fac5866_timestamp":1544814655,"5558150b977a44d9e5f293e9ae7e49a1":{"type":"chapter","title":"1. Big data, informatics, and research","key":"5558150b977a44d9e5f293e9ae7e49a1"}},"link":"https:\/\/www.limswiki.org\/index.php\/Book:LIMSjournal_-_Spring_2017","price_currency":"","price_amount":"","book_size":"","download_url":"https:\/\/www.limsforum.com?ebb_action=book_download&book_id=78057","language":"","cta_button_content":"","toc":[{"type":"chapter","name":"1. Big data, informatics, and research","id":"5558150b977a44d9e5f293e9ae7e49a1","children":[{"type":"article","name":"Ten simple rules to enable multi-site collaborations through data sharing (Boland et al. 2017)","id":"0efba51aeff20a2591887ad29fac5866","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_to_enable_multi-site_collaborations_through_data_sharing"},{"type":"article","name":"A metadata-driven approach to data repository design (Harvey et al. 2017)","id":"0c7c45ef71cf479715ea32203f1e26d3","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:A_metadata-driven_approach_to_data_repository_design"},{"type":"article","name":"Data and metadata brokering \u2013 Theory and practice from the BCube Project (Khalsa 2017)","id":"e80be5db806b508a1aecd418f32667db","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Data_and_metadata_brokering_%E2%80%93_Theory_and_practice_from_the_BCube_Project"},{"type":"article","name":"Ten simple rules for cultivating open science and collaborative R&D (Masum et al. 
2013)","id":"b1b2d2922d12d6afbd23ca5f216a0cd7","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_cultivating_open_science_and_collaborative_R%26D"}]},{"type":"chapter","name":"2. Bioinformatics","id":"77fbf09bb35e82206c113e9ae59b1b18","children":[{"type":"article","name":"PCM-SABRE: A platform for benchmarking and comparing outcome prediction methods in precision cancer medicine (Eyal-Altman et al. 2017)","id":"ffcad3b9d842250ab55f35eb0cee8237","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:PCM-SABRE:_A_platform_for_benchmarking_and_comparing_outcome_prediction_methods_in_precision_cancer_medicine"},{"type":"article","name":"SCIFIO: An extensible framework to support scientific image formats (Hiner et al. 2017)","id":"e28b66162eedd9f2c9137d1be8322cec","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:SCIFIO:_An_extensible_framework_to_support_scientific_image_formats"},{"type":"article","name":"Ten simple rules for developing usable software in computational biology (List et al. 2017)","id":"489049f69ab6d4b2f19ec2a155d44c4e","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Ten_simple_rules_for_developing_usable_software_in_computational_biology"},{"type":"article","name":"DGW: An exploratory data analysis tool for clustering and visualisation of epigenomic marks (Lukauskas et al. 2016)","id":"5321dee46dc24114d97002f69139f201","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:DGW:_An_exploratory_data_analysis_tool_for_clustering_and_visualisation_of_epigenomic_marks"},{"type":"article","name":"Use of application containers and workflows for genomic data analysis (Schulz et al. 2016)","id":"18474356308b22be86d3205a31b5a267","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Use_of_application_containers_and_workflows_for_genomic_data_analysis"}]},{"type":"chapter","name":"3. 
Health, public health, and clinical informatics","id":"e20bd57b1986b44204a29c8419326fe3","children":[{"type":"article","name":"Informatics metrics and measures for a smart public health systems approach: Information science perspective (Carney and Shea 2017)","id":"bfe42513d857c82a22a78dbd758fc186","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Informatics_metrics_and_measures_for_a_smart_public_health_systems_approach:_Information_science_perspective"},{"type":"article","name":"Deployment of analytics into the healthcare safety net: Lessons learned (Hartzband and Jacobs 2016)","id":"bbfbe3553b26be64d63e45d26612ea45","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:Deployment_of_analytics_into_the_healthcare_safety_net:_Lessons_learned"},{"type":"article","name":"The effect of the General Data Protection Regulation on medical research (Rumbold and Pierscionek 2017)","id":"35171859a8e80fe1a0d916059f4fdd3e","pageUrl":"https:\/\/www.limswiki.org\/index.php\/Journal:The_effect_of_the_General_Data_Protection_Regulation_on_medical_research"}]}],"settings":{"show_cover":"1","show_title":"1","show_subtitle":"0","show_full_title":"1","show_editor":"1","show_editor_pic":"1","show_publisher":"1","show_language":"1","show_size":"1","show_toc":"1","show_content_beneath_cover":"1","cta_button":"1","content_location":"1","toc_links":"disabled","log_in_msg":"<span><\/span> Please log in to read online.","cover_size":"medium"},"title_image":"https:\/\/s3.limsforum.com\/www.limsforum.com\/wp-content\/uploads\/Fig1_Boland_PLOSCompBio2017_13-1.png"}}
LIMSjournal - Spring 2017
Volume 3, Issue 1
Editor: Shawn Douglas
Publisher: LabLynx Press
Copyright LabLynx Inc. All rights reserved.