A data science program cannot be competitive without access to rich data sources and informatics resources that enable rapid identification, acquisition, integration, and delivery of data sets that may be massive in size, diverse in structure (including unstructured text documents), and complex in content, semantics, terminologies, and quality. A large-scale, flexible data infrastructure that leverages both traditional and next-generation data environments and tools must be implemented and maintained as a core service asset and capability of D2V. While some elements of this infrastructure are being implemented within the Colorado Center for Personalized Medicine (CCPM) and can be leveraged, D2V will require specialized, mission-specific resources, such as advanced record linkage methods, to serve the innovative data needs of D2V investigators.
The D2V Informatics Core partners with Health Data Compass (HDC) to use common platforms and support structures. D2V will leverage the substantial existing institutional investments in HDC’s HIPAA-compliant data center, common resources, data-sharing policies, procedures, and regulatory framework. Unlike HDC, D2V will implement novel data architectures and informatics methods to remain on the “bleeding edge” of Big Data management.
This core will provide tools and expertise in:
- Data management and curation,
- Data discovery,
- Data integration and person-level linkages,
- Federated data sharing methods/technologies,
- Repositories for data, metadata, analytic code, software, and results that support visualization, interpretation, and reuse, and
- Support of open science and research reproducibility.