Research: Scientific Knowledge Extraction from Data (SKED)


The SKED architecture is a technology-agnostic framework designed to minimize the overhead of data integration, facilitate the reuse of analytical pipelines, and guarantee the reproducibility of quantitative results.

SKED is a general, extensible solution that integrates and homogenizes data of disparate origins, incompatible formats, and multiple spatial and temporal scales.

Data primitives are atomic units of data representation that are independent of the underlying storage strategy. This reduced set of data types (time series, text, graphs, and polygonal meshes) lets data be used for analysis like LEGO® building blocks rather than puzzle pieces.
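
As an illustration only, the four data primitives could be modeled as plain in-memory types that know nothing about how they are stored. The class names, field names, and the summarize helper below are assumptions made for this sketch; they are not taken from the SKED implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class TimeSeries:
    times: List[float]    # sample times
    values: List[float]   # one value per sample time


@dataclass
class Text:
    content: str


@dataclass
class Graph:
    nodes: List[str]
    edges: List[Tuple[str, str]]  # (source, target) pairs


@dataclass
class PolygonalMesh:
    vertices: List[Tuple[float, float, float]]
    faces: List[Tuple[int, ...]]  # each face lists vertex indices


# Hypothetical umbrella type: any analysis step accepts one of these four.
Primitive = Union[TimeSeries, Text, Graph, PolygonalMesh]


def summarize(primitive: Primitive) -> str:
    """Return a short description without touching any storage backend."""
    if isinstance(primitive, TimeSeries):
        return f"TimeSeries with {len(primitive.values)} samples"
    if isinstance(primitive, Text):
        return f"Text with {len(primitive.content)} characters"
    if isinstance(primitive, Graph):
        return f"Graph with {len(primitive.nodes)} nodes and {len(primitive.edges)} edges"
    if isinstance(primitive, PolygonalMesh):
        return f"PolygonalMesh with {len(primitive.vertices)} vertices and {len(primitive.faces)} faces"
    raise TypeError(f"not a data primitive: {type(primitive).__name__}")
```

Because every pipeline step in this sketch depends only on these types, the same step could in principle run whether the data came from a file, a relational database, or a remote service.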

The SKED relational schema provides a way to store data primitives and their associated metadata for an experiment. The data primitives associated with an experiment can thus be easily accessed and retrieved.
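
A minimal sketch of what such a schema might look like, using an in-memory SQLite database. The table names, column names, JSON payload encoding, and demo values are all assumptions made for illustration; the actual SKED relational schema is not described here and may differ substantially.

```python
import json
import sqlite3

# Hypothetical schema: one row per experiment, one row per data primitive,
# and free-form key/value metadata attached to each primitive.
SCHEMA = """
CREATE TABLE experiment (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE data_primitive (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    kind          TEXT NOT NULL,   -- e.g. 'time_series', 'text', 'graph', 'mesh'
    payload       TEXT NOT NULL    -- JSON-encoded primitive (an assumption)
);
CREATE TABLE primitive_metadata (
    primitive_id INTEGER NOT NULL REFERENCES data_primitive(id),
    meta_key     TEXT NOT NULL,
    meta_value   TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the sketch schema and load one made-up experiment."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO experiment (id, name) VALUES (?, ?)", (1, "demo experiment"))
    conn.execute(
        "INSERT INTO data_primitive (id, experiment_id, kind, payload) VALUES (?, ?, ?, ?)",
        (1, 1, "time_series", json.dumps({"times": [0, 1], "values": [1.0, 2.0]})),
    )
    conn.execute(
        "INSERT INTO primitive_metadata (primitive_id, meta_key, meta_value) VALUES (?, ?, ?)",
        (1, "unit", "a.u."),
    )
    conn.commit()
    return conn


def primitives_for_experiment(conn: sqlite3.Connection, experiment_id: int) -> list:
    """Retrieve every primitive of an experiment together with its metadata."""
    rows = conn.execute(
        "SELECT id, kind, payload FROM data_primitive WHERE experiment_id = ? ORDER BY id",
        (experiment_id,),
    ).fetchall()
    result = []
    for primitive_id, kind, payload in rows:
        meta_rows = conn.execute(
            "SELECT meta_key, meta_value FROM primitive_metadata WHERE primitive_id = ?",
            (primitive_id,),
        ).fetchall()
        result.append(
            {"id": primitive_id, "kind": kind, "data": json.loads(payload), "metadata": dict(meta_rows)}
        )
    return result
```

Calling primitives_for_experiment(build_demo_db(), 1) returns the single demo time series with its metadata, which is the kind of per-experiment retrieval the schema is meant to make easy.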


Gold standard datasets provide the community with data of known properties for the development and verification of new analytical methods. For multi-omic systems biology studies of host-parasite interaction, we use the model system of non-human primates infected with malaria developed by the MaHPIC consortium.


With SKED, communication across systems is standardized through the use of data primitives. Our vision for the Resource Allocation Service (RAS) is a set of servers holding a database of all known resources, such as datasets and pipelines. RAS locates and allocates the requested resources. For each resource, the database should record its address, available capacity, and shareability.
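
Since RAS is described as a vision rather than a finished service, the following is only a sketch of how a locate-and-allocate registry could behave. The Resource fields mirror the three pieces of information named above; the class names, method names, allocation rules, and the example address are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Resource:
    name: str
    kind: str                 # e.g. "dataset" or "pipeline"
    address: str              # where the resource can be reached
    available_capacity: int   # remaining allocation slots (assumed meaning)
    shareable: bool           # whether several requesters may hold it at once
    holders: int = 0          # current number of allocations


class ResourceRegistry:
    """In-memory stand-in for the RAS database of known resources."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def register(self, resource: Resource) -> None:
        self._resources[resource.name] = resource

    def locate(self, name: str) -> Optional[Resource]:
        """Return the resource record, or None if it is unknown."""
        return self._resources.get(name)

    def allocate(self, name: str) -> Optional[str]:
        """Reserve one slot and return the address, or None if unavailable."""
        resource = self.locate(name)
        if resource is None:
            return None
        if resource.available_capacity <= 0:
            return None
        if not resource.shareable and resource.holders > 0:
            return None  # an exclusive resource is already held
        resource.holders += 1
        resource.available_capacity -= 1
        return resource.address


registry = ResourceRegistry()
# The address below is a placeholder, not a real SKED endpoint.
registry.register(Resource("demo_dataset", "dataset", "https://example.org/demo_dataset", 2, True))
registry.register(Resource("demo_pipeline", "pipeline", "https://example.org/demo_pipeline", 2, False))
```

In this sketch a shareable resource can be allocated until its capacity is exhausted, while a non-shareable one is handed to a single requester at a time regardless of remaining capacity.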