Publishing Data¶
DataJoint is a framework for building data pipelines that support rigorous flow of
structured data between experimenters, data scientists, and computing agents during
data acquisition and processing within a centralized project.
Publishing final datasets for the outside world may require additional steps and
conversion.
Provide access to a DataJoint server¶
One approach for publishing data is to grant public access to an existing pipeline.
Then public users will be able to query the data pipelines using DataJoint's query
language and output interfaces just like any other users of the pipeline.
For security, this may require synchronizing the data onto a separate read-only public
server.
Containerizing as a DataJoint pipeline¶
Containerization platforms such as Docker allow convenient
distribution of environments including database services and data.
It is convenient to publish DataJoint pipelines as a docker container that deploys the
populated DataJoint pipeline.
One example of publishing a DataJoint pipeline as a docker container is
Sinz, F., Ecker, A.S., Fahey, P., Walker, E., Cobos, E., Froudarakis, E., Yatsenko, D., Pitkow, Z., Reimer, J. and Tolias, A., 2018. Stimulus domain transfer in recurrent models for large scale cortical population prediction on video. In Advances in Neural Information Processing Systems (pp. 7198-7209). https://www.biorxiv.org/content/early/2018/10/25/452672
The code and the data can be found at https://github.com/sinzlab/Sinz2018_NIPS
Exporting into a collection of files¶
Another option for publishing and archiving data is to export the data from the DataJoint pipeline into a collection of files. DataJoint provides features for exporting and importing sections of the pipeline. Several ongoing projects are implementing the capability to export from DataJoint pipelines into Neurodata Without Borders files.