
Introduction

Malleable All-seeing Journal Of Research Artifacts

Majora is a Django-based wet-and-dry information management system. Majora is being rapidly developed as part of the COVID-19 Genomics UK Consortium (COG-UK) response to the outbreak of SARS-CoV-2.

Majora stores metadata on biological samples, sequencing runs, bioinformatics pipelines and files. These items are referred to collectively as "artifacts". Majora is composed of three main parts:

This documentation lists every field for each of the artifacts and processes that can be added to, updated in, and retrieved from Majora. It is intended primarily for users who wish to write a program against the API and for users of the Ocarina command line tool, but it should also be useful for users of the CGPS metadata uploader. Users of the uploader will likely also want to refer to the uploader's own documentation.

You may be interested to know that this API documentation page was created with Slate.

Important notes

Authentication

Biosamples

Add one or more biosamples to Majora

/artifact/biosample/add/

Attributes

Minimal Ocarina command with mandatory parameters:

ocarina put biosample \
    --adm1 UK-ENG \
    --central-sample-id BIRM-12345 \
    --collection-date 2020-06-03 \
    --is-surveillance Y 

Full Ocarina command example:

ocarina put biosample \
    --adm1 UK-ENG \
    --central-sample-id BIRM-12345 \
    --collection-date 2020-06-03 \
    --is-surveillance Y \
    --received-date 2020-06-04 \
    --adm2 Birmingham \
    --source-age 29 \
    --source-sex F \
    --adm2-private B20 \
    --biosample-source-id ABC12345 \
    --collecting-org 'Hypothetical University of Hooting' \
    --collection-pillar 2 \
    --root-sample-id PHA12345 \
    --sample-type-collected swab \
    --sample-type-received primary \
    --sender-sample-id LAB12345 \
    --swab-site nose-throat 

Attributes currently unsupported by Ocarina: admitted_date, admitted_hospital_name, admitted_hospital_trust_or_board, admitted_with_covid_diagnosis, anonymised_care_home_code, employing_hospital_name, employing_hospital_trust_or_board, is_care_home_resident, is_care_home_worker, is_hcw, is_hospital_patient, is_icu_patient

Function not currently implemented in Ocarina Python API

Documentation for this function can be found on the CGPS uploader website linked below:
https://metadata.docs.cog-uk.io/bulk-upload-1/bulk-upload

There may be some differences between this specification and the uploader, particularly for providing Metrics and Metadata. See the Metadata and Metrics sections below for column names that are compatible with the API spec.

Name Description Options

adm1
string, required, enum
Code of the UK home nation of the patient from whom the sample was collected
  • UK-ENG
  • UK-SCT
  • UK-WLS
  • UK-NIR

central_sample_id
string, required
The centrally shared ID that you will use to refer to this sample inside the consortium.

collection_date
string, required
Provide where possible. When collection_date cannot be provided, you must provide received_date instead.

is_surveillance
string, required, enum
Whether this sample was collected under the COGUK surveillance protocol.
  • Y
  • N

received_date
string, possibly required
Date the sample was first received by any lab. This date should be as close as possible to collection_date. This date must be provided if collection_date is missing.

adm2
string, recommended
The city or county that the patient lives in (avoid abbreviations or shorthand)

source_age
integer, recommended
Ages should be whole numbers. Neonates should be entered as 0.

source_sex
string, recommended, enum
  • F
  • M
  • Other

adm2_private
string
The outer postcode for the patient's home address (first half of the postcode only)

admitted_date
string
If is_hospital_patient, the date (YYYY-MM-DD) that the patient was admitted to hospital

admitted_hospital_name
string
If is_hospital_patient, provide the name of the hospital. If you do not know the name, use HOSPITAL

admitted_hospital_trust_or_board
string
If is_hospital_patient, provide the name of the trust or board that administers the hospital the patient was admitted to.

admitted_with_covid_diagnosis
string, enum
If is_hospital_patient, whether the patient was admitted with a COVID diagnosis
  • Y
  • N
  • (blank)

anonymised_care_home_code
string
A code to represent a particular care home. The mapping of this code to the care home should be kept securely by your organisation. You must take care to select a code that cannot be used to identify the care home.

biosample_source_id
string
A unique identifier for the patient or environmental sample. If you have multiple samples from the same patient, enter the FIRST central_sample_id assigned to one of their samples here.

collecting_org
string
The site (e.g. hospital or surgery) that originally collected this sample.

collection_pillar
integer
The pillar under which this sample was collected (e.g. 1, 2). This is likely 1, but leave blank if unsure.

employing_hospital_name
string
If is_hcw, provide the name of the employing hospital. If you do not know the name, use HOSPITAL

employing_hospital_trust_or_board
string
If is_hcw, provide the name of the employing trust or board.

is_care_home_resident
string, enum
  • Y
  • N
  • (blank)

is_care_home_worker
string, enum
  • Y
  • N
  • (blank)

is_hcw
string, enum
Whether the sample was collected from a healthcare worker. This includes hospital-associated workers.
  • Y
  • N
  • (blank)

is_hospital_patient
string, enum
  • Y
  • N
  • (blank)

is_icu_patient
string, enum
  • Y
  • N
  • (blank)

root_sample_id
string
Identifier assigned to this sample by one of the health agencies (e.g. PHE samples will be prefixed with H20). This is necessary for linking samples to private patient metadata later.

sample_type_collected
string, enum
  • dry swab
  • swab
  • sputum
  • BAL
  • aspirate

sample_type_received
string, enum
  • primary
  • extract
  • culture
  • lysate

sender_sample_id
string
If you are permitted, provide the identifier that was sent by your laboratory to SGSS here.

swab_site
string, enum
Required if sample_type_collected is swab
  • nose
  • throat
  • nose-throat
  • endotracheal
  • rectal
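
The rules in the attribute table can be checked locally before a record is submitted. The following is a minimal sketch in Python, not part of Ocarina or Majora: the function name check_biosample and the plain-dict record layout are assumptions made for illustration, and the YYYY-MM-DD date format is inferred from the examples above. It covers only the mandatory fields, the enumerated options, and the two conditional rules (received_date when collection_date is missing, swab_site when the sample is a swab).

```python
from datetime import datetime

# Enumerated options copied from the attribute table.
ENUMS = {
    "adm1": {"UK-ENG", "UK-SCT", "UK-WLS", "UK-NIR"},
    "is_surveillance": {"Y", "N"},
    "source_sex": {"F", "M", "Other"},
    "sample_type_collected": {"dry swab", "swab", "sputum", "BAL", "aspirate"},
    "sample_type_received": {"primary", "extract", "culture", "lysate"},
    "swab_site": {"nose", "throat", "nose-throat", "endotracheal", "rectal"},
}


def check_biosample(record):
    """Return a list of problems found in a biosample dict (hypothetical helper)."""
    problems = []
    for field in ("adm1", "central_sample_id", "is_surveillance"):
        if not record.get(field):
            problems.append(f"{field} is required")
    # collection_date is preferred; received_date must be given if it is missing.
    if not record.get("collection_date") and not record.get("received_date"):
        problems.append("provide collection_date, or received_date if it is unknown")
    for field in ("collection_date", "received_date"):
        value = record.get(field)
        if value:
            try:
                datetime.strptime(value, "%Y-%m-%d")  # assumed date format
            except ValueError:
                problems.append(f"{field} should look like YYYY-MM-DD")
    for field, allowed in ENUMS.items():
        value = record.get(field)
        if value and value not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    if record.get("sample_type_collected") == "swab" and not record.get("swab_site"):
        problems.append("swab_site is required when sample_type_collected is swab")
    return problems
```

An empty list means the record passed these checks. Majora may apply further validation that this sketch does not reproduce.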

Metrics

To provide metrics with Ocarina:

ocarina put biosample \
    ...
    --metric ct.# ct_value 25 \
    --metric ct.# test_kit INHOUSE \
    --metric ct.# test_platform INHOUSE \
    --metric ct.# test_target ORF8

If a particular metric supports storing multiple records, you can provide them by incrementing a numerical suffix after the metric's namespace: e.g. --metric name.1 key value ... --metric name.N key value.
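
The numbering scheme can be generated rather than typed by hand. As a rough sketch (the helper metric_args is hypothetical and not part of Ocarina), the following Python builds the --metric arguments for several records in one namespace, giving each record the next numerical suffix:

```python
def metric_args(namespace, records):
    """Build --metric arguments for several records in one namespace.

    Hypothetical helper: each record is a dict of key/value pairs and is
    given the suffix 1, 2, ... N after the namespace.
    """
    args = []
    for index, record in enumerate(records, start=1):
        for key, value in record.items():
            args += ["--metric", f"{namespace}.{index}", key, str(value)]
    return args


ct_records = [
    {"ct_value": 25, "test_target": "ORF8"},
    {"ct_value": 27, "test_target": "N"},
]
print(" ".join(metric_args("ct", ct_records)))
```

The returned list can be appended to the rest of an ocarina put biosample argument list.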

Some metrics can be provided via the uploader using these column names:

  • ct ct_value → ct_#_ct_value (limit 2)
  • ct test_kit → ct_#_test_kit (limit 2)
  • ct test_platform → ct_#_test_platform (limit 2)
  • ct test_target → ct_#_test_target (limit 2)
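
To illustrate the column naming, here is a small Python sketch that flattens ct records into uploader-style columns. It assumes that the "#" in each column name stands for the record number (1 or 2); the helper name ct_columns is hypothetical.

```python
CT_KEYS = ("ct_value", "test_kit", "test_platform", "test_target")
CT_LIMIT = 2  # the column list above states a limit of 2 per key


def ct_columns(records):
    """Flatten ct metric records into uploader-style columns.

    Assumes "#" in the column names stands for the record number.
    """
    if len(records) > CT_LIMIT:
        raise ValueError(f"at most {CT_LIMIT} ct records can be provided")
    row = {}
    for number, record in enumerate(records, start=1):
        for key in CT_KEYS:
            if key in record:
                row[f"ct_{number}_{key}"] = record[key]
    return row
```

For example, a single record with ct_value 25 and test_kit INHOUSE yields the columns ct_1_ct_value and ct_1_test_kit.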

Some artifacts in Majora can be annotated with additional Metric objects. Metric objects group together specific information that allows for additional description of an artifact, but does not belong in the artifact itself. Each metric has its own namespace, containing a fixed set of keys. Some or all of the keys may need a value to validate the Metric. This endpoint allows you to submit the following Metrics:

Namespace Name Description Options

ct ct_value
Cycle threshold value. Cannot be negative. Code an inconclusive or negative test as 0.

ct test_kit
  • ALTONA
  • ABBOTT
  • AUSDIAGNOSTICS
  • BOSPHORE
  • ROCHE
  • INHOUSE
  • SEEGENE
  • VIASURE
  • BD
  • XPERT
  • QIASTAT
  • ALINITY
  • AMPLIDIAG
  • (blank)

ct test_platform
  • ALTOSTAR_AM16
  • ABBOTT_M2000
  • ABBOTT_ALINITY
  • APPLIED_BIO_7500
  • ROCHE_COBAS
  • ROCHE_FLOW
  • ROCHE_LIGHTCYCLER
  • ELITE_INGENIUS
  • CEPHEID_XPERT
  • QIASTAT_DX
  • AUSDIAGNOSTICS
  • INHOUSE
  • ALTONA
  • PANTHER
  • SEEGENE_NIMBUS
  • QIAGEN_ROTORGENE
  • BD_MAX
  • AMPLIDIAG_EASY
  • (blank)

ct test_target
  • E
  • N
  • S
  • RDRP
  • ORF1AB
  • ORF8
  • RDRP+N
  • (blank)
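
A ct record can be checked against these options before submission. The sketch below is illustrative only: check_ct is a hypothetical helper, "(blank)" is represented as an empty string, and keys that are absent are skipped because the table does not say which keys are mandatory.

```python
# Options copied from the ct metric table.
OPTIONS = {
    "test_kit": {
        "ALTONA", "ABBOTT", "AUSDIAGNOSTICS", "BOSPHORE", "ROCHE", "INHOUSE",
        "SEEGENE", "VIASURE", "BD", "XPERT", "QIASTAT", "ALINITY", "AMPLIDIAG",
    },
    "test_platform": {
        "ALTOSTAR_AM16", "ABBOTT_M2000", "ABBOTT_ALINITY", "APPLIED_BIO_7500",
        "ROCHE_COBAS", "ROCHE_FLOW", "ROCHE_LIGHTCYCLER", "ELITE_INGENIUS",
        "CEPHEID_XPERT", "QIASTAT_DX", "AUSDIAGNOSTICS", "INHOUSE", "ALTONA",
        "PANTHER", "SEEGENE_NIMBUS", "QIAGEN_ROTORGENE", "BD_MAX", "AMPLIDIAG_EASY",
    },
    "test_target": {"E", "N", "S", "RDRP", "ORF1AB", "ORF8", "RDRP+N"},
}


def check_ct(record):
    """Return a list of problems found in one ct metric record (a dict)."""
    problems = []
    if "ct_value" in record:
        try:
            if float(record["ct_value"]) < 0:
                problems.append("ct_value cannot be negative; code a negative test as 0")
        except (TypeError, ValueError):
            problems.append("ct_value is not a number")
    for key, allowed in OPTIONS.items():
        value = record.get(key, "")
        # An empty string stands for the "(blank)" option.
        if value != "" and value not in allowed:
            problems.append(f"{key} must be blank or one of {sorted(allowed)}")
    return problems
```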

Metadata

To provide metadata with Ocarina:

ocarina put biosample \
    ...
    -m epi cluster CLUSTER8 \
    -m investigation cluster 'Ward 0' \
    -m investigation name 'West Midlands HCW' \
    -m investigation site QEHB

Some metadata can be provided via the uploader using these column names:

  • epi cluster → epi_cluster
  • investigation cluster → investigation_cluster
  • investigation name → investigation_name
  • investigation site → investigation_site

Any artifact in Majora can be 'tagged' with arbitrary key-value metadata. Unlike Metrics, there is no fixed terminology or validation on the keys or their values. Like Metrics, to aid organisation, metadata keys are grouped into namespaces. This endpoint has 'reserved' metadata keys that should only be used to provide meaningful information:

Namespace Name Description Options
epi epi_cluster A local identifier for a known case cluster
investigation investigation_cluster An optional identifier for a cluster within an investigation
investigation investigation_name A named investigation (e.g. a surveillance or directed case group)
investigation investigation_site An optional site name or code to differentiate between sites if the investigation covers more than one site.
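
Because metadata is a set of namespaced key-value pairs, it maps naturally onto a nested dictionary. As a sketch (metadata_args is a hypothetical helper, not an Ocarina function), the following Python turns such a dictionary into the -m arguments shown in the Ocarina example, using shlex.join to quote values that contain spaces:

```python
import shlex


def metadata_args(metadata):
    """Turn {namespace: {key: value}} into a flat list of -m arguments."""
    args = []
    for namespace, pairs in metadata.items():
        for key, value in pairs.items():
            args += ["-m", namespace, key, str(value)]
    return args


tags = {
    "epi": {"cluster": "CLUSTER8"},
    "investigation": {"cluster": "Ward 0", "site": "QEHB"},
}
print(shlex.join(metadata_args(tags)))
```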

Library

Add a sequencing library to Majora

/artifact/library/add/

Attributes

Minimal Ocarina command with mandatory parameters:

ocarina put library \
    --biosample BIRM-12345 VIRAL_RNA PCR AMPLICON 'ARTIC v3 (LoCost)' 'ARTIC v3' \
    --library-layout-config PAIRED \
    --library-name HOOT-LIBRARY-20200322 \
    --library-seq-kit 'Illumina MiSeq v3' \
    --library-seq-protocol 'MiSeq 150 Cycle'

Full Ocarina command example:

ocarina put library \
    --biosample BIRM-12345 VIRAL_RNA PCR AMPLICON 'ARTIC v3 (LoCost)' 'ARTIC v3' \
    --library-layout-config PAIRED \
    --library-name HOOT-LIBRARY-20200322 \
    --library-seq-kit 'Illumina MiSeq v3' \
    --library-seq-protocol 'MiSeq 150 Cycle' \
    --library-layout-insert-length 100 \
    --library-layout-read-length 300 \
    --sequencing-org-received-date 2021-01-14

Attributes merged into positional arguments by Ocarina:

  • biosample → central_sample_id library_source library_selection library_strategy library_protocol library_primers
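
The order of the merged attributes matters, since Ocarina reads them by position. The following Python sketch builds the --biosample argument from a dictionary keyed by attribute name. The helper biosample_arg is hypothetical, and raising an error when any of the six values is missing is an assumption; this page does not say how Ocarina treats an omitted position.

```python
import shlex

# Order of the attributes merged into --biosample, as listed above.
BIOSAMPLE_FIELDS = (
    "central_sample_id",
    "library_source",
    "library_selection",
    "library_strategy",
    "library_protocol",
    "library_primers",
)


def biosample_arg(record):
    """Build the --biosample argument list from a dict of library attributes."""
    missing = [name for name in BIOSAMPLE_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing attributes: {', '.join(missing)}")
    return ["--biosample"] + [str(record[name]) for name in BIOSAMPLE_FIELDS]


example = {
    "central_sample_id": "BIRM-12345",
    "library_source": "VIRAL_RNA",
    "library_selection": "PCR",
    "library_strategy": "AMPLICON",
    "library_protocol": "ARTIC v3 (LoCost)",
    "library_primers": "ARTIC v3",
}
print(shlex.join(biosample_arg(example)))
```

With the example values, the printed argument matches the --biosample line in the commands above.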

Attributes currently unsupported by Ocarina: barcode

Function not currently implemented in Ocarina Python API

Documentation for this function can be found on the CGPS uploader website linked below:
https://metadata.docs.cog-uk.io/bulk-upload-1/samples-and-sequencing

There may be some differences between this specification and the uploader, particularly for providing Metrics and Metadata. See the Metadata and Metrics sections below for column names that are compatible with the API spec.

Some attributes are named differently on the CGPS uploader:

  • library_primers → artic_primers
  • library_protocol → artic_protocol
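
When the same records are prepared for both the API and the uploader, the renaming can be applied mechanically. This is a small illustrative sketch: to_uploader_columns is a hypothetical helper, and passing every other attribute through unchanged is an assumption that should be checked against the uploader documentation.

```python
# API attribute name -> CGPS uploader column name, as listed above.
UPLOADER_NAMES = {
    "library_primers": "artic_primers",
    "library_protocol": "artic_protocol",
}


def to_uploader_columns(record):
    """Rename API attributes to uploader column names; others pass through."""
    return {UPLOADER_NAMES.get(name, name): value for name, value in record.items()}
```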
Name Description Options

central_sample_id
string, required

library_layout_config
string, required, enum
  • SINGLE
  • PAIRED

library_name
string, required
A unique, somewhat memorable name for your library.

library_selection
string, required, enum
  • RANDOM
  • PCR
  • RANDOM_PCR
  • OTHER

library_seq_kit
string, required

library_seq_protocol
string, required

library_source
string, required, enum
  • GENOMIC
  • TRANSCRIPTOMIC
  • METAGENOMIC
  • METATRANSCRIPTOMIC
  • VIRAL_RNA
  • OTHER

library_strategy
string, required, enum
  • WGA
  • WGS
  • AMPLICON
  • TARGETED_CAPTURE
  • OTHER

library_primers
string, recommended

library_protocol
string, recommended

barcode
string

library_layout_insert_length
integer

library_layout_read_length
integer

sequencing_org_received_date
string
Date the sample was received by the organisation which sequenced it. This date is used for tracking sample turnaround time.

Metadata

To provide metadata with Ocarina:

ocarina put library \
    ...
    -m artic primers 3 \
    -m artic protocol 'v3 (LoCost)'

Some metadata can be provided via the uploader using these column names:

  • artic primers → artic_primers
  • artic protocol → artic_protocol

Any artifact in Majora can be 'tagged' with arbitrary key-value metadata. Unlike Metrics, there is no fixed terminology or validation on the keys or their values. Like Metrics, to aid organisation, metadata keys are grouped into namespaces. This endpoint has 'reserved' metadata keys that should only be used to provide meaningful information:

Namespace Name Description Options
artic artic_primers The version number of the ARTIC primer set (if used) to prepare this library
artic artic_protocol The version number of the ARTIC protocol (if used) to prepare this library

Sequencing

Add a sequencing run to Majora

/process/sequencing/add/

Attributes

Minimal Ocarina command with mandatory parameters:

ocarina put sequencing \
    --instrument-make ILLUMINA \
    --instrument-model MiSeq \
    --library-name HOOT-LIBRARY-20200322 \
    --run-name YYMMDD_AB000000_1234_ABCDEFGHI0

Full Ocarina command example:

ocarina put sequencing \
    --instrument-make ILLUMINA \
    --instrument-model MiSeq \
    --library-name HOOT-LIBRARY-20200322 \
    --run-name YYMMDD_AB000000_1234_ABCDEFGHI0 \
    --bioinfo-pipe-name 'ARTIC Pipeline (iVar)' \
    --bioinfo-pipe-version 1.3.0 \
    --end-time 'YYYY-MM-DD HH:MM' \
    --flowcell-id ABCDEF \
    --flowcell-type v3 \
    --start-time 'YYYY-MM-DD HH:MM'

Function not currently implemented in Ocarina Python API

Documentation for this function can be found on the CGPS uploader website linked below:
https://metadata.docs.cog-uk.io/bulk-upload-1/samples-and-sequencing

There may be some differences between this specification and the uploader, particularly for providing Metrics and Metadata. See the Metadata and Metrics sections below for column names that are compatible with the API spec.

Name Description Options

instrument_make
string, required, enum
  • ILLUMINA
  • OXFORD_NANOPORE
  • PACIFIC_BIOSCIENCES

instrument_model
string, required

library_name
string, required
The name of the library as submitted to add_library

run_name
string, required
A unique name that corresponds to your run. Ideally, use the name generated by your sequencing instrument.

bioinfo_pipe_name
string, recommended
The name of the bioinformatics pipeline used for downstream analysis of this run

bioinfo_pipe_version
string, recommended
The version number of the bioinformatics pipeline used for downstream analysis of this run

end_time
string

flowcell_id
string

flowcell_type
None

start_time
string
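
When run details are held in a script, the command can be assembled as an argument list instead of a hand-written shell line. The sketch below is a hypothetical helper (sequencing_args is not part of Ocarina). It uses only the flags shown in the commands above, checks instrument_make against the listed options, and assumes that the 'YYYY-MM-DD HH:MM' placeholder corresponds to the strftime pattern %Y-%m-%d %H:%M.

```python
from datetime import datetime

INSTRUMENT_MAKES = {"ILLUMINA", "OXFORD_NANOPORE", "PACIFIC_BIOSCIENCES"}
TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed reading of 'YYYY-MM-DD HH:MM'


def sequencing_args(run_name, library_name, make, model, start=None, end=None):
    """Build an argument list for `ocarina put sequencing` (hypothetical helper)."""
    if make not in INSTRUMENT_MAKES:
        raise ValueError(f"instrument_make must be one of {sorted(INSTRUMENT_MAKES)}")
    args = [
        "ocarina", "put", "sequencing",
        "--instrument-make", make,
        "--instrument-model", model,
        "--library-name", library_name,
        "--run-name", run_name,
    ]
    # Optional timestamps are only added when supplied.
    if start is not None:
        args += ["--start-time", start.strftime(TIME_FORMAT)]
    if end is not None:
        args += ["--end-time", end.strftime(TIME_FORMAT)]
    return args
```

The resulting list can be passed to subprocess.run on a machine where Ocarina is installed and configured.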

Errors

The Majora API uses the following error codes:

Error Code Meaning
400 Bad Request -- Your request is invalid or unauthorized (Majora never sends a 401).
403 Forbidden -- You are not permitted to make this request.
404 Not Found -- Your requested Artifact or Process could not be found.
429 Too Many Requests -- You're requesting too many resources. Try adding a small delay between queries.
500 Internal Server Error -- Your action generated an error. Try again later. If the error persists, report it to an administrator.
503 Service Unavailable -- We're temporarily offline for maintenance. Please try again later.
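
A client can use these codes to decide whether a request is worth repeating. The following Python sketch is one possible policy, not behaviour prescribed by Majora: it treats 429, 500 and 503 as retryable because the table advises a delay or a later attempt for them, and it treats 400, 403 and 404 as final. The exponential delay, the one-second base and the 60-second cap are arbitrary assumptions.

```python
# Codes for which the table above suggests waiting and trying again.
RETRYABLE = {429, 500, 503}


def next_delay(status, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retry number `attempt` (0-based).

    Returns None when the status code indicates that retrying will not help.
    """
    if status not in RETRYABLE:
        return None
    return min(cap, base * 2 ** attempt)


for code in (400, 429, 503):
    print(code, next_delay(code, attempt=2))
```

Since Majora never sends a 401, a 400 response may also indicate an authentication problem, so credentials are worth checking before anything else when a 400 is returned.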