OscarImporterSEPlugin


The Oscar-Importer-SE-Plugin is a Smart Executor plugin which periodically (currently every week) builds a merged version of the OSCAR [1] dataset and uploads it to the infrastructure's Thredds server.

Following user requirements, the importer does not use the original dataset but a rearranged version [2] in which longitudes span -180 to 180 instead of 20 to 420 as in the other OSCAR datasets.


Deployment

Host requirements

  • The plugin has to be deployed on a Smart Executor node with at least 25GB of free disk space in /tmp. This space is needed both for the files being merged and for the merged file itself before it is uploaded.
  • The following package needs to be installed on the host:
 libnetcdfc7
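
On a Debian/Ubuntu host (an assumption; use the node's actual package manager) this can be done, for example, with:

 sudo apt-get install libnetcdfc7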

On the Smart Executor node, copy the plugin jar-with-dependencies into /home/gcube/tomcat/webapps/smart-executor/WEB-INF/lib
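
For example (the jar file name below is indicative; use the actual released artifact):

 cp oscar-importer-se-plugin-<version>-jar-with-dependencies.jar /home/gcube/tomcat/webapps/smart-executor/WEB-INF/lib/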

Requirements on the machine hosting the Thredds Server

The machine hosting the Thredds Server must be configured to allow the upload of large files. As of 26 September 2017, the uploaded file is 10.0GB and it grows by about 400MB per year.

In particular, if the server is running nginx, the client_max_body_size directive should be set; a safe value could be 12GB.
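
A minimal sketch of the corresponding nginx configuration fragment (the value and the block in which it is placed are indicative; the directive can be set at http, server or location level):

 # allow uploads larger than the current ~10GB merged file
 client_max_body_size 12G;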

Managing the service

The Smart Executor needs to be restarted in order to discover and publish the plugin in the Information System.

Stop the container:

 ./stopContainer.sh  

Start the container:

 ./startContainer.sh

Check

  • check there are no exceptions in the following logs (a quick scan is sketched after this list):
  SmartGears/ghn.log
  tomcat/logs/catalina.out
  tomcat/logs/localhost.log
  • check that the profiles are published in the IS; look at the infrastructure monitor (Service Endpoint -> VREManagement)
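
As a quick, purely illustrative check, the logs listed above can be scanned for exceptions from the node home directory:

 grep -i exception SmartGears/ghn.log tomcat/logs/catalina.out tomcat/logs/localhost.log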

Starting the job

Executor parameters

 TODO

One shot

 TODO

Scheduled

 TODO

Execution times

When starting from scratch (i.e. the first time a merge is performed) the whole process, from download to publication, takes about 4 hours. Of course this might depend on the network and CPU speed of the machines.

Further executions of the merge take advantage of files already downloaded and merged, and can complete in less than one hour.

Publication of results

In the production infrastructure, the merged file is published at http://thredds.d4science.org/thredds/catalog/public/netcdf/Oscar/catalog.html

Under the hood

  • To save disk space, the importer downloads small groups (currently 3) of yearly OSCAR files, merges them and proceeds iteratively (the grouping idea is sketched after this list).
  • The merger uses /tmp/oscar-merger as working directory. All needed files (downloaded, merged, descriptors, temporary) are placed here. No other directory is used on the machine.
  • Upon successful upload to Thredds, the work directory is cleaned with the exception of the last stable merged file (e.g. oscar-1992-2016.nc); this speeds up the next merge, since only the missing files (e.g. the current year) will be downloaded and merged into it.
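
The merge itself is implemented in Java inside the plugin; the following shell sketch, using NCO's ncrcat and indicative file names and group size, only illustrates the grouping idea described above:

 cd /tmp/oscar-merger
 # merge the first group of yearly files (group size is currently 3)
 ncrcat oscar_vel1992.nc oscar_vel1993.nc oscar_vel1994.nc merged.nc
 rm oscar_vel1992.nc oscar_vel1993.nc oscar_vel1994.nc
 # fold the next group into the running merged file, and so on iteratively
 ncrcat merged.nc oscar_vel1995.nc oscar_vel1996.nc oscar_vel1997.nc merged-new.nc
 mv merged-new.nc merged.nc
 rm oscar_vel1995.nc oscar_vel1996.nc oscar_vel1997.nc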

References

  1. https://podaac.jpl.nasa.gov/dataset/OSCAR_L4_OC_third-deg
  2. ftp://podaac-ftp.jpl.nasa.gov/allData/oscar/preview/L4/resource/LAS/oscar_third_deg_180/