1 Search
- 1.1 Search V 2.xx
  - 1.1.1 HW requirements
  - 1.1.2 Configuration
- 1.2 Search v 3.x.x
2 Execution Engine
3 Executor and GenericWorker
4 SmartExecutor
5 SmartGenericWorker
6 DTS
- 6.1 DTS v2.x
- 6.2 DTS v3.x
  - 6.2.1 HW requirements
  - 6.2.2 Configuration
7 Index
- 7.1 Index Service
  - 7.1.1 HW requirements
  - 7.1.2 Configuration
- 7.2 ForwardIndexNode (Dismissed)
8 Statistical Manager
- 8.1 Resources
  - 8.1.1 Known Issues
  - 8.1.2 Additional Installation Steps
- 8.2 Services and Databases used by the Statistical Manager and Data Analysis facilities
9 GIS Technologies
10 Tabular Data Manager
- 10.1 Operation View
11 Resource Catalogue

This part of the guide covers the installation and configuration of gCube services that are not mentioned in the Administration guide. These are mainly services that are not Enabling services and that Infrastructure/VO Managers can install dynamically. For each component, the list also includes known issues and the specific configuration steps to follow.

Search

Search V 2.xx

In the minimal configuration, the installation of a Search Node in gCube consists of the installation of two web services:

SearchSystemService
ExecutionEngineService

This is the minimal installation scenario. Distributed search can also be enabled; this requires the installation and configuration of several ExecutionEngineServices.

HW requirements

The minimal installation requirements for a Search node are a single-CPU node with 2GB RAM, but at least 3GB RAM dedicated to the GHN is strongly recommended.

Configuration

The SearchSystemService and ExecutionEngineService have to be deployed, automatically or manually, in a VRE scope. In addition, to make the SearchSystemService use the local ExecutionEngineService to run queries (minimal installation), configure the jndi file of the service as follows:

excludeLocal = false
collocationThreshold = 0.3f
complexPlanNumNodes = 800000

Search v 3.x.x

The 3.0 version has moved to SmartGears and Tomcat.

Co-deployment with the Execution Engine Service is still required, so the Execution Engine Service v 2.0.0 has also been ported to SmartGears.

HW requirements

The minimal installation requirements for a Search node are a single-CPU node with 2GB RAM, but at least 3GB RAM dedicated to the GHN is strongly recommended.

Configuration

To fix a compatibility issue between DataNucleus and Java 7, a change has to be made to the Tomcat configuration.

Uncomment and modify the following line in the $CATALINA_HOME/bin/catalina.sh file:

JAVA_OPTS="$JAVA_OPTS -noverify -Dorg.apache.catalina.security.SecurityListener.UMASK=`umask`"
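
A quick way to confirm that the change is in place is to look for an uncommented JAVA_OPTS assignment that carries the -noverify flag. The following is a minimal sketch; the function name has_noverify_java_opts is hypothetical, and the check is a plain text match, so it does not prove that Tomcat actually picked the option up.

```shell
#!/bin/bash
# Hypothetical helper (not part of gCube or Tomcat): succeeds when the given
# catalina.sh contains an uncommented JAVA_OPTS line with the -noverify flag.
has_noverify_java_opts() {
    local script="$1"
    # Lines starting with '#' do not match because of the ^ anchor.
    grep -Eq '^[[:space:]]*JAVA_OPTS=.*-noverify' "$script"
}
```

It could be called as: has_noverify_java_opts "$CATALINA_HOME/bin/catalina.sh" && echo "JAVA_OPTS ok"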

The configuration file $CATALINA_HOME/conf/infrastructure.properties, containing infrastructure and scope information, needs to be present:

# a single infrastructure
infrastructure=d4science.research-infrastructures.eu
# multiple scopes must be separated by a comma (e.g. FARM,gCubeApps)
scopes=Ecosystem
clientMode=false
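
Before starting the container, it can be useful to verify that the file defines all three keys shown above. The sketch below assumes that these three keys are the mandatory ones (the guide does not say so explicitly); the function name check_infrastructure_properties is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print every key from the example above that is not set
# to a non-empty value in the given properties file.
# Returns 0 when all keys are present, 1 otherwise.
check_infrastructure_properties() {
    local file="$1" key missing=0
    for key in infrastructure scopes clientMode; do
        # Commented-out lines (starting with '#') are ignored by the ^ anchor.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$file"; then
            echo "missing: ${key}"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called as: check_infrastructure_properties "$CATALINA_HOME/conf/infrastructure.properties"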

The configuration file $CATALINA_HOME/webapps/<search>/WEB-INF/classes/deploy.properties needs to be filled in with this information:

hostname = xx
startScopes = xx
port=xx

Known Issues

Execution Engine

The 2.0 version has moved to SmartGears and Tomcat.

HW requirements

The minimal installation requirements for an Execution Engine node are a single-CPU node with 2GB RAM, but at least 3GB RAM dedicated to the GHN is strongly recommended.

Installation

Different packagings of the Execution Engine are available, depending on the service it is going to be co-deployed with and invoked by:

DTS : <artifactId>executionengineservice-dts</artifactId>
Search: <artifactId>executionengineservice-search</artifactId>

Configuration

To fix a compatibility issue between DataNucleus and Java 7, a change has to be made to the Tomcat configuration.

Uncomment and modify the following line in the $CATALINA_HOME/bin/catalina.sh file:

JAVA_OPTS="$JAVA_OPTS -noverify -Dorg.apache.catalina.security.SecurityListener.UMASK=`umask`"

The configuration file $CATALINA_HOME/conf/infrastructure.properties, containing infrastructure and scope information, needs to be present:

# a single infrastructure
infrastructure=d4science.research-infrastructures.eu
# multiple scopes must be separated by a comma (e.g. FARM,gCubeApps)
scopes=Ecosystem
clientMode=false

The configuration file $CATALINA_HOME/webapps/<execution-engine>/WEB-INF/classes/deploy.properties needs to be filled in with this information:

hostname = xx
startScopes = xx
port=xx
pe2ng.port = 4000
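
When several nodes have to be prepared, the file can be generated instead of edited by hand. This is only a sketch: the function name write_deploy_properties and its argument order are hypothetical, the target path must be supplied by the caller because the webapp directory name depends on the installation, and the default of 4000 for pe2ng.port is simply the value shown above.

```shell
#!/bin/bash
# Hypothetical helper: write a deploy.properties file with the four keys
# listed above. Arguments: target file, hostname, start scopes, port,
# and optionally the pe2ng port (defaults to 4000, the value shown above).
write_deploy_properties() {
    local target="$1" host="$2" scopes="$3" port="$4"
    local pe2ng_port="${5:-4000}"
    printf 'hostname = %s\nstartScopes = %s\nport=%s\npe2ng.port = %s\n' \
        "$host" "$scopes" "$port" "$pe2ng_port" > "$target"
}
```

The function overwrites the target file, so an existing deploy.properties should be backed up first.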

If the Execution Engine needs to call DTS, add the following to container.xml:

<property name='dts.execution' value='true' />

Executor and GenericWorker

HW requirements

The minimal installation requirements for an Executor node with a Generic Worker plugin are a single-CPU node with 2GB RAM, but at least 3GB RAM dedicated to the GHN is strongly recommended.

Configuration

The following software should be installed on the VM:

R version 2.14.1

with the following components:

coda
R2jags
R2WinBUGS
rjags
bayesmix
runjags
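
The presence of these components can be checked from the command line. The sketch below compares the list above with the names of the installed packages read from standard input; the function name missing_r_packages and the variable REQUIRED_R_PACKAGES are hypothetical, and the check covers package names only, not versions.

```shell
#!/bin/bash
# Components listed above.
REQUIRED_R_PACKAGES="coda R2jags R2WinBUGS rjags bayesmix runjags"

# Hypothetical helper: read installed package names from stdin (one per line)
# and print every required package that is not among them.
missing_r_packages() {
    local installed pkg
    installed="$(cat)"
    for pkg in $REQUIRED_R_PACKAGES; do
        # -F: fixed string, -x: whole-line match, so "jags" does not match "rjags".
        if ! printf '%s\n' "$installed" | grep -Fxq "$pkg"; then
            echo "$pkg"
        fi
    done
}
```

On a node where R is installed, the input can be produced with: Rscript -e 'cat(rownames(installed.packages()), sep="\n")' | missing_r_packages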

Known Issues

The GenericWorker is used by the Statistical Manager (SM) service to run distributed computations. Since the SM uses the root scope to discover instances of the GenericWorker, the plugin must be deployed at root scope level.

Since the GenericWorker plugin depends on the Executor Service, the Executor Service is also deployed when the plugin is deployed dynamically.

SmartExecutor

HW requirements

The minimal installation requirements for an Executor node with a Generic Worker plugin are a single-CPU node with 2GB RAM, but at least 3GB RAM dedicated to the vHN (SmartGears gHN) is strongly recommended.

Configuration

No specific configuration is needed for SmartExecutor.

Known Issues

When correctly started, the SmartExecutor publishes a ServiceEndpoint with <Category>VREManagement</Category> and <Name>SmartExecutor</Name>. You can check the availability of a plugin on that resource: there is one <AccessPoint> per plugin.

SmartGenericWorker

HW requirements

The minimal installation requirements for an Executor node with a Generic Worker plugin are a single-CPU node with 2GB RAM, but at least 3GB RAM dedicated to the vHN is strongly recommended.

Configuration

The following software should be installed on the VM:

R version 2.14.1

with the following components:

coda
R2jags
R2WinBUGS
rjags
bayesmix
runjags

Known Issues

The SmartGenericWorker is used by the Statistical Manager (SM) service to run distributed computations. Since the SM uses the root scope to discover instances of the SmartGenericWorker, the plugin must be deployed at root scope level.
To deploy the SmartGenericWorker, copy the SmartGenericWorker jar-with-dependencies into the $CATALINA_HOME/webapps/smart-executor/WEB-INF/lib/ directory. A container restart is needed to load the new plugin.
Once the container has been restarted, the plugin availability can be checked by looking at the Service Endpoint published by the SmartExecutor.

This simple script can help the deployment process.

#!/bin/bash
$CATALINA_HOME/bin/shutdown.sh -force
rm -rf $CATALINA_HOME/webapps/smart-executor*

cp ~/smart-executor.war $CATALINA_HOME/webapps/

mkdir $CATALINA_HOME/webapps/smart-executor
unzip $CATALINA_HOME/webapps/smart-executor.war -d $CATALINA_HOME/webapps/smart-executor

cp ~/smart-generic-worker-*.jar $CATALINA_HOME/webapps/smart-executor/WEB-INF/lib/

sleep 5s
$CATALINA_HOME/bin/startup.sh

DTS

DTS v2.x

HW requirements

The minimal installation requirements for a DTS node are a single-CPU node with 2GB RAM, but at least 3GB RAM dedicated to the GHN is strongly recommended.

Configuration

DTS uses the Execution Engine to run the transformations, so at least one Execution Engine should be deployed in the same scope as DTS, and the related GHNLabels.xml file should contain:

<Variable>
      <Key>dts.execution</Key>
      <Value>true</Value>
</Variable>

Known Issues

none

DTS v3.x

HW requirements

The minimal installation requirements for a DTS node with a Generic Worker plugin are a single-CPU node with 2GB RAM, but at least 3GB RAM dedicated to the GHN is strongly recommended.

Configuration

The configuration file $CATALINA_HOME/conf/infrastructure.properties, containing infrastructure and scope information, needs to be present:

# a single infrastructure
infrastructure=d4science.research-infrastructures.eu
# multiple scopes must be separated by a comma (e.g. FARM,gCubeApps)
scopes=Ecosystem
clientMode=false

The configuration file $CATALINA_HOME/webapps/<dts>/WEB-INF/classes/deploy.properties needs to be filled in with this information:

hostname = xx
startScopes = xx
port=xx

DTS uses the Execution Engine to run the transformations, so at least one Execution Engine should be deployed in the same scope as DTS, and the related SmartGears configuration file (container.xml) should contain this property:

<property name='dts.execution' value='true' />
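
Whether the property is set can be checked with a simple text search. The sketch below is deliberately crude: it matches the property line with a regular expression rather than parsing the XML, and it assumes the attributes appear in the order shown above. The function name dts_execution_enabled is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the given container.xml contains the
# dts.execution property set to true (single or double quotes accepted).
# This is a text match, not an XML parse.
dts_execution_enabled() {
    local container_xml="$1"
    grep -Eq "<property[[:space:]]+name=['\"]dts\.execution['\"][[:space:]]+value=['\"]true['\"]" "$container_xml"
}
```

The path of container.xml depends on the SmartGears installation and has to be passed as the only argument.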

Index

Index Service

The Index Service is the latest released RESTful service running on SmartGears. It implements both FW (forward) and FT (full-text) index functionalities.

HW requirements

Given the co-deployment with ElasticSearch (embedded), a VM with at least 4GB RAM and 2 CPUs is recommended.

The open file limit should also be raised to 32000.
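
The limit that applies to the current shell can be compared with this value before starting the container. This is a minimal sketch: the function name open_file_limit_ok and the variable MIN_OPEN_FILES are hypothetical, and the check reads the soft limit of the shell it runs in, which may differ from the limit of the user or service that actually runs Tomcat.

```shell
#!/bin/bash
# Threshold taken from the recommendation above.
MIN_OPEN_FILES=32000

# Hypothetical helper: succeeds when the given limit is at least MIN_OPEN_FILES.
# With no argument, the soft open-file limit of the current shell is used.
open_file_limit_ok() {
    local limit="${1:-$(ulimit -n)}"
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$MIN_OPEN_FILES" ]
}
```

It could be called as: open_file_limit_ok || echo "open file limit too low: $(ulimit -n)"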

Configuration

Details on the Index Service configuration are available at https://gcube.wiki.gcube-system.org/gcube/index.php/Index_Management_Framework#Deployment_Instructions

ForwardIndexNode (Dismissed)

The ForwardIndexNode service needs to be co-deployed with an instance of the Couchbase service.

HW requirements

Given the co-deployment with Couchbase, a VM with at least 4GB RAM and 2 CPUs is recommended.

Configuration

Couchbase has to be installed manually, and the procedure depends on the OS (binary package, rpm, deb).

It is recommended to raise the open file limit on the VM (32000 minimum).

The following FWIndexNode settings should be customized in the jndi file:

couchBaseIP = IP of the server hosting Couchbase ( so the same as the GHN)
couchBaseUseName = the username set when configuring Couchbase
couchBasePassword = the password set when configuring Couchbase

Once the service is configured, Couchbase has to be initialized using the cb_initialize_node.sh script contained in the service configuration folder.

Known Issues

Sometimes the cb_initialize_node.sh script fails. This can mean that there is not enough memory to initialize the data bucket; try reducing the value of ramQuota in the jndi file.

Statistical Manager

Resources

Runtime Resources	Scope	Used by
DataStorage/StorageManager	VO/VRE	StorageManager
Database/Obis2Repository	VRE	Trendylyzer
Database/StatisticalManagerDatabase	INFRA/VO/VRE	Statistical
Database/AquamapsDB	VO/VRE	Algorithms
Database/FishCodesConversion	VO/VRE	Algorithms
Database/FishBase	VO/VRE	Algorithms - TaxaMatch
DataStorage/Storage Manager	INFRA/VO/VRE	All
Gis/Geoserver1..n	VRE	Maps Algorithms
Gis/TimeSeriesDatastore	VO/VRE	Maps Algorithms
Gis/GeoNetwork	VRE	Maps Algorithms
Service/MessageBroker	VO	Service
BiodiversityRepository/CatalogofLife	VO/VRE	Occurrence Algorithms
BiodiversityRepository/GBIF	VO/VRE	Occurrence Algorithms
BiodiversityRepository/ITIS	VO/VRE	Occurrence Algorithms
BiodiversityRepository/WoRDSS	VO/VRE	Occurrence Algorithms
BiodiversityRepository/WoRMS	VO/VRE	Occurrence Algorithms
BiodiversityRepository/OBIS	VO/VRE	Occurrence Algorithms
BiodiversityRepository/NCBI	VO/VRE	Occurrence Algorithms
BiodiversityRepository/SpeciesLink	VO/VRE	Occurrence Algorithms
DataAnalysis/Dataminer	VRE	Required if Dataminer is needed in the VRE

WS Resources	Scope	Used by
Workers	INFRA/VO	Parallel Computations

Generic Resources	Scope	Used by
ISO/MetadataConstants	VO/VRE	Maps Algorithms

Known Issues

Tested on ghn 4.0.0 and StatisticalManager service 1.4.0:

Install the SM on the same network where the databases and the other resources it uses are located. Otherwise production databases would have to be restarted, because direct access to such resources could not be granted without it.
Remove the library axis-1.4.jar from gCore/lib.
Replace the library hsqldb-1.8.jar with the library hsqldb-2.2.8.jar in gCore/lib.

Additional Installation Steps

Create a suitable R environment[1].
Download the gebco file into /home/gcube/gCore/etc/statistical-manager-service-full-XXX/cfg and rename it to gebco_08.nc.
Copy the gCube keys into /home/gcube/gCore/etc/statistical-manager-service-full-XXX/cfg/PARALLEL_PROCESSING.

Services and Databases used by the Statistical Manager and Data Analysis facilities

GHN

gcube@statistical-manager1.d4science.org

gcube@statistical-manager2.d4science.org

gcube@statistical-manager3.d4science.org

gcube@statistical-manager4.d4science.org

gcube2@statistical-manager.d.d4science.org

TOMCAT

(root user)

thredds.research-infrastructures.eu

wps.statistical.d4science.org

rstudio.p.d4science.research-infrastructures.eu

geoserver.d4science.org

geoserver2.d4science.org

geoserver3.d4science.org

geoserver4.d4science.org

geoserver-dev.d4science-ii.research-infrastructures.eu

geoserver-dev2.d4science-ii.research-infrastructures.eu

geonetwork.geothermaldata.d4science.org

geonetwork.d4science.org

THIRD PARTY SERVICES

(root user)

rstudio.p.d4science.research-infrastructures.eu (sw rstudio, command: rstudio-server restart)

DATABASES

(root user)

geoserver-db.d4science.org

node49.p.d4science.research-infrastructures.eu

biodiversity.db.i-marine.research-infrastructures.eu

db1.p.d4science.research-infrastructures.eu

db5.p.d4science.research-infrastructures.eu

dbtest.research-infrastructures.eu

dbtest3.research-infrastructures.eu

geoserver.d4science-ii.research-infrastructures.eu

geoserver2.i-marine.research-infrastructures.eu

geoserver-db.d4science.org

geoserver-test.d4science-ii.research-infrastructures.eu

node50.p.d4science.research-infrastructures.eu

node49.p.d4science.research-infrastructures.eu

node59.p.d4science.research-infrastructures.eu

obis2.i-marine.research-infrastructures.eu

statistical-manager.d.d4science.org

WORKER NODES

(gcube2 user)

(production)

node3.d4science.org

node4.d4science.org

node11.d4science.org

node12.d4science.org

node13.d4science.org

node14.d4science.org

node15.d4science.org

node16.d4science.org

node18.d4science.org

node20.d4science.org

node21.d4science.org

node23.d4science.org

node27.d4science.org

node28.d4science.org

node29.d4science.org

node30.d4science.org

node31.d4science.org

node32.d4science.org

node33.d4science.org

node34.d4science.org

node35.d4science.org

node36.d4science.org

node37.d4science.org

node38.d4science.org

node39.d4science.org

(development)

node17.d4science.org

node19.d4science.org

node22.d4science.org

TESTING

Test plan for the Statistical Manager.

GIS Technologies

To handle GIS technologies, developers should rely on the geonetwork and gisinterface libraries, both distributed under the subsystem org.gcube.spatial.data. Depending on which libraries are used, different resources are mandatory.

Geonetwork

This section covers the default behavior of the geonetwork library. Please note that clients of the library might override it.

Geonetwork Service Discovery

A single Service Endpoint per Geonetwork instance is needed; you can find more details on the resource here.

Metadata Publication

In order to exploit the library's features to generate ISO metadata, the following Generic Resource is needed in the scope :

Secondary Type : ISO
Name : MetadataConstants

Metadata Resolution

The geonetwork library uses the "Uri Resolver Manager" library to resolve the generated GIS layers via the HTTP protocol. The following Generic Resource is needed in the scope:

Uri Resolver Manager

https://gcube.wiki.gcube-system.org/gcube/URI_Resolver#Uri_Resolver_Manager

<Type>GenericResource</Type>
<SecondaryType>UriResolverMap</SecondaryType>
<Name>Uri-Resolver-Map</Name>

GeoServer

To let the gisinterface library discover instances of GeoServer, an Access Point must be defined for each instance. The Service Endpoint resource for such Access Points must have:

Category : Gis
Platform/Name : GeoServer

GeoExplorer

For the GeoExplorer portlet to work properly, you must copy the following resources from the root scope (/d4science.research-infrastructures.eu/) to the VRE where it must run:

Transect

<Type>RuntimeResource</Type>
<Category>Application</Category>
<Name>Transect</Name>

Gis Resolver

https://gcube.wiki.gcube-system.org/gcube/URI_Resolver#GIS_Resolver

<Type>RuntimeResource</Type>
<Category>Service</Category>
<Name>Gis-Resolver</Name>

Gis Viewer Application

<Type>GenericResource</Type>
<SecondaryType>ApplicationProfile</SecondaryType>
<Name>Gis Viewer Application</Name>

You must then edit the Generic Resource shown here: https://gcube.wiki.gcube-system.org/gcube/URI_Resolver#Generic_Resource_for_Gis_Viewer_Application

Tabular Data Manager

Each operation of the service may need a specific configuration. The following is a list of the resources needed per operation module.

Operation View

The module requires GIS Technologies to be already configured in the operating scope. See GIS Technologies.

The module also requires the following Generic Resource:

Secondary Type : TDMConfiguration

Since the operation needs to put data in a PostGIS database already connected with GeoServer, a Service Endpoint for that database must be present in the same scope. The constraints for retrieving this Service Endpoint are taken from the Generic Resource described above (values are indicated with their XML element name as declared in the Generic Resource's body):

Category : <gisDBCategory>
Platform/Name : <gisDBPlatformName>
AccessPoint/<tdmDataStoreFlag> : true

Resource Catalogue

In this section the resources required to deploy the Catalogue in a given context are reported.

Please note that only the mandatory ones are shown.

CKAN Connector

ServiceClass = DataAccess
ServiceName = CkanConnector

This service allows login operations on CKAN to be performed from the Gateways. It runs on SmartGears, so once it is published in the context there is not much left to do. It is, however, essential.

Generic Resource

Portlet URL

SecondaryType = ApplicationProfile
Name = CkanPortlet
Description = The url of the gcube-ckan-datacatalog portlet for this scope

The content (body) of the resource has to report the URL of the catalogue portlet for this context, e.g.

<url>https://services.research-infrastructures.eu/group/d4science-services-gateway/data-catalogue</url>

Service Endpoint(s)

CKanDataCatalogue

Application = Application
Name = CKanDataCatalogue
Description = A Tomcat Server hosting the ckan data catalogue

Among the other properties of the SE, these should be reported:

HostedOn (in RunTime) is the url of the ckan instance, e.g. ckan-d4s.d4science.org;
Username (in AccessData) is the username of the CKAN SYSAdmin;
Property URL_RESOLVER, whose value is equal to the url of the URI-RESOLVER in the context;
Encrypted property API_KEY, is the api key of the CKAN SYSAdmin.

CKanDatabase

Application = Database
Name = CKanDatabase
Description = A Tomcat Server hosting the ckan data catalogue

Among the other properties of the SE, these should be reported:

HostedOn (in RunTime) is the machine hosting the PostgreSQL database that CKAN uses (e.g. ckan-pg-d4s.d4science.org);
EndPoint (in AccessPoint) is the host name of that machine followed by the port number (e.g. ckan-pg-d4s.d4science.org:5432);
AccessData must report the credentials (the password must be encrypted) of the user allowed to access the database.

ServiceManager Guide
