Difference between revisions of "Time Series"

From Gcube Wiki

Revision as of 16:52, 4 December 2009

Time Series elaboration cycle

The Time Series (TS) elaboration cycle is made up of four phases:

  1. Time Series Import: a TS is imported into the system through a CSV file.
  2. Time Series Curation: the TS is corrected and cleaned.
  3. Time Series Manipulation: the TS is elaborated according to user needs.
  4. Time Series Publication: the TS becomes available to the community.
Time Series elaboration cycle

Time Series Import

Through a wizard interface, a Time Series represented as a CSV file (RFC 4180) is imported into the system.

The CSV file can be imported either by uploading it from the user's file system or by selecting it from the user's workspace.

Once a CSV file has been loaded it is possible to set some parameters for the uploaded file:

  • character encoding
  • file header flag
  • field separator
  • columns to import/exclude

The configuration shows a CSV sample based on the selected parameters. This sample is limited to the first 50 rows.
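
As a rough illustration of what such a preview involves, the following Python sketch reads the first rows of a CSV using a chosen header flag and field separator, and drops the excluded columns. The function name `preview_csv` and its parameters are hypothetical and are not part of the system described here; only the 50-row limit comes from the text above.

```python
import csv
import io
from itertools import islice

def preview_csv(stream, has_header=True, separator=",", exclude=(), limit=50):
    """Return (header, rows) for at most `limit` data rows.

    `exclude` holds zero-based column indexes to drop (hypothetical parameter).
    """
    reader = csv.reader(stream, delimiter=separator)

    def keep(row):
        # Drop the columns the user chose to exclude.
        return [v for i, v in enumerate(row) if i not in exclude]

    header = None
    if has_header:
        first = next(reader, None)
        header = keep(first) if first is not None else None
    rows = [keep(row) for row in islice(reader, limit)]
    return header, rows

# Usage with an in-memory file; a real file would be opened with
# open(path, newline="", encoding=...) to apply the chosen encoding.
sample = io.StringIO("year;country;catch\n2001;IT;10\n2002;FR;12\n", newline="")
header, rows = preview_csv(sample, separator=";", exclude={1})
print(header)  # ['year', 'catch']
print(rows)    # [['2001', '10'], ['2002', '12']]
```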

Time Series import wizard: CSV file configuration

Through the sample grid, the user can select which columns to exclude from the import.

Time Series import wizard: column selection

For the system to accept the current configuration, the whole file must be verified by clicking on the “Check configuration” button. The system then checks the entire CSV for RFC 4180 compliance.

If there are errors, the wizard shows which rows are invalid and lets the user decide whether to skip them during the import.
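
One expectation stated by RFC 4180 is that every record has the same number of fields. The sketch below checks only that rule and separates valid rows from invalid ones so that the latter can be skipped. The function `find_bad_rows` is hypothetical, and the actual compliance check performed by the system may cover more than this.

```python
import csv
import io

def find_bad_rows(stream, separator=","):
    """Return (good_rows, bad_rows); bad_rows holds (line_number, row) pairs.

    A row is flagged when its field count differs from the first row's.
    """
    reader = csv.reader(stream, delimiter=separator)
    good, bad = [], []
    expected = None
    for row in reader:
        if expected is None:
            expected = len(row)  # the first row sets the expected width
        if len(row) == expected:
            good.append(row)
        else:
            bad.append((reader.line_num, row))
    return good, bad

data = io.StringIO("year,catch\n2001,10\n2002\n2003,7,extra\n", newline="")
good, bad = find_bad_rows(data)
print(len(good))  # 2
print(bad)        # [(3, ['2002']), (4, ['2003', '7', 'extra'])]
```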

Time Series import wizard: errors

Once the import phase is completed it will be possible to define metadata for the imported CSV.

The last step consists of creating the CSV in the system. During the creation a loading bar indicates the overall progress of the operation.

CSV Data Handling

Once the CSV has been created it can be opened to examine its content. (Note that the content can be sorted at this stage.)

A CSV can be exported directly into the Workspace; see the Time Series Export section.

In order to start the curation phase it is necessary to transform the imported CSV into a TS by clicking on the button “Create Time Series”.

CSV view

Time Series Curation

The Curation phase allows the user to curate a TS by linking it to the reference data and correcting any resulting errors.

Each TS column belongs to one of the following types:

  • Dimension: the column is associated with previously imported reference data
  • Attribute: [TBA]
  • Value: [TBA]

In order to define a curated TS it is necessary to curate all the columns. Columns that have already been curated are marked with a green filled circle in the column header. Columns that have not yet been curated are marked with a red filled circle in the column header.

Column Curation

To start curating a column, the user should right-click on the column header and select the item “Edit properties”. A new panel will appear for editing the column properties.

Time Series Curation: edit panel

This panel allows the user to rename a column by editing its label.

By using the radio selection buttons it is possible to select the column type.

For the Attribute and Value types it is possible to select the data type: Text, Integer, Float, Date, Time, Boolean, Timestamp.

For the Dimension type it is possible to edit the family key and the key value to associate. The check errors button reports how many rows contain errors with respect to that dimension. The sample button shows a sample of the values together with the reference data they refer to.

For each column it is possible to use the GUESS tool, which tries to guess which reference data should be associated with the column by using the reference data present in the system. For each proposed dimension the number of errors is calculated automatically.
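
The idea behind such a guess can be sketched as follows: for every candidate set of reference values, count how many column values are not found in it, then propose the candidates with the fewest errors first. This is only an illustrative sketch; the function `guess_dimension`, the dimension names, and the exact-match rule are assumptions, not the tool's actual algorithm.

```python
def guess_dimension(column_values, candidates):
    """Rank candidate reference sets by how many column values they miss.

    `candidates` maps a (hypothetical) dimension name to its set of
    accepted values. Returns (name, error_count) pairs, fewest errors first.
    """
    ranking = []
    for name, accepted in candidates.items():
        errors = sum(1 for value in column_values if value not in accepted)
        ranking.append((name, errors))
    return sorted(ranking, key=lambda item: item[1])

column = ["IT", "FR", "XX", "ES"]
candidates = {
    "country": {"IT", "FR", "ES", "DE"},
    "species": {"COD", "TUNA"},
}
print(guess_dimension(column, candidates))
# [('country', 1), ('species', 4)]
```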

Time Series Curation: guess window

Once the configuration is done, it is possible to either save or discard the changes.

If the column has been associated with a dimension and some rows contain errors, the system enters a mode called error editing.

Error editing

In this mode only the rows containing errors are shown.

Time Series Curation: error edit

A row error can be due to one of the following reasons:

  • the value has no equivalent among the reference data values
  • the value matches more than one reference data value

These two types of errors are shown in different colors: red for the first one, yellow for the second one.
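
The two error types can be illustrated with a small Python sketch that matches each column value against a list of reference entries and labels it as matched, unmatched (red), or ambiguous (yellow). The function `classify_values`, the `(key, label)` layout of the reference data, and the case-insensitive comparison are all assumptions made for this example.

```python
from collections import defaultdict

def classify_values(column_values, reference):
    """Label each value 'ok', 'no_match' (red) or 'ambiguous' (yellow).

    `reference` is a list of (key, label) pairs; a value is matched
    against labels case-insensitively (an assumption of this sketch).
    """
    keys_by_label = defaultdict(set)
    for key, label in reference:
        keys_by_label[label.lower()].add(key)

    result = []
    for value in column_values:
        matches = keys_by_label.get(value.lower(), set())
        if not matches:
            status = "no_match"    # no equivalent in the reference data
        elif len(matches) > 1:
            status = "ambiguous"   # more than one correspondence
        else:
            status = "ok"
        result.append((value, status))
    return result

reference = [(1, "Georgia"), (2, "Georgia"), (3, "Italy")]
print(classify_values(["Italy", "georgia", "Atlantis"], reference))
# [('Italy', 'ok'), ('georgia', 'ambiguous'), ('Atlantis', 'no_match')]
```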

The system allows the values of each column to be edited. If the cell belongs to a column of type dimension, a popup appears with the list of possible values to associate with it.

Time Series Curation: edit value

At any time it is possible to discard all the changes and go back to the previous column configuration.

Once all the rows are corrected the system will ask the user to save the changes applied so far.

Note: It won't be possible to discard the changes made in editing mode.


Editing and Column removal

It is possible to edit single column values at any time. To remove a whole column from the TS, right-click on the header of the column you want to remove and select “Remove column”.

Curation Closing

Once all columns have been curated it will be possible to close the Curation, which publishes the TS in the Curated TS list.

Time Series Manipulation

A Time Series can be manipulated through the following operations:

  • Filtering: filter the TS by Column or Values criteria.
  • Union: union of two Time Series.
  • Denormalization: denormalization of TS Values.
  • Grouping: grouping of values by column selection.
  • Aggregation: aggregation of values by column selection.

Once a single operation has been applied it is possible to save the TS or discard the operation (except for the union operation).

Filtering

It is possible to filter the TS by selecting the filtering conditions to apply to the TS Column Values.

The available condition types vary depending on the column type:

  • for every column type it is possible to apply conditions based on compound expressions: filtering by range;
Time Series filtering by range
  • for each column of type dimension it is possible to define a set of acceptable values: filtering by value.
Time Series filtering by value
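
The two filtering modes can be sketched in Python as follows. A range filter is treated here as the compound expression `low <= value AND value <= high`, and a value filter as membership in a set of acceptable values. The function names and the row layout (one dictionary per row) are assumptions made for illustration only.

```python
def filter_by_range(rows, column, low, high):
    """Keep rows whose `column` value lies in the closed range [low, high]."""
    return [row for row in rows if low <= row[column] <= high]

def filter_by_value(rows, column, accepted):
    """Keep rows whose `column` value is one of the accepted values."""
    return [row for row in rows if row[column] in accepted]

rows = [
    {"year": 2001, "country": "IT", "catch": 10},
    {"year": 2002, "country": "FR", "catch": 12},
    {"year": 2003, "country": "IT", "catch": 7},
]
in_range = filter_by_range(rows, "year", 2002, 2003)
italian = filter_by_value(rows, "country", {"IT"})
print([r["year"] for r in in_range])  # [2002, 2003]
print([r["year"] for r in italian])   # [2001, 2003]
```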

Union

The Union operation merges two TS into a single TS.

The full operation is executed by using a Wizard interface.

The first step consists of selecting the TS to merge with the one currently open. Once the first step is done, the system checks for compatibility.
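
The text does not specify what the compatibility check verifies. As a sketch under the assumption that two TS are compatible when they have the same column names and data types in the same order, a union could look like this in Python (all names here are hypothetical):

```python
def are_compatible(schema_a, schema_b):
    """Assumed rule: same column names and data types, in the same order."""
    return list(schema_a) == list(schema_b)

def union(rows_a, schema_a, rows_b, schema_b):
    """Append the rows of the second TS to the first if schemas match."""
    if not are_compatible(schema_a, schema_b):
        raise ValueError("Time Series are not compatible")
    return rows_a + rows_b

schema = [("year", "Integer"), ("catch", "Float")]
other = [("year", "Integer"), ("catch", "Text")]

merged = union([(2001, 10.0)], schema, [(2002, 12.0)], schema)
print(merged)                         # [(2001, 10.0), (2002, 12.0)]
print(are_compatible(schema, other))  # False
```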


Denormalization

Denormalizes the TS based on an attribute and a value.
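
The operation is not described in more detail here. One common reading of denormalization, assumed for this sketch, is a pivot: each distinct value of the chosen attribute column becomes a new column, filled with the corresponding entry of the chosen value column. The function `denormalize` and the sample data are hypothetical.

```python
def denormalize(rows, attribute, value):
    """Pivot `attribute` values into columns filled from `value`.

    Rows that agree on all remaining columns collapse into one output row.
    """
    grouped = {}
    for row in rows:
        # The remaining columns identify the output row.
        key = tuple(sorted((k, v) for k, v in row.items()
                           if k not in (attribute, value)))
        out = grouped.setdefault(key, dict(key))
        out[row[attribute]] = row[value]
    return list(grouped.values())

rows = [
    {"year": 2001, "species": "COD", "catch": 10},
    {"year": 2001, "species": "TUNA", "catch": 4},
    {"year": 2002, "species": "COD", "catch": 12},
]
print(denormalize(rows, "species", "catch"))
# [{'year': 2001, 'COD': 10, 'TUNA': 4}, {'year': 2002, 'COD': 12}]
```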

Grouping

[TBA]

Aggregation

[TBA]

Time Series publishing

A TS can be published at either VO or VRE level at any time.

Time Series Export

A TS can be exported to the user workspace in CSV format.

The first step consists of configuring the CSV to create: character set selection, field separator, column selection and so on.
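
As an illustration of how these settings affect the output, the following Python sketch serializes selected columns with a given separator and character set. The function `export_csv` and its parameters are hypothetical; the sketch relies on the default `\r\n` line terminator of Python's `csv.writer`, which matches RFC 4180.

```python
import csv
import io

def export_csv(header, rows, columns, separator=",", encoding="utf-8"):
    """Serialize the selected columns to CSV bytes in the given encoding."""
    indexes = [header.index(name) for name in columns]
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=separator)
    writer.writerow([header[i] for i in indexes])
    for row in rows:
        writer.writerow([row[i] for i in indexes])
    return buffer.getvalue().encode(encoding)

header = ["year", "country", "catch"]
rows = [[2001, "IT", 10], [2002, "FR", 12]]
payload = export_csv(header, rows, ["year", "catch"], separator=";")
print(payload)  # b'year;catch\r\n2001;10\r\n2002;12\r\n'
```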

The second step asks for the basket in which to save the CSV file.

Workspace integration

A TS can be saved as an item in the user Workspace.

To open a TS previously saved in the workspace, or one received from another user, click on the “Load from workspace” button.