Scalable Data Mining Virtual Research Environment

The Scalable Data Mining VRE is designed to apply Data Mining techniques to biological data. The algorithms are executed in a distributed fashion on the e-Infrastructure nodes or on local multi-core machines. Scalability here means both distributed data processing and services that are dynamically provided to users: the system scales with the number of users and with the size of the data to process. Statistical data processing can be applied to perform generic Data Mining, Ecological Niche Modelling or Ecological Modelling experiments. Other applications can use general-purpose techniques such as Bayesian models. Time series of observations can also be managed, in order to classify trends, detect anomalous patterns and perform simulations.

Distributing the computation is intended to overcome limitations that commonly arise with statistical algorithms: long training and projection times, processing time that grows linearly or non-linearly with the amount of data, the multiple runs needed to reduce overfitting or local-minima problems, and the multiple model topologies that must be evaluated to find the optimal model configuration. All of these issues strongly limit the amount of time a scientist can dedicate to evaluating results and to combining and comparing the outcomes of different experiments. A distributed e-Infrastructure endowed with a collaborative approach may overcome these issues.

Data and Resources

This item has no data

Additional Info
Field: Value
About VRE Expiration: The VRE expiration date can be either extended or brought forward at the VRE Manager's request. Any such request takes one month to be carried out.
Access Policy: Public
Acknowledgment Statement: This work has received support from the BlueBRIDGE Project (European Union’s Horizon 2020 research and innovation programme, Grant agreement No 675680) and is operated by the D4Science Infrastructure (D4Science.org).
Public page: https://bluebridge.d4science.org/web/scalabledatamining
VRE Creation Date: November 13, 2016
VRE Expiration Date: January 31, 2021
system:type: VirtualResearchEnvironment
Management Info
Field: Value
Author: Perciante Costantino
Maintainer: Perciante Costantino
Version: 1
Last Updated: 27 March 2018, 11:41 (CEST)
Created: 27 March 2018, 11:41 (CEST)