“Bring your science to the web and the web to your science”

Overview and objectives

The e-Research Summer Hackfest was held at the Department of Physics and Astronomy of the University of Catania (Italy) in two editions: the first on July 4-15, 2016 and the second on July 18-29, 2016. The second edition was held to allow the participation of selected candidates from Africa who could not attend the first edition because of the time required for visa issuance in their countries. In total, 10 instructors and 26 selected candidates from 9 countries (4 African and 5 European) attended the two editions of the hackfest.
Both editions were co-sponsored by the Sci-GaIA, INDIGO-DataCloud, and COST ENeL projects in terms of participants, use cases and tools.
The main objective of the hackfest was to integrate scientific use cases through a pervasive adoption of web technologies and standards and make them available to their end users through Science Gateways(1) (entities connected to distributed computing, data and services of interest to the Community of Practice the end users belong to). Promoting and fostering open and reproducible research was the ultimate goal of the hackfest.

Topics

The following topics were tackled during the e-Research Summer Hackfest:

  • Big Data analytics
  • Distributed computing services
  • Distributed storage services
  • Programmatic access to Open Data repositories
  • Semantic federation of Open Access repositories
  • User interfaces (web, desktop, mobile, etc.)
  • Workflows

Tools and technologies

The following tools and technologies were showcased at the e-Research Summer Hackfest and used to implement the proposed use cases:

Resources

Roberto Barbera
University of Catania - Italy
The Sci-GaIA project and introduction to the hackfest: Slides - Video
Programmatic interaction with Open Access Repositories: Slides - Video part 1, Video part 3
Giacinto Donvito
INFN - Italy
The INDIGO-DataCloud project: Slides - Video
Riccardo Bruno
INFN - Italy
The FutureGateway framework - Overview: Slides - Video part 1, Video part 2
The FutureGateway framework - APIs: Slides - Video part 1, Video part 2
The FutureGateway framework - Installation: Slides - Video part 1, Video part 2, Video part 3
Marica Antonacci
INFN - Italy
Antonio Calanducci
INFN - Italy
The gLibrary framework: Slides - Video, Video live demo
Carla Carrubba
University of Catania - Italy
Programmatic interaction with Open Access Repositories: Slides - Video part 2 - XML exemplar
Krzysztof Trzepla
CYFRONET - Poland
The Onedata platform: Slides - Video part 2, Video part 3
Konrad Zemek
CYFRONET - Poland
The Onedata platform: Slides - Video part 1, Video part 4
Alessandro D'Anca
CMCC - Italy
The Ophidia platform: Slides - Video part 1
The Ophidia platform - Tutorial: Slides - Video part 2, Video part 3
Michal Owsiak
PSNC - Poland
The Kepler workflow manager: Slides - Video, Video live demo - Tutorial web page

Error Correction of NGS Data

Error correction is normally the first step of any application targeting NGS (next-generation sequencing) data, and many projects in different real-life applications have opted for this step before further analysis. MuffinEC is a corrector that supports multiple technologies (Illumina, Roche 454, Ion Torrent and, experimentally, PacBio) and handles any type of error (mismatches, deletions, insertions and unknown values). It surpasses other similar software by providing higher accuracy (demonstrated by four types of tests) while using fewer computational resources. It follows a multi-step approach that starts by grouping all the reads using a k-mer-based metric. Next, it employs the Smith-Waterman algorithm to refine the groups and generate Multiple Sequence Alignments (MSAs). These MSAs are corrected column by column: the correct base for each column is determined by a user-adjustable percentage. We plan to use Ophidia and Onedata to prepare our software for the cloud.
Presentation Andy S. Alic, Universitat Politecnica de Valencia – Spain
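
The column-wise correction step described for MuffinEC can be illustrated with a minimal sketch. This is not MuffinEC's code: the function name `correct_msa`, the `min_fraction` parameter (standing in for the user-adjustable percentage), the gap and unknown symbols, and the toy reads are all assumptions made for illustration. The sketch takes reads that are already aligned into equal-length rows, finds the most frequent base in each column, and overwrites the other bases only when that majority reaches the threshold.

```python
from collections import Counter


def correct_msa(rows, min_fraction=0.6, gap="-", unknown="N"):
    """Correct each column of an MSA toward its majority base.

    rows: equal-length aligned reads (hypothetical input format).
    min_fraction: share of informative bases the majority must reach
    before other bases in the column are overwritten.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all aligned rows must have the same length")

    corrected = [list(r) for r in rows]
    for col in range(width):
        # Count only real bases; gaps and unknown values carry no vote.
        votes = Counter(r[col] for r in rows if r[col] not in (gap, unknown))
        if not votes:
            continue
        base, count = votes.most_common(1)[0]
        if count / sum(votes.values()) >= min_fraction:
            for row in corrected:
                # Gaps are left untouched; mismatches and unknowns are replaced.
                if row[col] != gap:
                    row[col] = base
    return ["".join(r) for r in corrected]


# Toy reads (illustrative only): one mismatch and one unknown value.
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGNAC"]
print(correct_msa(reads))
```

With a threshold of 0.6, a column split evenly between two bases is left unchanged, which is the conservative behaviour one would expect when the evidence is ambiguous.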


Algae Bloom Case Study: Managing Data From Models

Hydrodynamic and water quality modeling requires a number of strongly correlated parameters. Because of the number of parameters and the spatial and temporal requirements of high-resolution models, the input and output files are very large. The Delft3D software suite is the tool used to perform the modeling, which includes the simulation of the physical, chemical and biological parameters of a water reservoir in Soria, Spain. This case study aims to perform the modeling of the reservoir automatically under a cloud framework. In the context of the Hackfest, three different tools could be used:
- Onedata: we need a distributed storage solution to share a common space for input (accessible by computing resources) and for the output generated by the model (accessible by users).
- Ophidia: Big Data tools are well suited to analyzing the large number of parameters available in the output.
- Kepler: a workflow to automatically analyze the results could be very useful.
Presentation Fernando Aguilar, IFCA – Spain


Distributed Archive System for the Cherenkov Telescope

The Cherenkov Telescope Array (CTA) project aims to build a large array of Cherenkov telescopes of different sizes, deployed on an unprecedented scale. It will allow a significant extension of our current knowledge in high-energy astrophysics. The CTA data and their scientific products need to be preserved in a dedicated archive that guarantees open access to a wide and diverse scientific community. Handling and archiving the large amount of data generated by the instruments and delivering scientific products according to astrophysical standards is one of the challenges in designing the CTA observatory. We present our plan to implement a distributed archive system that federates storage resources using the Onedata platform (and/or other promising INDIGO-DataCloud technologies).

Presentation Eva Sciacca, INAF, Astrophysical Observatory of Catania – Italy


Astronomical data format integration into Ophidia

The FITS format is the standard data format for archiving images in astronomy. With this use case we aim to integrate the FITS format into the Ophidia framework, opening the path to the analysis of astronomical data within this tool.
Presentation Elisa Londero, INAF, Astronomical Observatory of Trieste – Italy
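
To give a feel for what reading FITS involves, here is a minimal sketch of a header parser. It relies only on the general layout of a FITS header (80-character ASCII cards, a keyword in the first 8 characters, `= ` marking a value, and an `END` card closing the header). It is not how Ophidia ingests data: the function names and demo cards are hypothetical, quoted strings containing escaped quotes are not handled, and a maintained FITS library would be the natural choice in practice.

```python
CARD_LENGTH = 80  # every FITS header card is 80 ASCII characters long


def _convert(field: str):
    """Turn a raw value field into bool, int, float, or leave it as text."""
    if field == "T":
        return True
    if field == "F":
        return False
    for cast in (int, float):
        try:
            return cast(field)
        except ValueError:
            pass
    return field


def parse_fits_header(raw: bytes) -> dict:
    """Parse keyword/value cards from the start of a FITS byte stream."""
    header = {}
    for offset in range(0, len(raw) - CARD_LENGTH + 1, CARD_LENGTH):
        card = raw[offset:offset + CARD_LENGTH].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY and blank cards carry no value
        text = card[10:].lstrip()
        if text.startswith("'"):
            # String value: take the text between the first pair of quotes.
            end = text.find("'", 1)
            value = (text[1:end] if end != -1 else text[1:]).rstrip()
        else:
            # Anything after "/" is a comment.
            value = _convert(text.split("/", 1)[0].strip())
        header[keyword] = value
    return header


def make_card(text: str) -> bytes:
    """Pad a card to 80 characters (helper for the demo only)."""
    return text.ljust(CARD_LENGTH).encode("ascii")


# Hypothetical header for a 512 x 256 image.
demo = b"".join(make_card(c) for c in [
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                   16 / bits per pixel",
    "NAXIS   =                    2",
    "NAXIS1  =                  512",
    "NAXIS2  =                  256",
    "OBJECT  = 'M31     '           / target name",
    "END",
])
header = parse_fits_header(demo)
print(header)
```

The keywords recovered this way (image dimensions, pixel type, target) are the kind of metadata an integration would need before mapping the pixel data onto another framework's data model.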


Collaborative Knowledge Discovery Environment on Biodiversity and Linguistic Diversity


The project aims to establish a collaborative / team science workflow and to enable knowledge discovery as well as experimental scholarship in biodiversity and linguistic diversity. Our first step towards this is to establish a working environment (workspace) for researchers to explore linguistic diversity and the interconnection of languages and cultural artefacts / data in the linguistic and biological domains. We aim to provide users from different domains and with varied backgrounds (researchers of different disciplines, laypeople) with services and applications for workflows to discover, curate and interlink biological taxonomic data with linguistic / terminological and cultural data, to enrich and connect their data to external resources, and to publish them freely accessible on the web as open data. The project is connected to ongoing initiatives such as the COST ENeL action (European Network of Electronic Lexicography).
Presentation Eveline Wandl-Vogt, Ksenia Zaytseva, Davor Ostojic, OEAW-ACDH – Austria


Reproducible Automatic Speech Recognition workflows

The proposed use case is specific to the rich community of Human Language Technologies users in South Africa. A template for Automatic Speech Recognition will be built into a web interface, the data it uses will be stored in an Open Access Repository, and the application will be accessed via a Science Gateway. The user specifies their parameters and data on the web interface and submits the job to the Science Gateway, which takes care of the rest. gLibrary may be used to store some of the statistical results from the experiment.
Presentation - Final report David Risinamhodzi, Northwest University – South Africa


Implementation of eCulture Science Gateway – reloaded

The presentation concerns the digital library “MuseiD-Italia”, which showcases images and metadata on Cultural Heritage in Italy. ICCU is trying to revamp the whole workflow to make it better, easier and faster, and to add new features, possibly by integrating INDIGO solutions. This is also intended as a use case for extending experimentation in the coming months to other (and bigger) ICCU-run or ICCU-led projects.
Presentation Luca Martinelli, ICCU – Italy


Intelligent Medical Image Analyzer

The proposed system is an e-infrastructure for processing medical images so that the processed data can be used for decision support during diagnosis or clinical research. The two categories of people who will most likely use our tools are clinicians and researchers conducting medical research. The frontend of the proposed system will be built using PHP, HTML, JavaScript and XAMPP. The frontend will be user friendly and interactive, and will allow medical images to be uploaded. It will also contain options for image processing and report generation. The backend will contain MATLAB, a C++ compiler, some specialized medical image processing software packages and some test data. The proposed system will also have some storage allowance for image uploads by the user. Once images are uploaded, the software relevant for processing those particular images will be selected automatically and applied to them. The image storage model will allow some specific types of medical images that are commonly used in the medical field.
Presentation - Final report Benjamin Aribisala, Lagos State University – Nigeria
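
The automatic selection of processing software mentioned in the description could, as one possible design, be a lookup from file type to handler. The sketch below is purely illustrative: the abstract does not say which image formats or packages are involved, so the extensions, handler functions and returned messages are hypothetical placeholders.

```python
from pathlib import Path


def process_dicom(path: Path) -> str:
    # Placeholder for a call into a DICOM-capable processing package.
    return f"DICOM pipeline applied to {path.name}"


def process_nifti(path: Path) -> str:
    # Placeholder for a call into a NIfTI-capable processing package.
    return f"NIfTI pipeline applied to {path.name}"


# Hypothetical mapping from file suffix to processing routine.
HANDLERS = {
    ".dcm": process_dicom,
    ".nii": process_nifti,
    ".nii.gz": process_nifti,
}


def select_handler(path: Path):
    """Pick the processing routine for an uploaded file by its suffix."""
    name = path.name.lower()
    for suffix, handler in HANDLERS.items():
        if name.endswith(suffix):
            return handler
    raise ValueError(f"unsupported image type: {path.name}")


def process_upload(path: Path) -> str:
    """Select the right routine and apply it to the uploaded image."""
    return select_handler(path)(path)


print(process_upload(Path("scan_001.DCM")))
print(process_upload(Path("brain.nii.gz")))
```

Rejecting unknown types with an explicit error keeps the storage model restricted to the supported image formats, in line with the last sentence of the description.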


WEKA Machine Learning in Breast Cancer

The Wisconsin Breast Cancer dataset from the UCI Machine Learning Repository is used as a use case to classify benign and malignant samples using WEKA. The main task is to create a web interface for interacting with and using the classification features of WEKA.
Presentation - Final report Stephan Mgaya, TERNET – Tanzania


Technology Transfer Alliance Collaboration Platform

The TTA Collaboration Platform is intended to be a web-based platform containing an integrated set of tools, applications and data repositories that are accessed via a portal: the TTA Portal. The motivation for developing this platform is to support collaboration and training and to foster education among the partners, the sharing of all sorts of resources and the dissemination of results. The platform will allow each partner to submit, in a secure manner, content such as project proposals, project documents, news updates, information shared via content lists, and other kinds of content such as video or other multimedia.
Presentation - Final report Diana Rwegasira, University of Dar es Salaam – Tanzania


iGrid – Smart Grid Capacity Development and Enhancement in Tanzania

The use case involves designing, implementing, demonstrating, testing and validating an autonomous solar-powered LVDC nanogrid prototype serving an off-grid community of 10-100 households, which can also be integrated into a higher voltage AC/DC grid if needed. It is part of a bigger strategy to ensure access to a reliable and affordable electrical power supply for all communities (especially rural ones).
Presentation - Final report Aron Kondoro, University of Dar es Salaam – Tanzania


WIMEA–ICT: Science Gateway for Weather Information Management in East Africa to interact with ICT Tool WRF

Accessing and interacting with applications and tools running on remote High Performance Computing (HPC) facilities is a challenge for most researchers, scientists and students, particularly in East Africa, when no graphical user interface (GUI) is available. Many users are not familiar with Linux commands and would rather use a GUI on a Windows machine, where most scientific applications cannot be deployed as they can on a Linux machine. Weather Research and Forecasting (WRF) is the tool selected in the project for improving Weather Information Management in East Africa through ICT tools. This use case aims to implement a Science Gateway (web portal) that makes it easy to interact remotely with the WRF tool on HPC resources. The development will be based on integrated open source tools: the FutureGateway (FG) framework and its Application Programming Interfaces (APIs).
Presentation - Final report Damas Makweba, Dar es Salaam Institute of Technology – Tanzania


Public Health Gateway in Kenya

A gateway for public health experts to publish content addressing the prevalence of fatalities arising from motorcycle accidents in Kenya is urgently needed. These fatalities are many and, frankly, avoidable; the persistence of this problem is worrying, is a drain on the economy, and is a real problem for families.
There are other severe concerns, such as data analysis on immunisation and the effects of children not being immunised. All these societal concerns may have a wealth of information that can be disseminated for public consumption, with the potential to change things for the better, yet this is far from achieved in Kenya. The gateway will use a virtual storage model where the hypervisor provides an emulated hardware environment for each virtual machine, including compute, memory and storage.
Presentation - Final report Dennis Muoki Kimego and Charles Muiruri Njaramba, Egerton University – Kenya

AGUILAR, Fernando
IFCA – Spain

ALIC, Andrei S.
Universitat Politecnica de Valencia – Spain

ANAGNOSTOU, Anastasia
Brunel University London – UK
ANTONACCI, Marica
INFN – Italy

ARIBISALA, Benjamin
Lagos State University – Nigeria

BARATTA, Daniele
Software Engineering Italia – Italy

BARBERA, Roberto
University of Catania – Italy

BECKER, Bruce
CSIR Meraka – South Africa
BRUNO, Riccardo
INFN – Italy

CALANDUCCI, Antonio
INFN – Italy


CARRUBBA, Carla
University of Catania – Italy

CAVALLARO, Alfio
Software Engineering Italia – Italy

D’ANCA, Alessandro
CMCC – Italy


DONVITO, Giacinto
INFN – Italy

FARGETTA, Marco
INFN – Italy

KIMEGO, Dennis Muoki
Egerton University – Kenya

KONDORO, Aron
University of Dar es Salaam – Tanzania

LONDERO, Elisa
INAF, Astronomical Observatory of Trieste – Italy

MARCUCCI, Nicola M.
INGV – Italy

MAKWEBA, Damas
TERNET – Tanzania

MARCO DE LUCAS, Jesus
IFCA – Spain

MARTINELLI, Luca
ICCU – Italy


MGAYA, Stephan N.
TERNET – Tanzania

NJARAMBA, Charles Muiruri
Egerton University – Kenya

OSTOJIC, Davor
OEAW-ACDH – Austria

OWSIAK, Michal
PSNC – Poland


RICCERI, Rita
University of Catania – Italy

RISINAMHODZI, David
Northwest University – South Africa

RWEGASIRA, Diana
University of Dar es Salaam – Tanzania


SCIACCA, Eva
INAF, Astrophysical Observatory of Catania – Italy

TAYLOR, Simon J. E.
Brunel University London – UK

TORRISI, Mario
University of Catania – Italy

TRZEPLA, Krzysztof
CYFRONET – Poland

WANDL-VOGT, Eveline
OEAW-ACDH – Austria


ZAYTSEVA, Ksenia
OEAW-ACDH – Austria

ZEMEK, Konrad
CYFRONET – Poland

Contacts

If you have any enquiries regarding the e-Research Summer Hackfest, feel free to contact us at summer-school@sci-gaia.eu

 


(1) A Science Gateway is usually a tool that enables the members of Virtual Research Communities to access relevant applications and tools deployed on geographically distributed e-Infrastructures. For a definition, see the XSEDE web pages.