“Bring your science to the web and the web to your science”
Overview and objectives
The e-Research Summer Hackfest was held at the Department of Physics and Astronomy of the University of Catania (Italy) in two editions: the first on July 4-15, 2016, and the second on July 18-29, 2016. The second edition was held to allow the participation of selected candidates from Africa who could not attend the first because of the time required for visa issuance in their countries. In total, 10 instructors and 26 selected candidates from 9 countries (4 African and 5 European) attended the two editions of the hackfest.
Both editions were co-sponsored by the Sci-GaIA, INDIGO-DataCloud, and COST ENeL projects in terms of participants, use cases and tools.
The main objective of the hackfest was to integrate scientific use cases through a pervasive adoption of web technologies and standards and make them available to their end users through Science Gateways(1) (entities connected to distributed computing, data and services of interest to the Community of Practice the end users belong to). Promoting and fostering open and reproducible research was the ultimate goal of the hackfest.
The following topics were tackled during the e-Research Summer Hackfest:
- Big Data analytics
- Distributed computing services
- Distributed storage services
- Programmatic access to Open Data repositories
- Semantic federation of Open Access repositories
- User interfaces (web, desktop, mobile, etc.)
Error Correction of NGS Data
Error correction of next-generation sequencing (NGS) data is normally the first step of any application targeting NGS. Many projects in different real-life applications perform this step before further analysis. MuffinEC is a corrector that supports multiple technologies (Illumina, Roche 454, Ion Torrent and, experimentally, PacBio) and handles any type of error (mismatches, deletions, insertions and unknown values). It surpasses similar software by providing higher accuracy (demonstrated by four types of tests) while using fewer computational resources. It follows a multi-step approach that starts by grouping all the reads using a k-mer-based metric. Next, it employs the Smith-Waterman algorithm to refine the groups and generate Multiple Sequence Alignments (MSAs). These MSAs are corrected column by column: in each column the correct base is determined by a user-adjustable percentage. We plan to use Ophidia and Onedata to prepare our software for the cloud.
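The grouping and column-correction steps described above can be illustrated with a minimal Python sketch. It is not MuffinEC's code: the Jaccard similarity on k-mer sets is an assumed stand-in for the unspecified k-mer metric, the function names and the default threshold are hypothetical, and the Smith-Waterman alignment step is skipped by starting from reads that are already aligned. Gaps are left untouched, so only mismatches and unknown values (N) are corrected here.

```python
from collections import Counter


def kmers(read, k):
    """Return the set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def kmer_similarity(a, b, k=4):
    """Jaccard similarity of two reads' k-mer sets (assumed stand-in metric)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def correct_msa(rows, threshold=0.6, gap="-"):
    """Correct each alignment column toward its dominant base.

    rows      -- aligned reads of equal length
    threshold -- fraction of the column's known bases (gaps and N excluded)
                 that the dominant base must reach before the other
                 non-gap symbols in that column are overwritten
    """
    if len({len(r) for r in rows}) > 1:
        raise ValueError("aligned rows must have equal length")
    corrected = [list(r) for r in rows]
    width = len(rows[0]) if rows else 0
    for col in range(width):
        counts = Counter(r[col] for r in rows if r[col] not in (gap, "N"))
        if not counts:
            continue  # column holds only gaps or unknown values
        base, votes = counts.most_common(1)[0]
        if votes / sum(counts.values()) >= threshold:
            for row in corrected:
                if row[col] != gap:
                    row[col] = base
    return ["".join(r) for r in corrected]


reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGNAC"]
print(correct_msa(reads, threshold=0.6))
```

Raising the threshold makes the correction more conservative: a column is only rewritten when a larger share of the reads agree on one base.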
Presentation Andy S. Alic, Universitat Politecnica de Valencia – Spain
Algae Bloom Case Study: Managing Data From Models
Hydrodynamic and water quality modeling requires a large number of strongly correlated parameters. Because of that number, and because of the spatial and temporal resolution that high-resolution models demand, the input and output files are very large. The Delft3D software suite is the tool used to perform the modeling, which includes the simulation of the physical, chemical and biological parameters of a water reservoir in Soria, Spain. This case study aims to run the modeling of the reservoir automatically within a cloud framework. In the context of the Hackfest, three different tools could be used:
- OneData: we need a distributed storage solution to share a common space for the input (accessible by the computing resources) and for the output generated by the model (accessible by users).
- Ophidia: Big Data tools are well suited to analyzing the large number of parameters available in the output.
- Kepler: a workflow to analyze the results automatically could be very useful.
Presentation Fernando Aguilar, IFCA – Spain
Distributed Archive System for the Cherenkov Telescope
The Cherenkov Telescope Array (CTA) project aims to build a large array of Cherenkov telescopes of different sizes and deployed on an unprecedented scale. It will allow a significant extension of our current knowledge in high-energy astrophysics. The CTA data and their scientific products need to be preserved in a dedicated archive guaranteed to provide open access to a wide and diverse scientific community. Handling and archiving the large amount of data generated by the instruments and delivering scientific products according to astrophysical standards is one of the challenges in designing the CTA observatory. We present our plan to implement a distributed archive system federating storages using the OneData platform (and/or other promising INDIGO-DataCloud technologies).
Presentation Eva Sciacca, INAF, Astrophysical Observatory of Catania – Italy
Astronomical data format integration into Ophidia
The FITS format is the standard data format for archiving images in astronomy. With this use case we aim to integrate the FITS format into the Ophidia framework, opening the path to the analysis of astronomical data within this powerful tool.
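To give an idea of what any FITS importer has to handle, the sketch below parses a FITS primary header using only the Python standard library. It relies on the layout defined by the FITS standard (80-character header cards, keyword in the first 8 characters, "= " as value indicator, an END card, headers padded to 2880-byte blocks). It is an illustration only, not the Ophidia integration: the function names are hypothetical, `build_header` exists just to produce demo input, and simplifications such as ignoring doubled quotes inside strings are assumptions of this sketch. In practice an established library such as Astropy or CFITSIO would do this work.

```python
CARD_LEN = 80      # every header record ("card") is 80 ASCII characters
BLOCK_LEN = 2880   # headers are padded to a multiple of 2880 bytes


def convert(raw):
    """Turn a non-string value field into bool, int or float, else keep text."""
    if raw in ("T", "F"):
        return raw == "T"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_fits_header(data):
    """Parse header cards from bytes into a dict of keyword -> value."""
    header = {}
    for offset in range(0, len(data), CARD_LEN):
        card = data[offset:offset + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            return header
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY and blank cards carry no value
        field = card[10:]
        if field.lstrip().startswith("'"):
            # String value: text between the first pair of single quotes
            # (simplification: doubled quotes inside strings are not handled).
            start = field.index("'") + 1
            end = field.index("'", start)
            header[keyword] = field[start:end].rstrip()
        else:
            # Anything after "/" is an optional comment.
            header[keyword] = convert(field.split("/", 1)[0].strip())
    raise ValueError("no END card found")


def build_header(lines):
    """Hypothetical demo helper: pad lines into cards and cards into blocks."""
    raw = "".join(line.ljust(CARD_LEN) for line in lines + ["END"])
    padded_len = -(-len(raw) // BLOCK_LEN) * BLOCK_LEN  # round up to a block
    return raw.ljust(padded_len).encode("ascii")


demo = build_header([
    "SIMPLE  =                    T / conforms to the FITS standard",
    "BITPIX  =                  -32 / 32-bit floating point pixels",
    "NAXIS   =                    2",
    "NAXIS1  =                  100",
    "NAXIS2  =                  200",
    "OBJECT  = 'M31     '           / target name",
])
print(parse_fits_header(demo))
```

Keywords such as BITPIX and NAXISn describe the pixel type and array shape, which is the information an array-oriented framework needs before it can ingest the image data that follows the header.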
Presentation Elisa Londero, INAF, Astronomic Observatory of Trieste – Italy
Collaborative Knowledge Discovery Environment on Biodiversity and Linguistic Diversity
The project aims to establish a collaborative, team-science workflow and to enable knowledge discovery as well as experimental scholarship in biodiversity and linguistic diversity. Our first step towards this is to establish a working environment (workspace) in which researchers can explore linguistic diversity and the interconnection of languages and cultural artefacts and data in the linguistic and biological domains. We aim to provide users from different domains and with different backgrounds (researchers of different disciplines, laypeople) with services and applications for workflows to discover, curate and interlink biological taxonomic data with linguistic, terminological and cultural data, to enrich and connect their data to external resources, and to publish them freely accessible on the web as open data. The project is connected to ongoing initiatives such as the COST ENeL action (European Network of e-Lexicography).
Presentation Eveline Wandl-Vogt, Ksenya Zaytseva, Davor Ostojic, OEAW-ACDH – Austria
Reproducible Automatic Speech Recognition workflows
The proposed use case is specific to the rich community of Human Language Technologies users in South Africa. A template for Automatic Speech Recognition will be built into a web interface, and the data it uses will be stored in an Open Access Repository; the application is accessed via a Science Gateway. Users specify their parameters and data in the web interface and submit the job to the Science Gateway, which takes care of the rest. gLibrary may be used to store some of the statistical results from the experiment.
Presentation – Final report David Risinamhodzi, Northwest University – South Africa
Implementation of eCulture Science Gateway – reloaded
The presentation concerns the digital library “MuseiD-Italia”, which showcases images and metadata about Cultural Heritage in Italy. ICCU is trying to revamp the whole workflow to make it better, easier and faster, and to add new features, possibly by integrating INDIGO solutions. This is also intended as a use case for extending experimentation in the coming months to other (and bigger) ICCU-run or ICCU-led projects.
Presentation Luca Martinelli, ICCU – Italy
Intelligent Medical Image Analyzer
Presentation – Final report Benjamin Aribisala, Lagos State University – Nigeria
WEKA Machine Learning in Breast Cancer
The Wisconsin Breast Cancer datasets from the UCI Machine Learning Repository are used as a use case to classify benign and malignant samples using WEKA. The main task is to create a web interface for interacting with and using the classification features of WEKA.
Presentation – Final report Stephan Mgaya, TERNET – Tanzania
Technology Transfer Alliance Collaboration Platform
The TTA Collaboration Platform is intended to be a web-based platform containing an integrated set of tools, applications and data repositories that are accessed via a portal: the TTA Portal. The motivation for developing this platform is to support collaboration and training and to foster education among the partners, the sharing of all sorts of resources, and the dissemination of results. The platform will allow each partner to submit content such as project proposals, project documents, news updates, information shared via content lists, and other kinds of content such as video and other multimedia, all in a secure manner.
Presentation – Final report Diana Rwegasira, University of Dar es Salaam – Tanzania
iGrid – Smart Grid Capacity Development and Enhancement in Tanzania
The use case covers designing, implementing, demonstrating, testing and validating an autonomous solar-powered LVDC nanogrid prototype that serves an off-grid community of 10-100 households and can also be integrated into a higher-voltage AC/DC grid if needed. It is part of a bigger strategy to ensure access to a reliable and affordable electrical power supply for all communities, especially rural ones.
Presentation – Final report Aron Kondoro, University of Dar es Salaam – Tanzania
WIMEA–ICT: Science Gateway for Weather Information Management in East Africa to interact with ICT Tool WRF
Accessing and interacting with applications and tools running on remote High Performance Computing (HPC) facilities is a challenge for most researchers, scientists and students, particularly in East Africa, when no graphical user interface (GUI) is available. Many users are not familiar with Linux command-line environments and instead ask for a GUI on Windows machines, where most scientific applications cannot be deployed as they can on Linux. Weather Research and Forecasting (WRF) is the tool selected in the project on improving Weather Information Management in East Africa through ICT tools. This use case aims to implement a Science Gateway (web portal) that makes it easy to interact remotely with the WRF tool on HPC. The development will be based on integrated open-source tools: Future Gateway (FG) and Application Programming Interfaces (APIs).
Presentation – Final report Damas Makweba, Dar es Salaam Institute of Technology – Tanzania
Public Health Gateway in Kenya
A gateway where public health experts can publish content addressing the prevalence of fatalities from motorcycle accidents in Kenya is urgently needed. These fatalities are numerous and, frankly, avoidable; the persistence of the problem is worrying, a drain on the economy, and a real burden on families.
There are other serious concerns, such as the analysis of immunisation data and the effects of children not being immunised. All these societal concerns may hold a wealth of information that could be disseminated for public consumption, with the potential to change things for the better, yet this is far from achieved in Kenya. The gateway will use a virtual storage model in which the hypervisor provides an emulated hardware environment for each virtual machine, including compute, memory and storage.
Presentation – Final report Dennis Muoki Kimego and Charles Muiruri Njaramba, Egerton University – Kenya
Universitat Politecnica de Valencia – Spain
Brunel University London – UK
INFN – Italy
Software Engineering Italia – Italy
INFN – Italy
INFN – Italy
University of Catania – Italy
Italia – Italy
CMCC – Italy
DONVITO, Giacinto INFN – Italy
INFN – Italy
INAF, Astronomic Observatory of Trieste – Italy
MARCUCCI, Nicola M. INGV – Italy
IFCA – Spain
ICCU – Italy
PSNC – Poland
University of Catania – Italy
University of Dar es Salaam – Tanzania
INAF, Astrophysical Observatory of Catania, Italy
University of Catania – Italy
If you have any enquiries regarding the e-Research Summer Hackfest, feel free to contact us at firstname.lastname@example.org
(1) A Science Gateway is usually a tool which enables the members of Virtual Research Communities to access relevant applications and tools deployed on geographically distributed e-Infrastructures. For a definition, see the XSEDE web pages.