Minersoft: Software retrieval in grid and cloud computing infrastructures

Marios D. Dikaiakos*, Asterios Katsifodimos, George Pallis

*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

12 Citations (Scopus)

Abstract

One of the main goals of Cloud and Grid infrastructures is to make their services easily accessible and attractive to end-users. In this article we investigate the problem of supporting keyword-based searching for the discovery of software files that are installed on the nodes of large-scale, federated Grid and Cloud computing infrastructures. We address a number of challenges that arise from the unstructured nature of software and the unavailability of software-related metadata on large-scale networked environments. We present Minersoft, a harvester that visits Grid/Cloud infrastructures, crawls their file systems, identifies and classifies software files, and discovers implicit associations between them. The results of Minersoft harvesting are encoded in a weighted, typed graph, called the Software Graph. A number of information retrieval (IR) algorithms are used to enrich this graph with structural and content associations, to annotate software files with keywords and build inverted indexes to support keyword-based searching for software. Using a real testbed, we present an evaluation study of our approach, using data extracted from production-quality Grid and Cloud computing infrastructures. Experimental results show that Minersoft is a powerful tool for software search and discovery.

Original language: English
Article number: 2
Journal: ACM Transactions on Internet Technology (TOIT)
Volume: 12
Issue number: 1
Publication status: Published - Jun 2012
Externally published: Yes

Keywords

  • Cloud computing
  • Grid computing
  • Resource management
  • Software search engine
