TY - JOUR
T1 - Minersoft
T2 - Software retrieval in grid and cloud computing infrastructures
AU - Dikaiakos, Marios D.
AU - Katsifodimos, Asterios
AU - Pallis, George
PY - 2012/6
Y1 - 2012/6
N2 - One of the main goals of Cloud and Grid infrastructures is to make their services easily accessible and attractive to end-users. In this article we investigate the problem of supporting keyword-based searching for the discovery of software files that are installed on the nodes of large-scale, federated Grid and Cloud computing infrastructures. We address a number of challenges that arise from the unstructured nature of software and the unavailability of software-related metadata on large-scale networked environments. We present Minersoft, a harvester that visits Grid/Cloud infrastructures, crawls their file systems, identifies and classifies software files, and discovers implicit associations between them. The results of Minersoft harvesting are encoded in a weighted, typed graph, called the Software Graph. A number of information retrieval (IR) algorithms are used to enrich this graph with structural and content associations, to annotate software files with keywords and build inverted indexes to support keyword-based searching for software. Using a real testbed, we present an evaluation study of our approach, using data extracted from production-quality Grid and Cloud computing infrastructures. Experimental results show that Minersoft is a powerful tool for software search and discovery.
AB - One of the main goals of Cloud and Grid infrastructures is to make their services easily accessible and attractive to end-users. In this article we investigate the problem of supporting keyword-based searching for the discovery of software files that are installed on the nodes of large-scale, federated Grid and Cloud computing infrastructures. We address a number of challenges that arise from the unstructured nature of software and the unavailability of software-related metadata on large-scale networked environments. We present Minersoft, a harvester that visits Grid/Cloud infrastructures, crawls their file systems, identifies and classifies software files, and discovers implicit associations between them. The results of Minersoft harvesting are encoded in a weighted, typed graph, called the Software Graph. A number of information retrieval (IR) algorithms are used to enrich this graph with structural and content associations, to annotate software files with keywords and build inverted indexes to support keyword-based searching for software. Using a real testbed, we present an evaluation study of our approach, using data extracted from production-quality Grid and Cloud computing infrastructures. Experimental results show that Minersoft is a powerful tool for software search and discovery.
KW - Cloud computing
KW - Grid computing
KW - Resource management
KW - Software search engine
UR - http://www.scopus.com/inward/record.url?scp=84864867245&partnerID=8YFLogxK
U2 - 10.1145/2220352.2220354
DO - 10.1145/2220352.2220354
M3 - Review article
AN - SCOPUS:84864867245
SN - 1533-5399
VL - 12
JO - ACM Transactions on Internet Technology (TOIT)
JF - ACM Transactions on Internet Technology (TOIT)
IS - 1
M1 - 2
ER -