An Information Retrieval Approach for Automatically Constructing Software Libraries
Yoelle S. Maarek; Daniel M. Berry; Gail E. Kaiser
- Technical reports
- Computer Science
- Columbia University Computer Science Technical Reports
- Department of Computer Science, Columbia University
- Publisher Location: New York
- Although software reuse presents clear advantages for programmer productivity and code reliability, it is not practiced enough. One reason for the only moderate success of reuse is the lack of software libraries that facilitate the actual locating and understanding of reusable components. This paper describes a technology for automatically assembling large software libraries which promote software reuse by helping the user locate the components closest to her/his needs. Software libraries are automatically assembled from a set of unorganized components by using information retrieval techniques. The construction of the library is done in two steps. First, attributes are automatically extracted from natural language documentation by using a new indexing scheme based on the notions of lexical affinities and quantity of information. Then a hierarchy for browsing is automatically generated using a clustering technique which draws only on the information provided by the attributes. Thanks to the free-text indexing scheme, tools following this approach can accept free-style natural language queries. This technology has been implemented in the GURU system, which has been applied to construct an organized library of AIX utilities. An experiment was conducted to evaluate the retrieval effectiveness of GURU as compared to INFOEXPLORER, a hypertext library system for AIX 3 on the IBM RISC System/6000 series. We followed the usual evaluation procedure used in information retrieval, based upon recall and precision measures, and determined that our system performs 15% better on a random test set, while being much less expensive to build than INFOEXPLORER.
- Computer science
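
The abstract evaluates retrieval effectiveness with recall and precision measures. As a reminder of what those two measures compute for a single query, here is a minimal sketch. The function name `precision_recall` and the sample component names are hypothetical and chosen only for illustration; they are not taken from the report, and the report's actual test set and averaging procedure are not reproduced here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    An empty denominator yields 0.0 (a convention assumed for this sketch).
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: components returned for a query versus the
# components a human judge marked as relevant to it.
retrieved = ["cp", "mv", "ln", "rm"]
relevant = ["cp", "mv", "rsync"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Comparing two retrieval systems in this style would typically mean computing these values for each query in a shared test set and then comparing the aggregated results.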