Inferring Method Specifications from Natural Language API Descriptions

Rahul Pandita, Xusheng Xiao, Hao Zhong, Tao Xie, Steve Oney, and Amit Paradkar

Application Programming Interface (API) documents are a typical way of describing legal usage of reusable software libraries, thus facilitating software reuse. However, even with such documents, developers often overlook some documents and build software systems that are inconsistent with the legal usage of those libraries. Existing software verification tools require formal specifications (such as code contracts), and therefore cannot directly verify the legal usage described in natural language text in API documents against code using that library. However, in practice, most libraries do not come with formal specifications, thus hindering tool-based verification. To address this issue, we propose a novel approach to infer formal specifications from natural language text of API documents. Our evaluation results show that our approach achieves an average of 92% precision and 93% recall in identifying sentences that describe code contracts from more than 2500 sentences of API documents. Furthermore, our results show that our approach has an average 83% accuracy in inferring specifications from over 1600 sentences describing code contracts.
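To make the idea of inferring a specification from a sentence concrete, the following is a toy sketch only. It is not the approach proposed here: it swaps in a single hand-written regular expression for the actual analysis, and the function name `infer_precondition`, the pattern, and the `requires ...` output format are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not taken from the paper): matches
# phrases such as "<param> must not be null" or "<param> cannot be null".
_NOT_NULL = re.compile(
    r"\b(\w+)\s+(?:must not|cannot|should not)\s+be\s+null\b",
    re.IGNORECASE,
)


def infer_precondition(sentence: str) -> Optional[str]:
    """Return a contract-like precondition string, or None if the
    sentence does not match the single illustrative pattern."""
    match = _NOT_NULL.search(sentence)
    if match is None:
        return None
    # The captured word is assumed to be the parameter name.
    return f"requires {match.group(1)} != null"


if __name__ == "__main__":
    print(infer_precondition("The path must not be null."))
    print(infer_precondition("Returns the number of entries."))
```

A sentence that states a constraint yields a precondition string, while a purely descriptive sentence yields `None`; this mirrors, in miniature, the two tasks evaluated above: identifying contract-describing sentences and inferring specifications from them.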