Bashir: Estimating retrievability ranks of patent documents using document features

Dr. Shariq Bashir of National University of Computer and Emerging Sciences, Islamabad, has published "Estimating retrievability ranks of documents using document features," Neurocomputing 123(10), 216-232 (2014).

Here is the abstract:

Retrievability is a measure of access that quantifies how easily documents can be found using a retrieval system. Such a measure is of particular interest within the recall oriented retrieval domains such as patent or legal retrieval. This is because if a retrieval system for these retrieval domains makes some documents hard to find then professional searchers would have a difficult time when retrieving these documents. One main limitation of retrievability analysis is that it depends upon the processing of exhaustive number of queries. This requires large processing time and resources. In order to handle this problem, in this paper we use document features based approach in order to estimate the retrievability ranks of documents. In experiments, the strong correlation between features and retrievability scores on different collections confirms that it is possible to estimate the retrievability ranks of documents without processing queries. One major advantage of this approach is that it requires fewer resources, and can be computed more quickly as compared to query based approach. While, on the other hand, one major disadvantage of this approach is that it can only estimate the retrievability ranks of documents, but cannot calculate how much there is retrievability inequality (retrieval bias) between the documents of collection.
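To make the contrast in the abstract concrete, here is a minimal Python sketch of the two ingredients it mentions: a query-based retrievability score and a rank correlation between that score and a document feature. It assumes the cumulative definition common in the retrievability literature (a document scores one point for each query that returns it within a rank cutoff), which this post does not spell out. The toy rankings, the cutoff, and the single feature are hypothetical and are not taken from the paper, and the paper's actual features and estimation models are not reproduced here.

```python
from collections import Counter


def retrievability(ranked_results, cutoff):
    """Count, per document, the queries that return it within the top `cutoff` ranks."""
    scores = Counter()
    for ranking in ranked_results.values():
        for doc_id in ranking[:cutoff]:
            scores[doc_id] += 1
    return scores


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical toy data: each query maps to its ranked result list.
toy_results = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d4"],
    "q3": ["d2", "d1", "d4"],
}
# Hypothetical document feature (for example, a length-like statistic).
toy_feature = {"d1": 120, "d2": 80, "d3": 60, "d4": 30}

docs = sorted(toy_feature)
r = retrievability(toy_results, cutoff=2)
rho = spearman([r[d] for d in docs], [toy_feature[d] for d in docs])
print({d: r[d] for d in docs}, rho)
```

The query-based score needs a full pass over every query's result list, which is the cost the abstract describes. The feature-based side only reads per-document statistics, which is why it can order documents but says nothing about how unequal their retrievability is.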

The author’s models are tested on four datasets, including two U.S. patent datasets:

USPTO Patent Collections: These collections are downloaded from the freely available US patent and trademark office website. We collect all patents that are listed under the United States Patent Classification (USPC) classes 433 (Dentistry), and 422 (Chemical apparatus and process disinfecting, deodorizing, preserving, or sterilizing). These collections consist of 64,986 documents, with 36,998 documents in USPC Class 422 and 27,988 documents in USPC Class 433. […]
