A call for papers, with a submission deadline of 1 May 2013, has been issued for DESI V: Workshop on Standards for Using Predictive Coding and Other Machine Learning Algorithms, to be held 14 June 2013 in Rome, Italy, following ICAIL 2013: International Conference on Artificial Intelligence and Law.
Papers addressing the following questions are invited for DESI V:
1) How transparent can and should the process be in sharing seed sets or training sets of documents with opposing parties, including the sharing of privileged documents?
2) What differences, if any, exist between seed sets developed through random sampling and those developed through judgmental sampling (including picking seed documents using keywords)?
3) How are non-relevant documents used to optimize machine learning algorithms and should they be subject to similar disclosure?
4) Are there ways in which predictive coding and machine learning methods can be tuned to find highly relevant (“hot”) documents in large collections?
5) To what extent is metadata important in tuning predictive coding software to find similarity in documents?
6) In light of past research at the TREC Legal Track and elsewhere, are there absolute targets for recall and precision that could serve as standards in every case, or does achieving certain metrics depend on the relevant data set and legal context?
7) What kinds of best practice standards are needed to help improve mutual understanding of what was actually done, and to improve overall “search quality”?
8) How should predictive coding techniques be audited in connection with an entity submitting itself to an ISO 9001 quality measurement process?
9) To what extent can and should machine learning approaches be used in other phases of the litigation process, to assist in aspects of the process such as identification, preservation, and collection?
10) What are the applications of predictive coding and other forms of machine learning in related “compliance” areas, including regulatory, enforcement, and investigations?
The workshop discussion will be grounded in the results of the recently completed TREC Legal Track, especially where supervised learning methods have shown promise in demonstrating, more cost-effectively, rates of recall and precision that approximate the best obtainable through other methods, including exhaustive manual review.
For more details, please see the complete call.