Some details about Rayyan's classifier:
An important feature of the Rayyan app is its ability to learn from users’ decisions to include or exclude studies and to use them to build a model that offers suggestions on studies awaiting screening. More specifically, after removing stop words and stemming the remaining words from the title and abstract, Rayyan extracts all the words (unigrams) and pairs of words (bigrams), together with previously computed MeSH terms. These are then used as features by an SVM classifier. As users label citations as excluded or included (in the current setting, Rayyan needs 50 decisions with at least five included and five excluded studies), Rayyan calls the SVM classifier, which learns the features of these excluded and included citations and builds a model accordingly. The model is then run on the citations that await labeling and produces, for each study, a score of how closely it matches the include and exclude classes. That score is then turned into a five-star rating that is presented to the user. As the user continues to label more citations, if Rayyan believes it can improve its predictions, it uses the newly labeled examples to produce a new model and runs it on the remaining unlabeled citations. This process is repeated until there are no more citations to label or the model cannot be improved any further.
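The feature extraction, the training trigger, and the star rating described above can be sketched roughly as follows. This is a minimal illustration and not Rayyan's actual code: the stop-word list, the suffix-stripping `crude_stem` (a stand-in for a real stemmer), the feature naming, and the equal-width mapping from score to stars are all assumptions made for the example. The resulting feature bags would then be vectorized and passed to an SVM implementation.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "is", "are"}


def crude_stem(word):
    """Very rough suffix stripper, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(title, abstract, mesh_terms=()):
    """Return a bag of unigram, bigram, and MeSH-term features for one citation."""
    tokens = re.findall(r"[a-z]+", f"{title} {abstract}".lower())
    # Remove stop words first, then stem what remains.
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    features = Counter(stems)                                       # unigrams
    features.update(f"{a}_{b}" for a, b in zip(stems, stems[1:]))   # bigrams
    features.update(f"MESH:{m}" for m in mesh_terms)                # precomputed MeSH terms
    return features


def ready_to_train(labels, min_decisions=50, min_per_class=5):
    """labels: list of booleans (True = included). Mirrors the 50 / 5 / 5 rule."""
    included = sum(labels)
    excluded = len(labels) - included
    return (len(labels) >= min_decisions
            and included >= min_per_class
            and excluded >= min_per_class)


def stars(score, lo, hi):
    """Map a classifier score in [lo, hi] to 1..5 stars by equal-width bins (assumed mapping)."""
    if hi == lo:
        return 3
    frac = (score - lo) / (hi - lo)
    return max(1, min(5, 1 + int(frac * 5)))
```

For instance, `extract_features("Statins in adults", "Treating patients with statins", ["Humans"])` yields unigrams such as `statin`, bigrams such as `treat_patient`, and the MeSH feature `MESH:Humans`.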
The SVM classifier was tested, using the above features, on a collection of systematic reviews drawn from a study that evaluated whether automated classification of citations can help reduce the workload of systematic review authors. In that study, test collections were built for each of 15 review topics, which had been conducted by the Oregon EPC, Southern California EPC, and Research Triangle Institute/University of North Carolina. A machine learning–based classifier was then trained on the test collections. The proportion of included articles ranged from 0.5% to 21.7%, with the largest review containing 3465 studies and the smallest 310.
A 2-fold cross-validation was used, with 50% of the data going to training and 50% to testing. This process was repeated ten times and the results were averaged. Two metrics were used to evaluate the quality of the classifier: the Area Under the ROC Curve (AUC) and Work Saved over Random Sampling measured at 0.95 recall (WSS@95). WSS@95 refers to the percentage of studies that the reviewers do not have to go through because they have been screened out by the classifier at a recall of 0.95. The results are AUC = 0.87 ± 0.09 and WSS@95 = 0.49 ± 0.18. While these results indicate appreciable quality and time savings, it is important to keep in mind that Rayyan offers considerably greater time savings through its facets, filtering features, and visual cues, which help expedite the screening process. A more detailed explanation of the machine learning process has been previously reported. Note that in that earlier article, we used a different classifier, namely Random Forest, with more features such as co-citations. The latter features turned out to be hard to obtain in a production system such as Rayyan.
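To make the two metrics concrete, the sketch below computes them from a list of classifier scores and true labels. It assumes the commonly used definition WSS@R = (TN + FN)/N − (1 − R), with the cut-off placed at the highest-ranked position where recall R is first reached. The text above does not spell out the exact formula used in the evaluation, so treat this as an illustration rather than the authors' evaluation code. The function names are hypothetical.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def wss_at_recall(scores, labels, recall=0.95):
    """Work saved over random sampling at the given recall (assumed definition).

    Citations are ranked by descending score and screened from the top until
    the target recall is reached; everything below the cut-off is work saved.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(1 for y in labels if y)
    # Small epsilon guards against floating-point overshoot in recall * n_pos.
    needed = math.ceil(recall * n_pos - 1e-9)
    found = 0
    screened = 0
    for screened, (_, y) in enumerate(ranked, start=1):
        found += 1 if y else 0
        if found >= needed:
            break
    not_screened = len(ranked) - screened  # TN + FN at this cut-off
    return not_screened / len(ranked) - (1 - recall)
```

In the evaluation described above, such metrics would be computed on each held-out half of the data and averaged over the ten repetitions.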