
Feed aggregator

New releases bringing GATE and Python closer together

Release

The GATE Team is proud to announce two new releases that bring GATE and Python together:

  • Python GateNLP (version 1.0.2): a Python 3 package that brings many of Java GATE's concepts, and its ease of handling documents, annotations and features, to Python.
  • GATE Python Plugin (version 3.0.2): a new plugin that can be used from Java GATE to process documents with Python code and the methods provided by the Python GateNLP package.

Both are first releases to a wider community, intended to gather feedback about what users need and what the basic design should look like.

Feedback

Users are invited to give feedback about the Python GateNLP package:

  • If you find a bug or have a feature request, please use the GitHub Issue Tracker.
  • For more general discussion, ideas, or asking the community for help, please use the GitHub Discussions Forum (preferred) or the General GATE Mailing List.
  • We are also interested in feedback about the API and the functionality of the package. If you want to use the package for your own development and want to discuss changes, improvements, or ways to contribute, please use the GitHub Discussions Forum.
  • We are happy to receive contributions! Please create an issue and discuss/plan with the developers on the issue tracker before submitting a pull request.

To give feedback about the Python Plugin:

IMPORTANT: whenever you give feedback, please include as much detail as possible about your operating system, Java or Python version, package/plugin version, and your concrete problem or question!

GATE Course Module

Module 11 of the upcoming online GATE course in February 2021 will introduce the Python GateNLP package and the GATE Python plugin. You can register for this and many other modules of the course here.
Python GateNLP

Python GateNLP is a Python NLP framework that provides some of the concepts and abstractions known from Java GATE in Python, plus a number of new features (a short usage sketch follows the list below):
  • Documents with arbitrarily many features and arbitrarily many named annotation sets; GateNLP also adds the capability of keeping a ChangeLog
  • AnnotationSets with arbitrarily many (stand-off) Annotations, which can overlap in any way and can span any character range (not just entire tokens/words)
  • Annotations with arbitrarily many features, grouped per set by an annotation type name
  • Features which map keys to arbitrary values
  • Corpora: collections of documents. Python GateNLP provides corpora that map directly to files in a directory (recursively)
  • Prepared modules for processing documents, called "Annotators" in GateNLP, which also allow for filtering and splitting of documents
  • Reading and writing in various formats. GateNLP uses three new formats: "bdocjs" (JSON serialization), "bdocym" (YAML serialization) and "bdocMP" (MessagePack serialization). Documents in these formats can be exchanged with Java GATE through the GATE plugin Format_Bdoc
  • Gazetteers for fast lookup and annotation of token sequences or character sequences that match a large list of known terms or phrases
  • PAMPAC: a way to annotate documents based on patterns over the text, other annotations, and annotation features
  • An HTML visualizer which allows the user to interactively view GATE documents, annotations and features, either as separate HTML files or within Jupyter notebooks
  • Bridges to powerful NLP libraries, with conversion of their annotations to GateNLP annotations
  • GateWorker: an API that allows the user to run Java GATE directly from Python and exchange documents between Python and Java
  • The Java GATE Python Plugin (see below), which allows the user to run Python GateNLP code directly from Java GATE and process documents with it.
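To make these abstractions concrete, here is a minimal sketch of the core API, following the Python GateNLP 1.0 documentation (Document, annset, add, save); the document text, set name, and annotation types below are made up for illustration:

```python
# Minimal sketch of the Python GateNLP core API (version 1.0.x).
# Names follow the GateNLP documentation; the document text, set name,
# and annotation types below are illustrative only.
from gatenlp import Document

# Create a document and attach a document-level feature.
doc = Document("Barack Obama visited Microsoft in Seattle.")
doc.features["purpose"] = "demo"

# Get (or create) a named annotation set and add stand-off annotations;
# annotations may overlap and can span arbitrary character ranges.
anns = doc.annset("MySet")
anns.add(0, 12, "Person", {"source": "manual"})
anns.add(21, 30, "Organization")

# Serialize to the "bdocjs" JSON format (inferred from the extension),
# which Java GATE can load through the Format_Bdoc plugin.
doc.save("example.bdocjs")

# Round-trip: load the document back and inspect the annotations.
doc2 = Document.load("example.bdocjs")
print(len(doc2.annset("MySet")))  # -> 2
```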
GATE Python Plugin

The GATE Python Plugin is one of many GATE plugins that extend the functionality of Java GATE. This plugin allows the user to process GATE documents, whether running in the Java GATE GUI or via the multiprocessing GATE Cloud Processor (GCP), with Python programs which use the GateNLP API for manipulating documents. A rough sketch of such a program follows.
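A Python program used by the plugin defines an annotator and then hands control over to the Java side. The sketch below follows the pattern shown in the GateNLP documentation (the GateNlpPr decorator and the interact() call); the annotation logic itself is an invented example:

```python
# Sketch of a Python program runnable from the Java GATE Python plugin.
# The @GateNlpPr decorator and interact() follow the documented pattern;
# verify against the current GateNLP documentation before relying on it.
from gatenlp import Document, GateNlpPr, interact

@GateNlpPr
def run(doc: Document, **kwargs):
    # Invented example logic: annotate every occurrence of "GATE".
    text = doc.text
    anns = doc.annset("Python")
    start = text.find("GATE")
    while start >= 0:
        anns.add(start, start + 4, "Keyword", {"by": "python-plugin"})
        start = text.find("GATE", start + 4)

# Hand control to Java GATE: documents are sent over the bridge,
# processed by the decorated function, and the changes are sent back.
interact()
```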

What is the Right Response to Employer Misbehavior in Research?

Machine Learning Blog - Mon, 2020-12-14 16:28

I enjoyed my conversations with Timnit when she was in the MSR-NYC lab, so her situation has been on my mind throughout NeurIPS.

Piecing together what happened second-hand is always tricky, but Jeff Dean’s account and Timnit’s agree on a basic outline. Timnit and others wrote a paper for FAccT which was approved for submission by the normal internal review process, then later unapproved. Timnit threatened to leave unless various details about this unapproval were clarified. Google then declared her resigned.

The definition of "resign" makes it clear that an employee does it, not an employer. Since that apparently never happened, this is a firing mischaracterized as a resignation. It also seems quite credible that the unapproval process was highly unusual, based on various reactions I've seen and my personal expectations of what researchers would typically tolerate.

This frankly looks bad to me and quite a number of other people. Aside from the plain facts, this is also consistent with racism and/or sexism given the roles of those involved. Google itself now faces a substantial rebellion amongst employees.

However, I worry about consequences to some of these reactions.

  1. Some people suggest not reviewing papers from Google-based researchers. As a personal decision, this merely makes a program chair's difficult job harder. As a communal decision, it would devastate the community, since a substantial fraction of it is employed at Google. Those employees did not make this decision, and many actively support Timnit (at some risk to their jobs), so a mass-punishment approach seems deeply counterproductive.
  2. Others have suggested that Google should not be a sponsor at major machine learning conferences. Since all of these conferences are run as nonprofits, the lost sponsorship will be made up either by increasing costs for everyone or by reducing grants to students and diversity sponsorship. Reduced grants in particular seem deeply counterproductive.
  3. Some have suggested that all industry research is bad in general. Industrial research varies substantially from place to place, perhaps much more so than in academia. As an example, Microsoft Research has no similar internal review process for publications. Overall, the stereotyping inherent in this view makes me uncomfortable, and there are real advantages to working in industry in terms of the ability to concentrate on research or to effect real change.

It’s critical to understand that the strength of the research community is incredibly valuable to everyone in it. It’s not hard to imagine a different arrangement where all industrial research is proprietary, with only a few major companies operating competitive internal research teams. This sort of structure exists in some other fields, often to the detriment of anyone other than a major company. Researchers at those companies can’t easily switch jobs, and researchers outside them may lack the context to even contribute to the state of the art. The field itself progresses more slowly and more secretively due to the lack of sharing. Anticommunal acts of mass ostracization or abandonment could shift our structure from the current relatively happy equilibrium, where people from all over can participate, learn, and contribute, towards a much worse situation.

This is not to say that there are no consequences. The substantial natural consequences of a significant, morally charged event will play out regardless of anything else. The marketplace for top researchers is quite competitive, so for many of them uncertainty about the feasibility of publication, the disposition and competence of senior leadership, or constraints on topics tips the balance towards other offers. That effect may be severe this year, since this all blew up just as the recruiting season was launching, and I expect it to last for many years unless some significant action is taken. In this sense, I expect all of Google's competitors are looking forward to recruiting more than they were previously, and the cost of not resolving this conflict in a better way may be much, much higher than that of just about any other course of action. This is not particularly hypothetical: I saw it play out in the years after the Silicon Valley lab was cut, when the brain drain of other great researchers in competitive areas was severe for several years afterwards.

I don’t think a general answer to the starting question is possible, since it will always depend on circumstances. Even this instance is complex with actions that could cause unintuitive adverse impacts on unanticipated parts of our community or damage the community as a whole. I personally hope that the considerable natural consequences here form a substantial deterrent to misbehavior in the long term. Please think this through when considering your actions here.

Edits: tweaked conclusion wording a bit with advice from reshamas.


Experiments with the ICML 2020 Peer-Review Process

Machine Learning Blog - Tue, 2020-12-01 12:04

This post is cross-listed on the CMU ML blog.

The International Conference on Machine Learning (ICML) is a flagship machine learning conference that in 2020 received 4,990 submissions and managed a pool of 3,931 reviewers and area chairs. Given that the stakes in the review process are high — the careers of researchers are often significantly affected by publications in top venues — we decided to scrutinize several components of the peer-review process in a series of experiments. Specifically, in conjunction with the ICML 2020 conference, we performed three experiments targeting resubmission policies, management of reviewer discussions, and reviewer recruiting. In this post, we summarize the results of these studies.

Resubmission Bias

Motivation. Several leading ML and AI conferences have recently started requiring authors to declare previous submission history of their papers. In part, such measures are taken to reduce the load on reviewers by discouraging resubmissions without substantial changes. However, this requirement poses a risk of bias in reviewers’ evaluations.

Research question. Do reviewers get biased when they know that the paper they are reviewing was previously rejected from a similar venue?

Procedure. We organized an auxiliary conference review process with 134 junior reviewers from 5 top US schools and 19 papers from various areas of ML. We assigned each participant one paper and asked them to review it as if it had been submitted to ICML. Unbeknownst to participants, we allocated them to a test or control condition uniformly at random:

Control. Participants review the papers as usual.

Test. Before reading the paper, participants are told that the paper they review is a resubmission.

Hypothesis. We expect that if the bias is present, reviewers in the test condition should be harsher than in the control. 

Key findings. Reviewers give a score almost one point lower (95% confidence interval: [0.24, 1.30]) on a 10-point Likert item for the overall evaluation of a paper when they are told that the paper is a resubmission. In terms of narrower review criteria, reviewers tend to underrate “Paper Quality” the most.
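For readers who want to see how such an interval can be obtained, the following is a sketch of a standard Welch two-sample comparison; the scores below are synthetic placeholders generated for illustration, not the study's data, and the study's own analysis may differ:

```python
# Sketch: 95% confidence interval for the difference in mean overall
# scores between control and "resubmission" conditions, via Welch's
# t-interval. Synthetic data for illustration only; NOT the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=6.0, scale=1.8, size=67)  # control condition
test = rng.normal(loc=5.2, scale=1.8, size=67)     # told "resubmission"

diff = control.mean() - test.mean()
var_c = control.var(ddof=1) / len(control)
var_t = test.var(ddof=1) / len(test)
se = np.sqrt(var_c + var_t)

# Welch-Satterthwaite degrees of freedom.
dof = (var_c + var_t) ** 2 / (var_c ** 2 / (len(control) - 1)
                              + var_t ** 2 / (len(test) - 1))

half = stats.t.ppf(0.975, dof) * se
print(f"difference: {diff:.2f}, 95% CI: [{diff - half:.2f}, {diff + half:.2f}]")
```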

Implications. Conference organizers need to evaluate the trade-off between envisaged benefits, such as the hypothetical reduction in the number of submissions, and the potential unfairness introduced to the process by the resubmission bias. One option to reduce the bias is to postpone the moment at which the resubmission signal is revealed until after the initial reviews are submitted. This finding must also be taken into account when deciding whether the reviews of rejected papers should be publicly available on systems like openreview.net.

Details. http://arxiv.org/abs/2011.14646

Herding Effects in Discussions

Motivation. Past research on human decision making shows that group discussion is susceptible to various biases related to social influence. For instance, it is documented that the decision of a group may be biased towards the opinion of the group member who proposes the solution first. We call this effect herding and note that, in peer review, herding (if present) may result in undesirable artifacts in decisions as different area chairs use different strategies to select the discussion initiator.

Research question. Conditioned on a set of reviewers who actively participate in a discussion of a paper, does the final decision of the paper depend on the order in which reviewers join the discussion?

Procedure. We performed a randomized controlled trial on herding in ICML 2020 discussions, involving about 1,500 papers and 2,000 reviewers. In peer review, the discussion takes place after the reviewers submit their initial reviews, so we know reviewers' prior opinions about the papers. With this information, we split a subset of ICML papers into two groups uniformly at random and applied different discussion-management strategies to them:

Positive Group. First ask the most positive reviewer to start the discussion, then later ask the most negative reviewer to contribute to the discussion.

Negative Group. First ask the most negative reviewer to start the discussion, then later ask the most positive reviewer to contribute to the discussion.

Hypothesis. The only difference between the strategies is the order in which reviewers are supposed to join the discussion. Hence, if herding is absent, the strategies will not affect the two groups of submissions disproportionately. However, if herding is present, we expect the difference in order to introduce a difference in the acceptance rates across the two groups of papers.

Key findings. The analysis of outcomes of approximately 1,500 papers does not reveal a statistically significant difference in acceptance rates between the two groups of papers. Hence, we find no evidence of herding in the discussion phase of peer review.
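As a sketch of how such a comparison might be checked, the following applies a two-proportion z-test to acceptance counts; the counts are invented placeholders, not the experiment's data, and the study's own analysis may use a different test:

```python
# Sketch: two-proportion z-test for a difference in acceptance rates
# between the two discussion-order groups. Counts below are invented
# placeholders; NOT the experiment's data.
import math

def two_proportion_ztest(accept_a, n_a, accept_b, n_b):
    """Return (z, two-sided p-value) for H0: equal acceptance rates."""
    p_a, p_b = accept_a / n_a, accept_b / n_b
    pooled = (accept_a + accept_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: roughly 750 papers per group.
z, p = two_proportion_ztest(accept_a=165, n_a=750, accept_b=160, n_b=750)
print(f"z = {z:.2f}, p = {p:.3f}")  # a large p-value: no evidence of herding
```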

Implications. Although herding has been found to occur in other settings involving group decisions, discussion in peer review does not appear to be susceptible to this effect; hence, no specific measures to counteract herding in peer-review discussions seem necessary.

Details. https://arxiv.org/abs/2011.15083

Novice Reviewer Recruiting

Motivation.  A surge in the number of submissions received by leading ML and  AI conferences has challenged the sustainability of the review process by increasing the burden on the pool of qualified reviewers. Leading conferences have been addressing the issue by relaxing the seniority bar for reviewers and inviting very junior researchers with limited or no publication history, but there is mixed evidence regarding the impact of such interventions on the quality of reviews. 

Research question. Can very junior reviewers be recruited and guided such that they enlarge the reviewer pool of leading ML and AI conferences without compromising the quality of the process?

Procedure. We implemented a twofold approach towards managing novice reviewers:

Selection. We evaluated the reviews written in the aforementioned auxiliary conference review process involving 134 junior reviewers, and invited the 52 reviewers who produced the strongest reviews to join the reviewer pool of ICML 2020. Most of these 52 “experimental” reviewers come from a population not reached by the conventional reviewer-recruiting process used for ICML 2020.

Mentoring. In the actual conference, we provided these experimental reviewers with a senior researcher as a point of contact who offered additional mentoring.

Hypothesis. If our approach succeeds in bringing strong reviewers into the pool, we expect the experimental reviewers to perform at least as well as reviewers from the main pool on various metrics, including the quality of reviews as rated by area chairs.

Key findings. The combination of the selection and mentoring mechanisms results in reviews of quality at least comparable to, and on some metrics rated even higher than, reviews from the conventional pool: 30% of reviews written by the experimental reviewers exceeded the expectations of area chairs, compared to only 14% for the main pool.

Implications. The experiment received positive feedback from participants who appreciated the opportunity to become a reviewer in ICML 2020 and from authors of papers used in the auxiliary review process who received a set of useful reviews without submitting to a real conference. Hence, we believe that a promising direction is to replicate the experiment at a larger scale and evaluate the benefits of each component of our approach.

Details. http://arxiv.org/abs/2011.15050

Conclusion

All in all, the experiments we conducted at ICML 2020 reveal some useful and actionable insights about the peer-review process. We hope that some of these ideas will help design a better peer-review pipeline for future conferences.

We thank ICML area chairs, reviewers, and authors for their tremendous efforts. We would also like to thank the Microsoft Conference Management Toolkit (CMT) team for their continuous support and implementation of features necessary to run these experiments, the authors of papers contributed to the auxiliary review process for their responsiveness, and participants of the resubmission bias experiment for their enthusiasm. Finally, we thank Ed Kennedy and Devendra Chaplot for their help with designing and executing the experiments.

The post is based on the works by Ivan Stelmakh, Nihar B. Shah, Aarti Singh, Hal Daumé III, and Charvi Rastogi.
