Technology Committee/Project requests/WikiRate - rating Wikimedia


This is a working page. Please feel free to edit it directly.

Overview slides from WikiRate talk at WMCON2014 by Michael Maggs

Objective

There is an increasing desire, and need, for Wikimedia editors, chapters and other groups to be able to measure quantitatively the impact of the work they are doing. There is growing agreement that SMART (specific, measurable, achievable, relevant, time-bound) targets are in many cases the best and most reliable approach, but such targets are very difficult to define when we try to measure improvements in the quality of wiki articles or uploaded media files. Improvements in quantity can easily be measured; improvements in quality are much harder.

To date, most attempts to measure article quality have relied on manual evaluation by individuals. Such approaches will always be needed, and will always represent the 'gold standard' of quality evaluation, but they suffer from two very significant drawbacks: expense and lack of scalability.

We suggest that there is a need for a tool or series of tools that can measure article and media file quality in an automated way. The output of such an automated tool will always be far inferior to manual evaluation in any particular case, but when applied to a large corpus may nevertheless provide very illuminating data on overall quality. By measuring how comparable outputs vary over time we can get at least some measure of the impact that a chapter or other group is having as it tries to improve overall article and media file quality.

Measuring quality via a series of numerical metrics would make this extremely important facet of Wikimedia amenable to SMART analysis for the first time.

Goals

Provide access to a range of quality metrics for Wikimedia articles and media files. Allow users to obtain automated quality ratings without the need for expensive manual expert analysis.

Primary users are expected to be:

  • Chapters and other Wikimedia entities who need ways to measure their own impact, and that of their volunteers, on article and media file quality
  • WikiProjects and online groups who want to automatically track the quality of articles they are interested in
  • Educationalists
  • Individual editors
  • Academic and Wiki researchers
  • The Foundation's analytics teams

Outcomes

Provide and maintain, for the benefit of the Wikimedia community and beyond, an easy-to-use toolset that generates automated numerical metrics of the quality of Wikimedia content, including mainspace articles and media files.

Creation of the model

Measuring the quality of articles and media files automatically, without context-specific human knowledge, is a technically hard problem, and requires a model that can output useful measures of quality based entirely on programmatically determined inputs. A simplified procedure for generating such a model might be as follows:

  • 1. Decide what is meant by 'quality' and think about possible numerical output scores
  • 2. On a subset of target articles/media files, determine base scores manually (eg using experts to rate quality)
  • 3. Review all possible inputs (useful things that can be determined programmatically)
  • 4. Use machine learning techniques, such as support vector machines or random forests, to determine which functions, when applied to the inputs, best predict outputs that match the manually-derived base scores (a sketch follows this list)
  • 5. Iterate over 1-4 as needed to create a robust theoretical model that is able to provide outputs that are usefully indicative of quality
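As an illustration of step 4 only, here is a minimal sketch using scikit-learn's random forest regressor. The feature matrix X and the expert scores y are invented toy values standing in for whatever steps 2 and 3 actually produce; nothing here is a committed design.

    # Sketch of step 4: fit a model that predicts the manually-derived base
    # scores from programmatically-determined inputs. scikit-learn is an
    # assumed dependency; X and y below are toy placeholder data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    # X: one row per manually-rated article, one column per candidate input
    # (eg text length, reference count, links in, links out, readability).
    X = np.array([
        [4200, 35, 120,  80, 52.1],
        [ 300,  2,   5,   3, 61.8],
        [9800, 90, 340, 150, 48.9],
        [1500, 10,  40,  20, 55.0],
        [7200, 60, 200,  95, 50.3],
        [ 600,  1,   8,   4, 63.2],
    ])
    # y: the base quality scores assigned manually by experts in step 2.
    y = np.array([8.5, 2.0, 9.0, 5.0, 8.0, 2.5])

    model = RandomForestRegressor(n_estimators=100, random_state=0)

    # Cross-validation gives a first indication of how well the model
    # generalises to unrated targets; iterate steps 1-4 if this is poor.
    errors = -cross_val_score(model, X, y, cv=3,
                              scoring="neg_mean_absolute_error")
    print("mean absolute error per fold:", errors)

    # Feature importances show which inputs drive the predictions, feeding
    # back into the review of candidate inputs in step 3.
    model.fit(X, y)
    print("input importances:", model.feature_importances_)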

Implementation of the model

Once the model has been created, it can be used to provide numerical output scores for other targets that have not been manually rated. The model can be re-run periodically, or run against old revisions of a target article/file, to estimate how quality varies over time.
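As a sketch of what running the model against old revisions could look like in practice, the snippet below fetches historical revisions of an article through the standard MediaWiki query API and scores each one. The quality_score function is a hypothetical stand-in for the fitted model, not a real component.

    # Sketch: score historical revisions of an article to chart quality
    # over time. Uses the standard MediaWiki query API; quality_score()
    # is a hypothetical placeholder for the fitted model.
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def fetch_revisions(title, limit=20):
        """Yield (timestamp, wikitext) pairs for recent revisions."""
        params = {
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvprop": "timestamp|content",
            "rvslots": "main",
            "rvlimit": limit,
            "format": "json",
            "formatversion": 2,
        }
        data = requests.get(API, params=params).json()
        for rev in data["query"]["pages"][0]["revisions"]:
            yield rev["timestamp"], rev["slots"]["main"]["content"]

    def quality_score(wikitext):
        # Placeholder: a real implementation would extract the model's
        # inputs from the revision and return the model's prediction.
        return len(wikitext) / 1000.0

    for timestamp, text in fetch_revisions("Example"):
        print(timestamp, round(quality_score(text), 2))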

Possible model inputs

Some of these possible inputs may contribute positively to quality, others negatively, and others perhaps very little at all. The manner and extent to which each contributes to the eventual output can be calculated automatically as part of the model-generation analyses, and does not need to be manually defined. This is therefore a list of potential inputs, not necessarily ones that will turn out to be useful. Illustrative sketches of how a few of these inputs might be computed follow each list below.

Possible inputs for article quality measures

  • Community-added tags:
      • WikiProject quality assessment scores
      • Categorisations, such as stub, start article etc
      • Awards (Featured Article, Good Article)
      • Manually-added problem templates
  • Stability over time:
      • User edits
      • Bot edits
      • Extent of edit warring
  • Length of text
  • Number of references
  • Number of references per 100 words of text
  • Links in
  • Links out
  • Obvious spelling and grammatical errors, and their persistence
  • Readability
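Purely by way of illustration, a few of the textual inputs above (length, reference counts, and a readability score) can be approximated directly from raw wikitext. The regular expressions below are deliberately crude assumptions; a real tool would want a proper wikitext parser such as mwparserfromhell.

    # Sketch: approximate a few candidate article inputs from raw wikitext.
    # The regexes are crude stand-ins for real wikitext parsing.
    import re

    def article_features(wikitext):
        words = re.findall(r"\b\w+\b", wikitext)
        refs = len(re.findall(r"<ref[\s>]", wikitext))
        sentences = max(1, len(re.findall(r"[.!?]+", wikitext)))
        # Very rough syllable estimate: runs of vowels per word.
        syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                        for w in words)
        n = max(1, len(words))
        return {
            "length_words": len(words),
            "references": refs,
            "refs_per_100_words": 100.0 * refs / n,
            # Flesch reading ease, one possible readability input.
            "readability": (206.835 - 1.015 * (n / sentences)
                            - 84.6 * (syllables / n)),
        }

    print(article_features("The cat sat on the mat.<ref>Example</ref>"))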

Possible inputs for media file quality measures

Some of these examples apply to photographs only; other file types, such as SVGs, will need different measures. A sketch of two photo-oriented inputs follows the list.

  • Awards (Featured Image/Picture status on Commons and on Wikipedias; Quality Image; Valued Image)
  • Metadata:
      • Camera type
      • Evidence of pre-processing before upload (Photoshop etc)
  • Usage on main spaces of other Wikimedia projects:
      • Main article image
      • Used in a list
      • Used in a template
  • Sharpness measured over the image
  • Image size (pixel count)
  • Image size on disk
  • Categorisations (comprehensiveness, specificity)
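Again purely as an illustration, two of the photo-oriented inputs above are straightforward to compute programmatically: sharpness, estimated here as the variance of the Laplacian (a common focus measure), and pixel count. OpenCV (cv2) and a local file path are assumptions for the sketch, not part of this proposal.

    # Sketch: two candidate media-file inputs for raster photographs.
    # Assumes OpenCV (cv2) and a local image file it can decode.
    import os
    import cv2

    def image_features(path):
        image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        height, width = image.shape
        return {
            # Higher Laplacian variance generally means a sharper image.
            "sharpness": cv2.Laplacian(image, cv2.CV_64F).var(),
            "pixel_count": height * width,
            "size_on_disk": os.path.getsize(path),
        }

    print(image_features("example.jpg"))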

Constraints

The chapter has neither the personnel nor the funding to start on this immediately. Funds would be needed both for the initial stage and for long-term hosting, maintenance and improvement of the tools for the benefit of the community.

Funds

Assuming this is considered worth pursuing, funds would need to be sought (a task which would in itself require fundraiser time and expertise). Options might include:

  • WMF via special-purpose grants
  • Direct approaches by WMUK to corporate or other sponsors (eg Google)
  • Seeking grants via competitive tender/open competitions.

People

It may be possible, though by no means certain, that the analytical and machine learning work to create the basic models could be done without the need for direct payment (it could make a fairly interesting series of research projects, and may be of interest on Kaggle). However, we would expect that the programming work then needed to turn the research projects into a user-friendly set of tools will require contracted or employed programmers.

To create and maintain/improve the models:

  • Academics
  • Data analysts
  • Wiki researchers
  • Programmers
  • Subject matter experts

To create and maintain/improve a user-friendly toolset:

  • A lead volunteer team
  • Programmers
  • Wikimedia editors and potential users, including chapter and other Wikimedia entity input
  • Admin and backup services, both technical and non-technical, including volunteer and partner management
  • Product management (looking after the programmers)

Risks

Ultimately, this project has the potential to deliver a sophisticated set of widely-used tools to the Wikimedia communities and beyond. However, the costs and risks will rise in proportion to our ambitions, and for that reason we think it sensible to start small and build capacity/sophistication over time, based on the actual needs and desires of the Wikimedia communities. Fortunately, the project should be perfectly amenable to that approach, which means we do not need to decide on a set of all-or-nothing targets now.

That said, even to start down the path will be a fairly major undertaking, as it will require the chapter to ramp up its technical (IT) capabilities. At the very least, it is likely to require a commitment to fund at least one technical project manager and one or more full-time programmers, either as employees or on a fairly long-term contract basis. It will require the chapter to change its mindset and its technical ambitions, and to start thinking of itself as an entity that can provide some elements of leadership on IT for the movement as a whole. On that basis, the chapter may want to look at building sufficient technical capability to run several IT projects of higher impact than those it has attempted in the past (eg accessibility?) to ensure efficient use of programmer resources.

Related work

Although some significant research has already been carried out, which we hope to be able to build on and/or integrate, many of the potentially useful results languish in published academic articles that are unknown to most within the Wikimedia community. There has been little concerted effort so far to make use of the academic results to build practical tools that will help Wikimedians and Wikimedia entities on a day to day basis.

This list may not be complete, and represents just the start of a more comprehensive literature search that will be needed. Please add any additional work that you know of.

Tools and research on automation

Quality in general

Specific characteristics that may relate to quality

Quantity

Research based on manual analysis

Background

Discussion

At this stage this is very much a high-level proposal for discussion, and feedback on desirability and viability is probably more important than feedback on the technical detail.

Please discuss on the talk page, or just improve the text above.


Endorsements