Meet the Microtask Generator – Diff

March 23, 2026

By: admin


Introduction

Imagine you’re organizing a Wikimedia campaign. Your community has spent months identifying the knowledge gaps that matter most in your language. You’ve created a list of articles that need improvement. You have motivated editors who genuinely want to help. Then someone asks the simple but difficult question: “Where do we start?” If you tell a newcomer, “Here are 1,000 articles. Go improve them,” it can feel overwhelming. That is not a clear task. It is just a very long list with no obvious first step.

A campaign needs that clarity: not just a big goal, but a clear starting point. When people know what they are being asked to do, they are much more likely to contribute and to come back! That is exactly the problem we wanted to solve.

Through an Outreachy internship, Silvia Gutiérrez, Isaac Johnson, Stephane Bisson, and an incredible intern, Mercy Oyelakin, designed a tool that turns long article lists into clear, manageable tasks. And it works on mobile!

Tool demo animation
The Micro Task Tool in Action

In this post, we will share a bit about who developed the tool, what communities are saying about it, how it works, and what comes next.

Meet the Developer: Mercy Oyelakin

Hi everyone, I’m Mercy, a Software Developer with a Computer Engineering background. I’m proficient in developing web-based applications, integrating APIs, and contributing to open-source projects.

I’m currently wrapping up my internship at the Wikimedia Foundation through the Outreachy program. During the internship, with support and guidance from my mentors, I built the Microtask Generator, a tool that helps Wikipedia communities turn their important article lists into recommended editing tasks. The project pushed me further than anything I’ve built before, and I’m so thankful for that. I worked with multiple data sources, including LiftWing for quality prediction, country and topic data, and the MediaWiki API for article metadata, all of which were crucial for prioritizing which articles to work on. It strengthened my technical skills in Python, JavaScript and JavaScript libraries such as jQuery, HTML, and CSS, as well as API integration. It also strengthened my ability to solve problems, collaborate, iterate, and adapt, as well as my user-centered design skills, especially after working with users of the tool. I’m incredibly grateful to my wonderful mentors, Silvia Gutiérrez, Isaac Johnson, and Stephane Bisson, for their kind guidance, help, and support throughout the internship.

What Communities Are Saying:

“The tool is very intuitive. It allows us to make quick inquiries for event organizers. It’s very relevant, especially for people just starting out and not sure how to contribute; these are important clues.” — Member of the Mexican Chapter 🇲🇽

“Thanks for creating the tool! It feels so useful to find these tasks and get recommendations with such simple steps.” — Punjabi community member 🇮🇳

“Everything is easy, even for beginners this will help people know which parts they should work on improving” — Madurese organizer 🇮🇩

Description of the Tool

The Microtask Generator is an article improvement tool that analyzes Wikipedia articles, identifies content quality gaps, and suggests tasks for editors to address. This could, for example, be “add an infobox,” “add more references,” or “add more images.” The tool integrates Wikimedia APIs and LiftWing machine learning models to augment task lists, which can be filtered and exported by Wikimedia organizers.

How It Works:

The tool provides two input modes: article lists (up to 1,000 articles) and Wikipedia categories. Results are displayed in an interactive table with filtering, sorting, pagination, and export options. Here’s how everything works, step by step.

Article list input

  1. Select “Get Recommendations From Articles” at the top of the form.
  2. Enter a “language code” in the first field (e.g., en for English, es for Spanish).
  3. Paste one or more “Wikipedia article titles” into the text area, one title per line.
  4. Click “Get Recommended Tasks”. The tool will fetch and analyze each article in real time.

💡 Cool tip for edit-a-thons: After your event, run the list of newly created articles through the tool. Copy the table with the quality column—it gives you:

  • A quick overview of how complete the articles are
  • Which ones still need work and could be improved next
Article list input screenshot
Results table screenshot
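For the curious, here is a minimal sketch of how pasted titles from the article list input might be cleaned up before analysis. It is not the tool's actual code: the function name, the de-duplication step, and the choice to raise an error over the limit are assumptions for illustration. Only the one-title-per-line format and the 1,000-article cap come from the description above.

```python
MAX_TITLES = 1000  # the cap stated for article-list input


def parse_titles(raw_text: str, limit: int = MAX_TITLES) -> list:
    """Turn pasted text (one title per line) into a clean list of titles.

    Hypothetical helper: skipping blank lines and duplicates is an
    assumption, not documented behavior of the Microtask Generator.
    """
    seen = set()
    titles = []
    for line in raw_text.splitlines():
        # Underscores and spaces are interchangeable in wiki page titles,
        # so normalize to single spaces and trim the ends.
        title = " ".join(line.replace("_", " ").split())
        if not title or title in seen:
            continue
        seen.add(title)
        titles.append(title)
    if len(titles) > limit:
        raise ValueError(f"{len(titles)} titles given, but the limit is {limit}")
    return titles


pasted = "Frida Kahlo\n\n  Ada_Lovelace \nFrida Kahlo\n"
print(parse_titles(pasted))
```

Normalizing before de-duplicating means "Ada_Lovelace" and "Ada Lovelace" count as the same article, which keeps a long campaign list from spending its quota on repeats.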

Recommendations from Wikipedia categories

  1. Select “Get Recommendations From Category” at the top of the form.
  2. Enter a “language code” (see step 2 above).
  3. Begin typing a Wikipedia category name (autocomplete suggestions will appear as you type). Categories can be found at the bottom of Wikipedia articles on the web or under the “Categories” section in the app.
Category input screenshot

Here’s how you find categories on desktop:

And here’s how you do it in the Android App (SEgt-WMF, CC BY 4.0 😊). The iOS App doesn’t have this feature yet, but if you’re interested in it, you can vote for it to be implemented in T316082.

  4. Set the “number of articles” to retrieve from that category (default: 20).
  5. Click “Get Articles From Category” to fetch and analyze articles in bulk.
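If you are wondering what "get articles from a category" can look like behind the scenes, the sketch below builds a request for the MediaWiki Action API's `list=categorymembers` module. The parameter names (`cmtitle`, `cmnamespace`, `cmlimit`) are real API parameters, but the helper functions are hypothetical and we are only assuming the tool makes a request of this shape. No network call is made here.

```python
def api_endpoint(lang: str) -> str:
    """Action API endpoint for a given language edition of Wikipedia."""
    return f"https://{lang}.wikipedia.org/w/api.php"


def category_members_params(category: str, limit: int = 20) -> dict:
    """Build query parameters for listing the articles in a category.

    Hypothetical helper. The default of 20 mirrors the form's default.
    """
    # "Category:" is the canonical namespace prefix and works on every wiki.
    # Limitation of this sketch: a localized prefix such as "Categoría:"
    # is not detected and would get a second prefix added.
    if not category.startswith("Category:"):
        category = f"Category:{category}"
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": "0",  # namespace 0 = articles only, no subcategories
        "cmlimit": str(min(limit, 500)),  # 500 is the per-request cap for regular users
        "format": "json",
    }


print(api_endpoint("es"))
print(category_members_params("Pintoras de México"))
```

These parameters could then be passed to any HTTP client, for example as the `params` argument of a GET request to the endpoint.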

Filtering results

Three dropdown filters appear above the results table: Tasks, Topics, and Geography. And this is how they work:

  • Filter by Tasks: someone teaching an edit-a-thon can filter by the exact skill they’re covering—whether it’s adding references, uploading media, or fixing article structure.
Tasks filter screenshot
  • Filter by Topics: Topic classifications are generated using machine learning models hosted on LiftWing and are based on article content signals. This allows organizers to pull articles from specific domains—such as politics or STEM—from large article lists. Perfect for tailoring edit-a-thons to what participants actually want to work on.
Topics filter screenshot
  • Filter by Countries: Articles are classified according to country relevance using predictive models and metadata signals. This feature is particularly useful for regional campaigns, local Wikimedia affiliates, or education programs focused on specific countries. For example, a user participating in a national editing drive can filter tasks to view only articles associated with their country.
Countries filter screenshot
  • Multiple filters can be applied simultaneously. The All/None button within each filter toggles all options at once.
All none toggle screenshot
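To make "multiple filters can be applied simultaneously" concrete, here is a small sketch of one way combined filtering could work. The `Row` structure, the sample data, and the exact semantics (a row must match every active filter, and an empty selection matches nothing, like pressing "None") are assumptions for illustration, not the tool's confirmed behavior.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Row:
    """One line of the results table (hypothetical structure)."""
    title: str
    tasks: set = field(default_factory=set)
    topics: set = field(default_factory=set)
    countries: set = field(default_factory=set)


def passes(values: set, selected: Optional[set]) -> bool:
    """None means the filter is untouched; otherwise require any overlap."""
    if selected is None:
        return True
    return bool(values & selected)


def apply_filters(rows, tasks=None, topics=None, countries=None):
    """Keep only rows that satisfy every active filter at once."""
    return [
        row for row in rows
        if passes(row.tasks, tasks)
        and passes(row.topics, topics)
        and passes(row.countries, countries)
    ]


# Invented sample data, for illustration only.
rows = [
    Row("Frida Kahlo", {"Add more references"}, {"Visual arts"}, {"Mexico"}),
    Row("Ada Lovelace", {"Add an infobox"}, {"STEM"}, {"United Kingdom"}),
    Row("Mario Molina", {"Add more references", "Add an infobox"}, {"STEM"}, {"Mexico"}),
]

for row in apply_filters(rows, topics={"STEM"}, countries={"Mexico"}):
    print(row.title)
```

Combining filters with "and" across categories but "or" within a category is the usual pattern for faceted tables: picking two countries widens the results, while adding a topic narrows them.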

Ordering the Table

All generated microtasks are displayed in a sortable table that supports prioritization using different impact indicators. These are the ways you can sort the table:

  • By page views: This allows contributors to prioritize high-visibility articles that are read frequently by the public. Improvements to these articles can have an immediate, broad impact on readers.
  • By the number of language editions: Shows how many language editions of Wikipedia have an article on the topic, helping organizers focus on content that has global relevance or that has been neglected by other language communities.
  • By days since last edit: This sorts articles from most to least recently edited. It helps users identify potentially neglected pages that may need renewed attention. These are often good options for newcomers, as they tend to be less contentious. Conversely, some organizers may prefer to focus on articles currently attracting the most editing activity.
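The three sort options above can be pictured with a short sketch. The `Article` fields, the sample numbers, and the `sort_articles` helper are all invented for illustration; the real table does its sorting in the browser.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record holding the three impact indicators."""
    title: str
    pageviews: int
    language_editions: int
    days_since_last_edit: int


# One key function per sortable column.
SORT_KEYS = {
    "pageviews": lambda a: a.pageviews,
    "language_editions": lambda a: a.language_editions,
    "days_since_last_edit": lambda a: a.days_since_last_edit,
}


def sort_articles(articles, by: str, descending: bool = True):
    """Return a new list ordered by the chosen indicator."""
    return sorted(articles, key=SORT_KEYS[by], reverse=descending)


# Invented numbers, for illustration only.
articles = [
    Article("Article A", pageviews=12000, language_editions=3, days_since_last_edit=400),
    Article("Article B", pageviews=800, language_editions=45, days_since_last_edit=2),
    Article("Article C", pageviews=5000, language_editions=12, days_since_last_edit=90),
]

# Most-read first.
print([a.title for a in sort_articles(articles, "pageviews")])
# Most recently edited first (fewest days since the last edit).
print([a.title for a in sort_articles(articles, "days_since_last_edit", descending=False)])
# Longest-neglected first.
print([a.title for a in sort_articles(articles, "days_since_last_edit")])
```

Flipping the direction on "days since last edit" is what lets the same column serve both goals mentioned above: finding neglected pages or finding currently active ones.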

Understand the Tasks

Each task type represents a specific way to improve an article. Click on any task name in the tool to see a brief explanation. Here’s what each one means in more detail:

📚 Add more references
  • What it means: The article may contain sentences without citations. This task focuses on checking these claims and adding appropriate sources.
  • Why it matters: Verifiable sources are the foundation of Wikipedia. Adding citations helps readers trust what they’re reading and ensures information can be checked. ([[Wikipedia:Verifiability]])
  • Where to learn more:
    • 🎥 Video: Adding citations tutorial (8 min): learn to add citations from books and websites using your sandbox
    • 🎥 Bonus video: What is a reliable source?
    • [[Help:Referencing for beginners]]

🔗 Add more internal wikilinks
  • What it means: Link relevant words or phrases to other Wikipedia articles.
  • Why it matters: Research shows that readers who explore Wikipedia through links often have longer, more diverse sessions, discovering topics they hadn’t initially set out to find (Singer et al., 2017). You can be the one who starts someone else’s wiki rabbit hole (yes, there’s a Wikipedia page about it!)
  • Where to learn more:
    • 🎥 Adding links tutorial (5 min): learn to use the editing toolbar and add internal/external links
    • 🏫 Full course: Introduction to Wikipedia, Section 6: Adding Links
    • 🏋️ Practice your link muscle with this Growth Feature in the Newcomer Homepage
    • 📓 [[Help:Links]]

📰 Improve article section headings
  • What it means: Add or reorganize section headings to make the article easier to read and navigate.
  • Why it matters: Well-structured articles help readers find information quickly and make it easier to expand content.
  • Where to learn more:
    • 📓 [[Help:Section]]
    • 🔍 [[Wikipedia:Manual of Style/Layout]]
    • Find section ideas from similar articles using Isaac Johnson’s Maybe Add This? to see what sections appear in high-quality related articles. Search for 🔨 to learn how to use this tool!

🖼️ Add images or other media
  • What it means: Upload or link relevant images, diagrams, or other media.
  • Why it matters: Research shows that images in association with text help support learning and understanding, especially when carefully curated and positioned (Rama et al., 2022).
  • Where to learn more:
    • 📓 [[Help:Pictures]]
    • [[Commons:Commons:Upload]]
    • Practice with Growth’s Add an Image feature in your Newcomer Homepage

🗂️ Add more relevant categories
  • What it means: Add appropriate Wikipedia categories at the bottom of the article.
  • Why it matters: Categories help readers discover related content and organize knowledge thematically. Another way to go into a wiki rabbit hole! 🐰
  • Where to learn more:
    • 📓 [[Help:Category]]
    • [[Wikipedia:Categorization]]
    • Same as with sections, you can use Isaac’s Maybe Add This? to see what categories appear in high-quality related articles. Search for 🔨 to learn how to use this tool!

🐡 This article is too short. Try to expand the content
  • What it means: The article is very short. Try expanding it with more detail, context, or background information.
  • Why it matters: Stubs leave readers with unanswered questions. Expanding them builds a more complete encyclopedia.
  • Where to learn more:
    • [[Wikipedia:Stub]]
    • Also, if you speak another language, you might find a longer article in that version of Wikipedia, and you can translate sections!

🩺 Check article for a maintenance message
  • What it means: This article has a maintenance banner (e.g., missing information, very long, or unreferenced). Review and address the issue flagged.
  • Why it matters: Banners flag specific problems. Resolving them improves quality and removes barriers to article improvement.
  • Where to learn more:
    • Click the banner to understand the specific issue and the banner’s documentation page
    • [[Wikipedia:Maintenance]]

💡 Add an infobox
  • What it means: Add a standardized infobox template to summarize key facts at the top of the article.
  • Why it matters: Infoboxes give readers quick facts at a glance and create a consistent presentation across related articles. But be careful: not all wikis allow this, and not all pages require an infobox!
  • Where to learn more:
    • [[Help:Infobox]]
    • [[Wikipedia:List of infoboxes]]
    • Find similar articles to copy template structure

A Note on Task Context

Not every task applies to every wiki. Some communities have guidelines that discourage infoboxes for certain article types (like biographies on German Wikipedia). That’s why organizers have full control over which task types appear. You can toggle off anything that doesn’t fit your community’s norms.

Tool demo animation 2

The tool is designed to support your wiki’s practices, not override them.

Detailed Individual Task Progress

Clicking any row in the results table opens a more detailed breakdown of that article’s quality assessment: eight quality-feature metric cards, each showing a labeled progress bar and percentage score. This view makes it easy to see exactly which areas are pulling down an article’s overall Quality Progress score and where editing effort will have the most impact.

Detailed task progress screenshot
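One way to picture how per-feature scores could turn into a short to-do list is the sketch below. Everything in it is an assumption made for illustration: the feature names, the 0 to 1 scale, the 0.5 threshold, and the idea that each of the eight cards lines up with one of the eight task types described earlier. The task labels are the ones from this post.

```python
# Hypothetical mapping from feature name to the task label shown to editors.
TASK_LABELS = {
    "references": "Add more references",
    "wikilinks": "Add more internal wikilinks",
    "headings": "Improve article section headings",
    "media": "Add images or other media",
    "categories": "Add more relevant categories",
    "length": "This article is too short. Try to expand the content",
    "maintenance": "Check article for a maintenance message",
    "infobox": "Add an infobox",
}


def recommend_tasks(scores: dict, threshold: float = 0.5) -> list:
    """Return task labels for features scoring below the threshold.

    The weakest feature comes first, since that is where editing effort
    is likely to have the most impact. The threshold is an invented value.
    """
    low = [
        (score, name)
        for name, score in scores.items()
        if name in TASK_LABELS and score < threshold
    ]
    return [TASK_LABELS[name] for score, name in sorted(low)]


# Invented scores for a single article (1.0 = nothing left to improve).
example_scores = {
    "references": 0.2,
    "wikilinks": 0.9,
    "headings": 0.7,
    "media": 0.4,
    "categories": 1.0,
    "length": 0.6,
    "maintenance": 1.0,
    "infobox": 0.0,
}

for task in recommend_tasks(example_scores):
    print(task)
```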

The Technologies Behind the Tasks:

The Microtask Generator connects three core Wikimedia infrastructure systems to generate its recommendations.

  • MediaWiki API: retrieves the latest article revision in real time, which is used to calculate how many days have passed since the last edit, and reports how many language editions the article exists in. It is also used to resolve redirects, verify page existence, and retrieve article lists during category processing.
  • Wikimedia Pageviews API: This API retrieves page view counts for articles, powering the pageview sorting feature.
  • LiftWing API: the Wikimedia Foundation’s machine learning model-serving platform. This API does the analytical heavy lifting through three inference endpoints:
    • Quality model: A language-agnostic model that evaluates each article’s structure, citation coverage, section completeness, and other quality signals, then outputs predictions about where the article falls short. These predictions are translated into the specific, human-readable microtasks shown in the interface.
    • Topics model: Classifies articles into subject domains based on content signals.
    • Countries model: Predicts geographic associations for articles.
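For readers who want to poke at the same data sources, here is a hedged sketch of two of the pieces above: building a Wikimedia Pageviews API request URL and computing "days since last edit" from a MediaWiki revision timestamp. The URL layout and the timestamp format follow the public APIs, but the helper names, the choice of `all-access`, `user`, and `daily`, and the query parameters are our assumptions rather than the tool's actual settings. Nothing here touches the network, and the LiftWing calls are left out.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import quote


def pageviews_url(lang: str, title: str, start: str, end: str) -> str:
    """Per-article pageviews URL; start and end are YYYYMMDD strings."""
    # Titles use underscores in URLs and must be percent-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{lang}.wikipedia/all-access/user/{article}/daily/{start}/{end}"
    )


def article_metadata_params(title: str) -> dict:
    """Action API parameters for last-edit time and language links."""
    return {
        "action": "query",
        "titles": title,
        "prop": "revisions|langlinks",
        "rvprop": "timestamp",  # timestamp of the latest revision
        "lllimit": "max",       # as many language links as allowed per request
        "redirects": "1",       # resolve redirects to the target page
        "format": "json",
    }


def days_since_last_edit(timestamp: str, now: Optional[datetime] = None) -> int:
    """Whole days between a MediaWiki timestamp (ISO 8601, UTC) and now."""
    edited = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - edited).days


print(pageviews_url("es", "Frida Kahlo", "20260301", "20260322"))
print(days_since_last_edit("2026-03-01T12:00:00Z"))
```

Passing `now` explicitly makes the day count reproducible, which is handy when comparing results from runs on different days.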

The result is that every recommendation in the tool is grounded in live Wikipedia data. Not a static snapshot, but the actual current state of the article at the moment you run it.

Exporting and Copying

Results can be exported/downloaded in three formats via the Export button:

  • CSV: For use in spreadsheet tools such as Excel or Google Sheets.
  • TSV: Tab-separated format, suitable for databases or wiki-import scripts.
  • Wiki Table: A formatted wikitable ready to be downloaded and pasted into any Wikipedia page.
Export options screenshot
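The three export formats are simple enough to sketch in a few lines. The column names and the sample row below are invented, and the real export surely carries more columns. The point is only to show how the same rows can become CSV, TSV, or wikitable markup.

```python
import csv
import io

COLUMNS = ["Article", "Tasks", "Quality"]  # hypothetical column set


def to_delimited(rows, delimiter: str = ",") -> str:
    """Serialize rows as CSV (default) or TSV (delimiter='\\t')."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


def to_wikitable(rows) -> str:
    """Serialize rows as wikitext table markup, linking each article title."""
    lines = ['{| class="wikitable sortable"', "! " + " !! ".join(COLUMNS)]
    for article, tasks, quality in rows:
        lines.append("|-")
        lines.append(f"| [[{article}]] || {tasks} || {quality}")
    lines.append("|}")
    return "\n".join(lines)


# Invented sample row, for illustration only.
rows = [("Frida Kahlo", "Add an infobox; Add more references", "62%")]

print(to_delimited(rows))
print(to_delimited(rows, delimiter="\t"))
print(to_wikitable(rows))
```

Using the `csv` module rather than joining strings by hand means titles that contain commas or quotes are escaped correctly.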

The table results can also be copied in wikitable format and pasted on the user’s talk page on Wikipedia. This can help users plan their editing and track their progress over time through the progress bar.

Wiki table format screenshot

What’s Next?

This was a three-month experiment, but we’re thrilled to see such interest from communities! The whole purpose of this tool is to test features that could eventually be integrated into tools being built by the Connection Team (like Collaborative contributions), the Growth Team (like structured tasks), and the Editing Team (like edit suggestions).

The best place to share feedback is on the tool’s Research talk page—that’s where we’ll be collecting ideas for future iterations.

We’ve heard requests for localization and translation, and while we don’t currently have the capacity to add this, there are two ways to work around it:

  • 🌐 Use Google Translate on the page—the tool itself works for all language Wikipedias!
  • 📝 Help translate this page and its documentation into your language—contributions welcome!

The tool is open source, and the roadmap is yours to shape. Try it, break it, and tell us what’s missing. Your feedback will help shape the next generation of tools for organizers and editors.

About the authors

Mercy is a software developer focused on community-driven tools.

Isaac is a senior research scientist who has worked on many recommendation systems to provide editor communities with tools to help close knowledge gaps on Wikimedia projects.

Silvia leads the Vital Knowledge research, working with communities to define and close knowledge gaps. This tool was built in direct response to community needs identified through that research.

🔨 Bonus Content: How to use Isaac’s Maybe Add This? Tool: Replace pt with your language code (e.g., en, pa, te, id) and Eliana Alves Cruz with your article title. The tool will suggest sections or categories that appear in similar articles but may be missing from yours—perfect for spotting improvement opportunities!

Example of what you’ll see:

{
  "already-present": false,
  "pages-using": 6,
  "rec": "Obras",
  "tf-idf": 0
}

This means:

    • rec: “Obras” → The suggested section/category is “Obras” (Works)
    • already-present: false → Your article doesn’t have this yet (good candidate to add!)
    • pages-using: 6 → 6 out of the similar articles analyzed include this section
    • tf-idf: 0 → A statistical measure of uniqueness (higher numbers = more distinctive to this topic)

What to do: Start with suggestions that have a high pages-using count and already-present: false. They’re common in similar articles but absent from yours!
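If you save the tool's output, a few lines of code can do that triage for you. This is a sketch under assumptions: we assume the response can be treated as a list of objects shaped like the example above, and only the first entry below comes from this post; the other two are invented. Breaking ties by higher tf-idf is our own choice.

```python
def rank_suggestions(recs: list) -> list:
    """Keep suggestions missing from the article, most widely used first."""
    missing = [rec for rec in recs if not rec["already-present"]]
    # Sort by pages-using (descending), then by tf-idf (descending) on ties.
    return sorted(missing, key=lambda rec: (-rec["pages-using"], -rec["tf-idf"]))


recs = [
    {"already-present": False, "pages-using": 6, "rec": "Obras", "tf-idf": 0},
    # The two entries below are invented for illustration.
    {"already-present": True, "pages-using": 9, "rec": "Biografia", "tf-idf": 0},
    {"already-present": False, "pages-using": 8, "rec": "Prêmios", "tf-idf": 1.5},
]

for rec in rank_suggestions(recs):
    print(rec["rec"], rec["pages-using"])
```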



