OpenAlex help center

Topics

Last updated 1 year ago

Works in OpenAlex are tagged with Topics using an automated system that takes into account the available information about the work, including title, abstract, source (journal) name, and citations.

There are around 4,500 Topics. Topics are grouped into subfields, which are grouped into fields, which are grouped into top-level domains. This is shown in the diagram below, along with the counts for each.

Topics are assigned by a model that scores every topic for a given work. The highest-scoring topic is that work's "primary topic". Each topic has one subfield, one field, and one domain, so each of these may also be used to classify the work, depending on the level of granularity you want. For example:

Example Topic: "Artificial Intelligence in Medicine"

  • Domain: "Health Sciences"

  • Field: "Medicine"

  • Subfield: "Health Informatics"

  • Topic: "Artificial Intelligence in Medicine"
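
The relationship between scores, the primary topic, and the hierarchy can be sketched in a few lines of Python. This is only an illustration, not the OpenAlex implementation: the `Topic` class, the `primary_topic` helper, the placeholder topic, and the scores are all assumptions made up for the example. Only the "Artificial Intelligence in Medicine" hierarchy comes from the text above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Topic:
    """A topic together with the single subfield, field, and domain above it."""
    name: str
    subfield: str
    field: str
    domain: str

    def label(self, level: str = "topic") -> str:
        """Return the label at the requested level of granularity."""
        labels = {
            "topic": self.name,
            "subfield": self.subfield,
            "field": self.field,
            "domain": self.domain,
        }
        return labels[level]


def primary_topic(scores: Dict[Topic, float]) -> Topic:
    """Pick the highest-scoring topic for one work (hypothetical helper)."""
    if not scores:
        raise ValueError("no topic scores available for this work")
    return max(scores, key=scores.get)


# The example topic from this page.
AI_IN_MEDICINE = Topic(
    name="Artificial Intelligence in Medicine",
    subfield="Health Informatics",
    field="Medicine",
    domain="Health Sciences",
)

# A made-up second topic so there is something to compare against.
PLACEHOLDER = Topic(
    name="Placeholder Topic",
    subfield="Placeholder Subfield",
    field="Placeholder Field",
    domain="Placeholder Domain",
)

# Hypothetical model scores for a single work.
example_scores = {AI_IN_MEDICINE: 0.91, PLACEHOLDER: 0.42}

best = primary_topic(example_scores)
print(best.label("topic"))   # Artificial Intelligence in Medicine
print(best.label("field"))   # Medicine
print(best.label("domain"))  # Health Sciences
```

Because every topic carries exactly one subfield, field, and domain, choosing a coarser classification is just a matter of reading a different label from the same primary topic.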

The steps we used to assign Topics to works

  1. Use a Large Language Model (LLM) to get labels and descriptions for the clusters of works found in the citation network.

  2. Use this labeled data to train a deep-learning model that can assign topics using titles, abstracts, citations, and journal name.

    • This model can handle cases with missing data, so we can use it to classify most of our works, including new works that don't have any incoming citations.

Topics coverage

Most works are assigned Topics (as well as domains, fields, subfields, and keywords) using the methods above. Some works, however, don't have enough associated data for Topics to be assigned. The following table, taken from the methods paper (linked under Technical documentation), shows how many works were classified with at least one Topic and how many works were excluded from Topic classification for various reasons:

Technical documentation

We developed the method for classifying our works in collaboration with CWTS at Leiden University, extending the methods they used in their Open Leiden rankings, which they explain in this article. Here is an outline of the overall method:

  • Cluster the citation network for works that have incoming and outgoing citations. This provides meaningful clusters of works that strongly correspond to research communities focused on different topics.

  • Assign each topic to a subfield, field, and domain, which are based on Scopus's ASJC categories.

For a detailed description of the methods, see our paper: "OpenAlex: End-to-End Process for Topic Classification". The code and model are available at https://github.com/ourresearch/openalex-topic-classification.

You can find more information about how OpenAlex Topics are included in the API and snapshot data in our technical documentation Topics pages.
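
As a rough sketch of what reading Topics from the API might look like, the Python below flattens a work's primary topic into its four levels. It assumes that a work record carries a `primary_topic` object with `display_name`, `score`, and nested `subfield`, `field`, and `domain` objects; check the technical documentation for the authoritative field names. The helpers `fetch_work` and `topic_hierarchy` are hypothetical names, and the sample record (including its score) is made up for illustration.

```python
import json
from typing import Optional
from urllib.request import urlopen

API_ROOT = "https://api.openalex.org"


def fetch_work(work_id: str) -> dict:
    """Download one work record from the API (hypothetical helper)."""
    with urlopen(f"{API_ROOT}/works/{work_id}") as response:
        return json.load(response)


def topic_hierarchy(work: dict) -> Optional[dict]:
    """Flatten a work's primary topic into topic/subfield/field/domain labels.

    Returns None for works that were not assigned any Topic.
    """
    primary = work.get("primary_topic")
    if not primary:
        return None
    result = {"topic": primary.get("display_name"), "score": primary.get("score")}
    for level in ("subfield", "field", "domain"):
        # Assumed shape: each level is a nested object with a display_name.
        result[level] = (primary.get(level) or {}).get("display_name")
    return result


# Made-up record in the assumed shape, reusing the example topic from this page.
sample_work = {
    "primary_topic": {
        "display_name": "Artificial Intelligence in Medicine",
        "score": 0.9,  # hypothetical score
        "subfield": {"display_name": "Health Informatics"},
        "field": {"display_name": "Medicine"},
        "domain": {"display_name": "Health Sciences"},
    }
}

print(topic_hierarchy(sample_work))
print(topic_hierarchy({}))  # a work without Topics gives None
```

To try it against live data, pass the result of `fetch_work` (called with a real work ID) to `topic_hierarchy`. The parsing is kept separate from the network call so it can be checked offline.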