Topics
Works in OpenAlex are tagged with Topics using an automated system that takes into account the available information about the work, including title, abstract, source (journal) name, and citations.
There are around 4,500 Topics. Topics are grouped into subfields, which are grouped into fields, which are grouped into top-level domains. This is shown in the diagram below, along with the counts for each.
Works are assigned topics by a model that scores every topic for a given work. The highest-scoring topic is that work's "primary topic". Each topic belongs to exactly one subfield, one field, and one domain, so any of these levels can also be used to classify the work, depending on the granularity you want. For example:
Domain: "Health Sciences"
Field: "Medicine"
Subfield: "Health Informatics"
Topic: "Artificial Intelligence in Medicine"
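As a minimal sketch of the structure described above, the hierarchy and the "highest score wins" rule can be modelled as follows. The `Topic` class, the `primary_topic` helper, the second topic, and the scores are all hypothetical and exist only for illustration; they are not part of OpenAlex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """One topic with its single parent at each level of the hierarchy."""
    name: str
    subfield: str
    field: str
    domain: str


def primary_topic(scores: dict) -> Topic:
    """Return the highest-scoring topic from a {Topic: score} mapping."""
    if not scores:
        raise ValueError("work has no topic scores")
    return max(scores, key=scores.get)


# The example hierarchy from the text above.
ai_in_medicine = Topic(
    name="Artificial Intelligence in Medicine",
    subfield="Health Informatics",
    field="Medicine",
    domain="Health Sciences",
)

# Hypothetical second topic and made-up scores, for illustration only.
other = Topic(
    name="Example Topic B",
    subfield="Example Subfield",
    field="Example Field",
    domain="Example Domain",
)
scores = {ai_in_medicine: 0.91, other: 0.42}

top = primary_topic(scores)
# Because each topic has exactly one parent per level, the primary topic
# also fixes the work's subfield, field, and domain.
print(top.name, "|", top.subfield, "|", top.field, "|", top.domain)
```

Storing the parents on the topic itself reflects the one-subfield, one-field, one-domain rule: choosing a coarser granularity is just a matter of reading a different attribute of the primary topic.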
We developed the method for classifying our works in collaboration with the team behind the Open Leiden rankings, extending the methods they used for those rankings. Here is an outline of the overall method:
Cluster the citation network of works that have incoming and outgoing citations. This yields meaningful clusters of works that correspond closely to research communities focused on different topics.
Use a Large Language Model (LLM) to generate labels and descriptions for these clusters.
Assign each topic to a subfield, field, and domain, which are based on Scopus's classification structure.
Use this labeled data to train a deep-learning model that can assign topics using titles, abstracts, citations, and journal names.
This model can handle cases with missing data, so we can use it to classify most of our works, including new works that don't yet have any incoming citations.
For a detailed description of the methods, see our methods paper. The code and model have also been made available.
Most works are assigned Topics, as well as domains, fields, subfields, and keywords, using the methods above. Some works, however, don't have enough associated data to be assigned Topics. The following table, taken from the methods paper referenced above, shows how many works were classified with at least one Topic and how many works were excluded from Topic classification for various reasons:
You can find more information about how OpenAlex Topics are included in the API and snapshot data in our .
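As a rough sketch of reading a work's classification from the API, the snippet below pulls the primary topic and its parent levels out of a work record. The endpoint URL and the field names (`primary_topic`, `display_name`, `score`, `subfield`, `field`, `domain`) are assumptions about the response shape and should be checked against the API documentation; the sample record is invented for illustration.

```python
import json
from typing import Optional
from urllib.request import urlopen


def fetch_work(work_id: str) -> dict:
    """Fetch one work record (needs network access; URL is an assumption)."""
    with urlopen(f"https://api.openalex.org/works/{work_id}") as resp:
        return json.load(resp)


def classification(work: dict) -> Optional[dict]:
    """Return topic, subfield, field, and domain names, or None if unclassified."""
    primary = work.get("primary_topic")
    if not primary:
        # Some works lack enough data to be assigned a topic.
        return None
    result = {"topic": primary.get("display_name"), "score": primary.get("score")}
    for level in ("subfield", "field", "domain"):
        # Each parent level is assumed to be a nested object with a display name.
        result[level] = (primary.get(level) or {}).get("display_name")
    return result


# Invented sample record mirroring the assumed response shape.
sample_work = {
    "primary_topic": {
        "display_name": "Artificial Intelligence in Medicine",
        "score": 0.98,
        "subfield": {"display_name": "Health Informatics"},
        "field": {"display_name": "Medicine"},
        "domain": {"display_name": "Health Sciences"},
    }
}

print(classification(sample_work))
print(classification({"primary_topic": None}))
```

Keeping the network call separate from the parsing step makes the parsing logic easy to check against a saved record before running it on live responses.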