Parse OpenAlex works data supplied in one of two forms:
(1) a CSV file exported in the browser, or
(2) a data frame obtained via the {openalexR} API helpers.
The function standardizes fields to common bibliographic tags (e.g., AU,
SO, CR, PY, DI) and returns a tidy tibble.
Value
A tibble with standardized bibliographic columns. Typical output includes:
id_short, AU, TI, DI, CR, SO, DT, DE, AB, C1, TC, SC, SR,
PY, and DB (source flag: "openalex_csv" or "openalex_api"). See Details.
Details
CSV mode (format = "csv"):
If file is a URL, it is downloaded to a temporary file before parsing (a progress message is printed). Selected fields are mapped to standardized tags:
- id_short (short OpenAlex ID)
- SR (= id_short)
- PY (= publication_year)
- TI (= title)
- DI (= doi)
- DT (= type)
- DE (= keywords.display_name)
- AB (= abstract)
- AU (= authorships.author.display_name)
- SO (= locations.source.display_name)
- C1 (= authorships.countries)
- TC (= cited_by_count)
- SC (= primary_topic.field.display_name)
- CR (= referenced_works, with the https://openalex.org/ prefix stripped)
- DB = "openalex_csv"
PY is coerced to numeric; a helper column DI2 (an uppercase, punctuation-stripped variant of DI) is added; columns with all-caps tags are placed first, and DI2 is relocated after DI.
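The CSV-mode clean-up can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name standardize_csv_sketch() is hypothetical, only a few of the mapped fields are shown, and it assumes the export has already been read into a data frame whose id and referenced_works columns hold full https://openalex.org/ URLs. The " | " separator in the sample data is only illustrative.

```r
# Hypothetical sketch of the CSV-mode steps (not the package's actual code).
# Assumes `csv` is a data frame already read from the export and that its
# `id` and `referenced_works` columns contain full OpenAlex URLs.
standardize_csv_sketch <- function(csv) {
  prefix <- "https://openalex.org/"
  out <- data.frame(
    id_short = sub(prefix, "", csv$id, fixed = TRUE),
    PY = suppressWarnings(as.numeric(csv$publication_year)),  # coerce to numeric
    TI = csv$title,
    DI = csv$doi,
    # strip every occurrence of the prefix, whatever separator is used
    CR = gsub(prefix, "", csv$referenced_works, fixed = TRUE),
    stringsAsFactors = FALSE
  )
  out$SR <- out$id_short
  # DI2: uppercase DOI with all punctuation removed
  out$DI2 <- toupper(gsub("[[:punct:]]", "", out$DI))
  out$DB <- "openalex_csv"

  # all-caps tags first, remaining columns after them
  is_tag <- grepl("^[A-Z0-9]+$", names(out))
  ord <- c(names(out)[is_tag], names(out)[!is_tag])
  # move DI2 so that it sits right after DI
  ord <- setdiff(ord, "DI2")
  ord <- append(ord, "DI2", after = match("DI", ord))
  out[ord]
}

# Toy one-row export (made-up values)
csv <- data.frame(
  id = "https://openalex.org/W1",
  publication_year = "2020",
  title = "An example",
  doi = "https://doi.org/10.1000/xyz",
  referenced_works = "https://openalex.org/W2 | https://openalex.org/W3",
  stringsAsFactors = FALSE
)
res <- standardize_csv_sketch(csv)
names(res)  # PY, TI, DI, DI2, CR, SR, DB, id_short
```

Because DI keeps the full https://doi.org/ URL here, DI2 also contains the letters of that prefix; the sketch follows the description literally rather than guessing at extra normalization.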
API mode (format = "api"):
file must be a data frame containing at least the column id; typically it is the output of openalexR::oa_request() passed to openalexR::oa2df(), or similar. Records are filtered to type %in% c("article", "review") and deduplicated by id. The function derives:
- id_short (= id without the https://openalex.org/ prefix) and SR (= id_short);
- CR: concatenated short IDs from referenced_works (semicolon-separated);
- DE: concatenated keyword names (lower case) from keywords;
- AU: concatenated author names (upper case) from authorships;
- plus the core fields PY (= publication_year), TC (= cited_by_count), TI (= title), AB (= abstract), DI (= doi), and DB = "openalex_api".
The result keeps one row per id and may include original columns from the input (via a right join), after the standardized fields above are constructed.
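A minimal base-R sketch of the API-mode steps (filtering, deduplication, and derivation of SR, CR, DE, AU, and the core fields) follows. The helper derive_api_sketch() is hypothetical and the final join with the original input columns is left out. It assumes that referenced_works is a list of character vectors of full URLs, that keywords and authorships are lists of data frames with a display_name column (possibly NA when empty), that the abstract column is named abstract, and that DE and AU are joined with ";" like CR.

```r
# Hypothetical sketch of the API-mode derivations (not the package's code).
# Assumed list-column shapes: referenced_works = character vectors of URLs;
# keywords / authorships = data frames with a `display_name` column, or NA.
derive_api_sketch <- function(df) {
  prefix <- "https://openalex.org/"
  # keep articles and reviews, then one row per id
  df <- df[df$type %in% c("article", "review"), , drop = FALSE]
  df <- df[!duplicated(df$id), , drop = FALSE]

  # join non-missing values with ";" (NA when nothing is left)
  collapse <- function(x) {
    x <- x[!is.na(x)]
    if (length(x) == 0) NA_character_ else paste(x, collapse = ";")
  }
  # collapse display_name of a nested data frame after applying `f`
  names_from <- function(tbl, f) {
    if (is.data.frame(tbl)) collapse(f(tbl$display_name)) else NA_character_
  }

  out <- data.frame(id = df$id, stringsAsFactors = FALSE)
  out$id_short <- sub(prefix, "", df$id, fixed = TRUE)
  out$SR <- out$id_short
  out$CR <- vapply(df$referenced_works,
                   function(r) collapse(sub(prefix, "", r, fixed = TRUE)),
                   character(1))
  out$DE <- vapply(df$keywords, names_from, character(1), f = tolower)
  out$AU <- vapply(df$authorships, names_from, character(1), f = toupper)
  out$PY <- df$publication_year
  out$TC <- df$cited_by_count
  out$TI <- df$title
  out$AB <- df$abstract
  out$DI <- df$doi
  out$DB <- rep("openalex_api", nrow(out))
  out
}

# Toy input: a duplicated article plus a dataset that should be dropped
works <- data.frame(
  id = paste0("https://openalex.org/", c("W1", "W1", "W9")),
  type = c("article", "article", "dataset"),
  publication_year = c(2020, 2020, 2021),
  cited_by_count = c(5, 5, 0),
  title = c("T1", "T1", "T9"),
  abstract = c("A1", "A1", NA),
  doi = c("https://doi.org/10.1/a", "https://doi.org/10.1/a", NA),
  stringsAsFactors = FALSE
)
refs <- paste0("https://openalex.org/", c("W2", "W3"))
works$referenced_works <- list(refs, refs, character(0))
kw <- data.frame(display_name = c("Bibliometrics", "Science Mapping"),
                 stringsAsFactors = FALSE)
works$keywords <- list(kw, kw, NA)
au <- data.frame(display_name = c("Ada Lovelace", "Alan Turing"),
                 stringsAsFactors = FALSE)
works$authorships <- list(au, au, NA)

res_api <- derive_api_sketch(works)
```

The guard in names_from() matters because empty list entries in real oa2df() output may not be data frames; treating them as missing keeps vapply() from failing on a single row.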
Supported inputs
format = "csv": a local path or an HTTP(S) URL to an OpenAlex CSV export.
format = "api": a data frame produced by {openalexR} for the works entity (with the usual OpenAlex columns, including list-columns such as keywords, authorships, and referenced_works).
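To make the two supported input types concrete, here is a hypothetical validation helper; classify_openalex_input() and download_to_temp() are illustrative names, not part of the package. It only checks the shape of the argument and detects HTTP(S) URLs with a regular expression, and it assumes utils::download.file() as one way to copy a URL into a temporary file.

```r
# Hypothetical input check mirroring the two supported input types.
# Names and error messages are illustrative only.
classify_openalex_input <- function(file, format = c("csv", "api")) {
  format <- match.arg(format)
  if (format == "api") {
    if (!is.data.frame(file) || !("id" %in% names(file))) {
      stop("format = \"api\" expects a data frame with an `id` column")
    }
    return("api_data_frame")
  }
  if (!is.character(file) || length(file) != 1L) {
    stop("format = \"csv\" expects a single local path or URL")
  }
  if (grepl("^https?://", file, ignore.case = TRUE)) "csv_url" else "csv_path"
}

# For a URL, a temporary local copy can be made before parsing.
# Defined but not called here, because it needs network access.
download_to_temp <- function(url) {
  dest <- tempfile(fileext = ".csv")
  utils::download.file(url, destfile = dest, mode = "wb")
  dest
}

classify_openalex_input("~/Downloads/openalex-works.csv")  # "csv_path"
classify_openalex_input(data.frame(id = "https://openalex.org/W1"), format = "api")
```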
See also
OpenAlex R client: oa_request, oa2df.
Importers for Web of Science: read_wos.
Examples
if (FALSE) { # \dontrun{
## CSV export (local path)
x <- read_openalex("~/Downloads/openalex-works.csv", format = "csv")
## CSV export (URL)
x <- read_openalex("http://yoursite/openalex-works-2025-05-28T23-12-11.csv", format = "csv")
## Using the API with openalexR
# install.packages("openalexR")
library(openalexR)
url_api <- "https://api.openalex.org/works?page=1&filter=primary_location.source.id:s121026525"
df_api <- openalexR::oa_request(query_url = url_api) |>
  openalexR::oa2df(entity = "works")
y <- read_openalex(df_api, format = "api")
} # }
