The getLattes R package, written by Roney Fraga Souza, Winicius Sabino and Luis Felipe de Souza Rodrigues, extracts data from curricula exported as XML files from the Lattes platform.

To automate the download process, please see Captchas Negated by Python reQuests - CNPQ.

getLattesWeb

Non-programmers can use the getLattesWeb graphical interface instead.

Programmers

Installation

Stable version from CRAN.
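
A minimal sketch of the CRAN installation, assuming the package is published on CRAN under the name `getLattes`:

```r
# install the released version from CRAN, then load it
install.packages("getLattes")
library(getLattes)
```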

Development version from GitHub.

# install and load devtools from CRAN
# install.packages("devtools")
library(devtools)

# install and load getLattes
devtools::install_github("roneyfraga/getLattes")
library(getLattes)

Import XML file

The Lattes XML file can stay compressed inside a .zip archive; xml2::read_xml() decompresses local .zip files automatically.

# locate the example file bundled with the package
zip_xml <- system.file('extdata/4984859173592703.zip', package = 'getLattes')

curriculo <- xml2::read_xml(zip_xml)
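
When several curricula have been downloaded, the same call can be mapped over a folder. This is only a sketch: the folder name `lattes_zips` is hypothetical and stands in for wherever the .zip files were saved.

```r
# hypothetical folder holding one .zip per curriculum (assumption, adjust the path)
zip_files <- list.files("lattes_zips", pattern = "\\.zip$", full.names = TRUE)

# read every archive into a list of xml_document objects
curriculos <- lapply(zip_files, xml2::read_xml)

# number of curricula that were read
length(curriculos)
```

Keeping the documents in a plain list makes it easy to apply any of the extraction functions to all of them later.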

Extract data

# to extract data from one curriculum 
getDadosGerais(curriculo)
getArtigosAceitos(curriculo)
getArtigosPublicados(curriculo)
getAreasAtuacao(curriculo)
getAtuacoesProfissionais(curriculo)
getBancasDoutorado(curriculo)
getBancasGraduacao(curriculo)
getBancasMestrado(curriculo)
getCapitulosLivros(curriculo)
getEnderecoProfissional(curriculo)
getEventosCongressos(curriculo)
getFormacaoDoutorado(curriculo)
getFormacaoMestrado(curriculo)
getFormacaoGraduacao(curriculo)
getIdiomas(curriculo)
getLinhaPesquisa(curriculo)
getLivrosPublicados(curriculo)
getOrganizacaoEventos(curriculo)
getOrientacoesDoutorado(curriculo)
getOrientacoesMestrado(curriculo)
getOrientacoesPosDoutorado(curriculo)
getOutrasProducoesTecnicas(curriculo)
getParticipacaoProjeto(curriculo)
getProducaoTecnica(curriculo)
getTrabalhosEmEventos(curriculo)
getId(curriculo)
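
The calls above work on one curriculum at a time. To collect the same kind of record from several curricula, one option is to apply an extractor to a list of documents and stack the results. The sketch below assumes each extractor returns a data frame, and it uses `dplyr::bind_rows()` for the stacking, which is a choice made here rather than something the package requires. The list holds only the bundled example file; in practice it would hold many curricula.

```r
library(getLattes)

# bundled example file; in practice this list would hold many curricula
zip_xml <- system.file('extdata/4984859173592703.zip', package = 'getLattes')
curriculos <- list(xml2::read_xml(zip_xml))

# apply one extractor to each curriculum and stack the results
# (assumes the extractor returns a data frame)
artigos <- dplyr::bind_rows(lapply(curriculos, getArtigosPublicados))

# inspect the combined table
head(artigos)
```

`bind_rows()` fills columns missing from some curricula with `NA`, which helps when researchers have filled in different fields.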