Alternative link to a list of my publications: Google Scholar.

2021

  • Iknoor Singh, Carolina Scarton, Kalina Bontcheva (2021): Multistage BiCross encoder for multilingual access to COVID-19 health information. PLoS ONE 16(9): e0256874. [LINK]
  • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia (2021): The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification. Computational Linguistics, pp. 1-29. [LINK]
  • Marcos García, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio (2021). Assessing the Representations of Idiomaticity in Vector Models with a Noun Compound Dataset Labeled at Type and Token Levels. Proceedings of the 59th ACL and the 11th IJCNLP, virtual conference, pp. 2730–2741. [PDF]
  • Marcos García, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio (2021). Probing for idiomaticity in vector space models. Proceedings of the 16th EACL, virtual conference, pp. 3551–3564. [PDF]

2020

  • Carolina Scarton, Diego F Silva, Kalina Bontcheva (2020). Measuring What Counts: The case of Rumour Stance Classification. Proceedings of the 1st AACL and 10th IJCNLP, virtual conference, pp. 925–932. [PDF]
  • João A Leite, Diego F Silva, Kalina Bontcheva, Carolina Scarton (2020). Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis. Proceedings of the 1st AACL and 10th IJCNLP, virtual conference, pp. 914–924. [PDF]
  • Yue Li, Carolina Scarton (2020). Revisiting Rumour Stance Classification: Dealing with Imbalanced Data. Proceedings of the 3rd RDSM, virtual workshop, pp. 38–44. [PDF]
  • Carolina Scarton, Pranava Swaroop Madhyastha, Lucia Specia (2020). Deciding When, How and for Whom to Simplify. Proceedings of ECAI 2020, virtual conference. [PDF]
  • Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia (2020). ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations. Proceedings of ACL 2020, virtual conference, pp. 4668-4679. [PDF]
  • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia (2020). Data-driven sentence simplification: Survey and benchmark. Computational Linguistics, 46(1):135-187, MIT Press. [PDF]
  • Roney Santos, Gabriela Pedro, Sidney Leal, Oto Vale, Thiago Pardo, Kalina Bontcheva, Carolina Scarton (2020). Measuring the Impact of Readability Features in Fake News Detection. Proceedings of LREC 2020, pp. 1404-1413. [PDF]
  • Gabriela Wick-Pedro, Roney LS Santos, Oto A Vale, Thiago AS Pardo, Kalina Bontcheva, Carolina Scarton (2020). Linguistic Analysis Model for Monitoring User Reaction on Satirical News for Brazilian Portuguese. Proceedings of PROPOR 2020, Évora, Portugal, pp. 313-320. [LINK]

2019

  • Carolina Scarton (2019): Review of Horacio Saggion, Automatic Text Simplification (Synthesis Lectures on Human Language Technologies, April 2017, 137 pages, ISBN: 1627058680, 9781627058681). Natural Language Engineering, pp. 1-4. doi:10.1017/S1351324919000603
  • Fernando Alva-Manchego, Louis Martin, Carolina Scarton and Lucia Specia (2019): EASSE: Easier Automatic Sentence Simplification Evaluation. Proceedings of EMNLP 2019 (demonstration systems), Hong Kong, China, pp. 49-54. [PDF]
  • Carolina Scarton, Mikel L. Forcada, Miquel Esplà-Gomis and Lucia Specia (2019): Estimating post-editing effort: a study on human judgements, task-based and reference-based metrics. Proceedings of IWSLT 2019, Hong Kong, China. [PDF]
  • Fernando Alva-Manchego, Carolina Scarton and Lucia Specia (2019): Cross-Sentence Transformations in Text Simplification. In the Proceedings of the 2019 Workshop on Widening NLP, Florence, Italy, pp. 181-184. [PDF]

2018

  • Lucia Specia, Carolina Scarton and Gustavo Henrique Paetzold (2018): Quality Estimation for Machine Translation. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers. [LINK]
  • Mikel Forcada, Carolina Scarton, Lucia Specia, Barry Haddow and Alexandra Birch (2018): Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting. In the Proceedings of WMT 2018, Brussels, Belgium. [PDF] [BIBTEX]
  • Chiraag Lala, Pranava Swaroop Madhyastha, Carolina Scarton and Lucia Specia (2018): Sheffield’s Submissions for WMT18 Multimodal Translation Tasks. In the Proceedings of WMT 2018, Brussels, Belgium. [PDF] [BIBTEX]
  • Julia Ive, Carolina Scarton, Frédéric Blain and Lucia Specia (2018): Sheffield’s systems for the WMT18 Quality Estimation shared task. In the Proceedings of WMT 2018, Brussels, Belgium. [PDF] [BIBTEX]
  • Carolina Scarton and Lucia Specia (2018): Learning Simplifications for Specific Target Audiences. In the Proceedings of ACL 2018, Melbourne, Australia, pp. 712-718. [PDF] [BIBTEX]
  • Carolina Scarton, Gustavo Henrique Paetzold and Lucia Specia (2018): Text Simplification from Professionally Produced Corpora. In the Proceedings of LREC 2018, Miyazaki, Japan, pp. 3504-3510. [PDF] [BIBTEX]
  • Carolina Scarton, Gustavo Henrique Paetzold and Lucia Specia (2018): SimPA: A Sentence-Level Simplification Corpus for the Public Administration Domain. In the Proceedings of LREC 2018, Miyazaki, Japan, pp. 4333-4338. [PDF] [BIBTEX]

2017

  • Fernando Alva Manchego, Joachim Bingel, Gustavo Henrique Paetzold, Carolina Scarton and Lucia Specia (2017): Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. In the Proceedings of the 8th International Joint Conference on Natural Language Processing, Taipei, Taiwan, pp. 295-305. [PDF] [BIBTEX]
  • Carolina Scarton, Alessio Palmero Aprosio, Sara Tonelli, Tamara Martín Wanton and Lucia Specia (2017): MUSST: A Multilingual Syntactic Simplification Tool. In the Proceedings of the 8th International Joint Conference on Natural Language Processing: System Demonstrations, Taipei, Taiwan, pp. 25-28. [PDF] [BIBTEX]
  • Frédéric Blain, Carolina Scarton and Lucia Specia (2017): Bilexical Embeddings for Quality Estimation. In the Proceedings of the Second Conference on Machine Translation, Copenhagen, Denmark, pp. 545-550. [PDF] [BIBTEX]
  • Yvette Graham, Qingsong Ma, Timothy Baldwin, Qun Liu, Carla Parra and Carolina Scarton (2017): Improving Evaluation of Document-level Machine Translation Quality Estimation. In the Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, pp. 356-361. [PDF] [BIBTEX]
  • Carolina Scarton (2017): Document-Level Machine Translation Quality Estimation. PhD Thesis (University of Sheffield, UK). [PDF] [BIBTEX]

2016

  • Carolina Scarton, Gustavo Henrique Paetzold and Lucia Specia (2016): Quality Estimation for Language Output Applications. In the Proceedings of the 26th International Conference on Computational Linguistics: Tutorial Abstracts, Osaka, Japan, pp. 14-17. [PDF] [BIBTEX]
  • Carolina Scarton, Daniel Beck, Kashif Shah, Karin Sim Smith and Lucia Specia (2016): Word embeddings and discourse information for Machine Translation Quality Estimation. In the Proceedings of the First Conference on Machine Translation, Berlin, Germany, pp. 831-837. [PDF] [BIBTEX]
  • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor and Marcos Zampieri (2016): Findings of the 2016 Conference on Machine Translation. In the Proceedings of the First Conference on Machine Translation, Berlin, Germany, pp. 131-198. [PDF] [BIBTEX]
  • Carolina Scarton and Lucia Specia (2016): A Reading Comprehension Corpus for Machine Translation Evaluation. In the Proceedings of the Tenth International Conference on Language Resources and Evaluation, Portorož, Slovenia, pp. 3652-3658. [LINK] [BIBTEX]
  • Liling Tan, Carolina Scarton, Lucia Specia and Josef van Genabith (2016): SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles. In the Proceedings of the Tenth International Workshop on Semantic Evaluation (SemEval 2016), San Diego, CA, pp. 640-645. [PDF] [BIBTEX]
  • Sandra Maria Aluísio, Andre Cunha and Carolina Scarton (2016): Evaluating Progression of Alzheimer’s Disease by Regression and Classification Methods in a Narrative Language Test in Portuguese. In the Proceedings of the International Conference on Computational Processing of the Portuguese Language, Tomar, Portugal, pp. 109-114. [LINK]

2015

  • Carolina Scarton and Lucia Specia (2015): A quantitative analysis of discourse phenomena in machine translation. Discours - Revue de linguistique, psycholinguistique et informatique, number 16. [LINK] [BIBTEX]
  • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia and Marco Turchi (2015): Findings of the 2015 Workshop on Statistical Machine Translation. In the Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, pp. 1-46. [PDF] [BIBTEX]
  • Carolina Scarton, Liling Tan and Lucia Specia (2015): USHEF and USAAR-USHEF participation in the WMT15 QE shared task. In the Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, pp. 336-341. [PDF] [BIBTEX]
  • Lucia Specia, Gustavo Henrique Paetzold and Carolina Scarton (2015): Multi-level Translation Quality Prediction with QuEst++. In the Proceedings of ACL-IJCNLP 2015 System Demonstrations, Beijing, China, pp. 110-120. [PDF] [BIBTEX]
  • Carolina Scarton (2015): Discourse and Document-level Information for Evaluating Language Output Tasks. In the Proceedings of NAACL-HLT 2015 Student Research Workshop (SRW), Denver, CO, pp. 118-125. [PDF] [BIBTEX]
  • Liling Tan, Carolina Scarton, Lucia Specia, Josef van Genabith (2015): USAAR-SHEFFIELD: Semantic Textual Similarity with Deep Regression and Machine Translation Evaluation Metrics. In the Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, CO, pp. 85-89. [PDF] [BIBTEX]
  • Carolina Scarton, Marcos Zampieri, Mihaela Vela, Josef van Genabith and Lucia Specia (2015): Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation. In the Proceedings of the 18th Annual Conference of the European Association for Machine Translation (EAMT 2015), Antalya, Turkey, pp. 121-128. [PDF] [BIBTEX]

2014

  • Carolina Scarton, Magali Sanches Duran and Sandra Maria Aluísio (2014): Using Cross-linguistic Knowledge to Build VerbNet-style Lexicons: Results for a (Brazilian) Portuguese VerbNet. In the Proceedings of the 2014 International Conference on Computational Processing of Portuguese, São Carlos-SP, Brazil, pp. 149-160. [LINK] [BIBTEX]
  • Carolina Scarton and Lucia Specia (2014b): Exploring Consensus in Machine Translation for Quality Estimation. In the Proceedings of the Ninth Workshop on Statistical Machine Translation (WMT 2014) - in conjunction with ACL 2014, Baltimore-MD, pp. 342-347. [PDF] [BIBTEX]
  • Carolina Scarton and Lucia Specia (2014a): Document-level translation quality estimation: exploring discourse and pseudo-references. In the Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT 2014), Dubrovnik, Croatia, pp. 101-108. [PDF] [BIBTEX]
  • Carolina Scarton, Lin Sun, Karin Kipper-Schuler, Magali Sanches Duran, Martha Palmer and Anna Korhonen (2014): Verb Clustering for Brazilian Portuguese. In the Proceedings of the 15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2014), Kathmandu, Nepal, pp. 25-39. [LINK] [BIBTEX]
  • Cíntia M. Toledo, Andre Cunha, Carolina Scarton, Sandra Aluísio (2014): Automatic classification of written descriptions by healthy adults: an overview of the application of natural language processing and machine learning techniques to clinical discourse analysis. Dementia & Neuropsychologia, 8(3):227-235. [LINK] [BIBTEX]
  • Leonardo Zilio, Adriano Zanette and Carolina Scarton (2014): Automatic Extraction of Subcategorization Frames from Portuguese Corpora. In Aluísio, S. M. and Tagnin, S. E. O. (eds.) New Language Technologies and Linguistic Research: a Two-Way Road. Cambridge Scholars Publishing, pp. 78-96. [LINK] [BIBTEX]

2013

  • Carolina Scarton (2013): VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil. Master’s Dissertation (University of São Paulo, Brazil). [PDF] [BIBTEX]
  • Magali Sanches Duran, Carolina Scarton, Sandra Maria Aluísio and Carlos Ramisch (2013): Identifying Pronominal Verbs: Towards Automatic Disambiguation of the Clitic ‘se’ in Portuguese. In the Proceedings of the 9th Workshop on Multiword Expressions (MWE 2013), in conjunction with NAACL-HLT 2013, Atlanta, Georgia, USA. [PDF] [BIBTEX]
  • André Cunha, Cíntia Toledo, Carolina Scarton, Letícia Mansur and Sandra Maria Aluísio (2013): Classificação Automática de Discurso Descritivo Escrito de Adultos Sadios: Referência para a Avaliação da Linguagem de Lesados Cerebrais. In the Proceedings of the X Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2013), Fortaleza-CE, Brazil. [PDF] [BIBTEX]

2012

  • Carolina Scarton and Sandra Maria Aluísio (2012): Towards a cross-linguistic VerbNet-style lexicon to Brazilian Portuguese. In the Proceedings of the LREC 2012 Workshop on Creating Cross-language Resources for Disconnected Languages and Styles (CREDISLAS 2012), Istanbul, Turkey. [PDF] [BIBTEX]
  • Leonardo Zilio, Adriano Zanette and Carolina Scarton (2012): Extração Automática de Estruturas de Subcategorização a partir de Corpora em Português. In the Proceedings of the XI Encontro de Linguística de Corpus (ELC 2012), São Carlos-SP, Brazil. [PDF] [BIBTEX]
  • Adriano Zanette, Carolina Scarton and Leonardo Zilio (2012): Automatic extraction of subcategorization frames from corpora: an approach to Portuguese. In International Conference on Computational Processing of Portuguese (PROPOR 2012): Demonstration session, Coimbra, Portugal. [PDF] [BIBTEX]

2011

  • Carolina Scarton (2011): VerbNet.Br: construção semiautomática de um léxico computacional de verbos para o português do Brasil. In the Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology (STIL 2011), Cuiabá-MT, Brazil. [PDF] [BIBTEX]
  • Bianca Pasqualini, Carolina Scarton and Maria José B. Finatto (2011): Comparando Avaliações de Inteligibilidade Textual entre Originais e Traduções de Textos Literários. In the Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology (STIL 2011), Cuiabá-MT, Brazil. [PDF] [BIBTEX]
  • Maria José B. Finatto, Carolina Scarton, Amanda Rocha and Sandra Maria Aluísio (2011): Características do jornalismo popular: avaliação da inteligibilidade e auxílio à descrição do gênero. In the Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology (STIL 2011), Cuiabá-MT, Brazil. [PDF] [BIBTEX]
  • Carolina Scarton and Sandra Maria Aluísio (2011): O uso do MERLOT por Alunos de Teoria da Computação para a Criação de Materiais de Ensino-Aprendizagem. In the Proceedings of the XIX Workshop sobre Educação em Computação (WEI 2011), Natal-RN, Brazil. [PDF] [BIBTEX]
  • Fernando A. M. Muniz, Willian M. Watanabe, Carolina Scarton and Sandra Maria Aluísio (2011): Extração de Termos de Manuais Técnicos de Produtos Tecnológicos: uma Aplicação em Sistemas de Adaptação Textual. In the Proceedings of the XXXVIII Seminário Integrado de Software e Hardware (SEMISH 2011), Natal-RN, Brazil. [PDF] [BIBTEX]
  • Carolina Scarton and Sandra Maria Aluísio (2011): VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil. In the Proceedings of the X Encontro de Linguística de Corpus (ELC 2011), on-going research, Belo Horizonte-MG, Brazil. [PDF] [BIBTEX]
  • Carolina Scarton (2011): VerbNet-Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil. In the Proceedings of the I Congresso Internacional de Estudos do Léxico (ICIEL 2011), Comunicação Coordenada: Rove Chishman, Magali Sanches Duran, Carolina Scarton and Oto Araújo Vale - O verbo no Computador: diferentes abordagens da descrição lexical para o processamento de língua natural, Salvador-BA, Brazil. [LINK] [BIBTEX]

2010

  • Carolina Scarton and Sandra Maria Aluísio (2010): Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. Linguamática, v. 2, pp. 45-62. [PDF] [BIBTEX]
  • Sandra Maria Aluísio, Lucia Specia, Caroline Gasperin and Carolina Scarton (2010): Readability Assessment for Text Simplification. In the Proceedings of the 5th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2010), Los Angeles, CA, USA. [PDF] [BIBTEX]
  • Carolina Scarton, Caroline Gasperin and Sandra Maria Aluísio (2010): Revisiting the Readability Assessment of Texts in Portuguese. In the Proceedings of the 12th Ibero-American Conference on Artificial Intelligence (Iberamia 2010), Bahia Blanca, Argentina, pp. 306-315. [PDF] [BIBTEX]
  • Carolina Scarton, Matheus Oliveira, Arnaldo Candido Junior, Caroline Gasperin and Sandra Maria Aluísio (2010): SIMPLIFICA: an authoring system targeting simplified texts in Brazilian Portuguese. In International Conference on Computational Processing of Portuguese (PROPOR 2010): Demonstration session, Porto Alegre-RS, Brazil. [PDF] [BIBTEX]
  • Carolina Scarton and Sandra Maria Aluísio (2010): Coh-Metrix-Port: a readability assessment tool for texts in Brazilian Portuguese. In International Conference on Computational Processing of Portuguese (PROPOR 2010): Demonstration session, Porto Alegre-RS, Brazil. [PDF] [BIBTEX]
  • Carolina Scarton, Matheus Oliveira, Arnaldo Candido Junior, Caroline Gasperin and Sandra Maria Aluísio (2010): SIMPLIFICA: a tool for authoring simplified texts in Brazilian Portuguese guided by readability assessments. In NAACL 2010: demonstration session, Los Angeles, CA, USA. [PDF] [BIBTEX]

2009

  • Carolina Scarton, Daniel M. Almeida and Sandra Maria Aluísio (2009): Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. In the Proceedings of the 7th Brazilian Symposium in Information and Human Language Technology (STIL 2009), São Carlos-SP, Brazil. [PDF] [BIBTEX]