Abstractive summarization is an approach to document summarization that is not limited to selecting sentences from the document but can also generate new sentences. We address the two main challenges in abstractive summarization: how to evaluate the performance of a summarization model and what constitutes a good training objective. We first introduce new evaluation measures based on the semantic similarity of the input and the corresponding summary. The similarity scores are obtained by a fine-tuned BERTurk model using either a cross-encoder or a bi-encoder architecture. The fine-tuning is done on the Turkish Natural Language Inference and Semantic Textual Similarity benchmark datasets. We show that these measures correlate better with human evaluations than Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores and BERTScore. We then introduce a deep reinforcement learning algorithm that uses the proposed semantic similarity measures as rewards, together with a mixed training objective, to generate summaries that are more natural in terms of human readability. We show that training with a mixed objective function, compared to the maximum-likelihood objective alone, improves similarity scores.
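A mixed training objective of this kind can be sketched as follows. The abstract does not specify the exact reinforcement learning formulation, so this toy assumes a self-critical policy-gradient loss (a common choice); the function names, the weight `gamma`, and all numbers are illustrative.

```python
# Toy sketch of a mixed objective: a convex combination of an RL loss
# (driven by a semantic-similarity reward) and a maximum-likelihood loss.
# The self-critical baseline and gamma=0.9 are assumptions, not from the paper.

def rl_loss(log_prob_sample, reward_sample, reward_baseline):
    """Self-critical RL loss: rewards sampled summaries that score higher
    than a greedy-decoded baseline under the similarity measure."""
    return (reward_baseline - reward_sample) * log_prob_sample

def mixed_loss(ml_loss, log_prob_sample, reward_sample, reward_baseline,
               gamma=0.9):
    """Mix the RL loss with the maximum-likelihood loss."""
    return gamma * rl_loss(log_prob_sample, reward_sample, reward_baseline) \
        + (1.0 - gamma) * ml_loss

# A sampled summary with similarity reward 0.8, a greedy baseline of 0.5,
# total log-probability -1.0, mixed with an ML loss of 2.0:
loss = mixed_loss(ml_loss=2.0, log_prob_sample=-1.0,
                  reward_sample=0.8, reward_baseline=0.5)
```

Setting `gamma=0` recovers pure maximum-likelihood training; `gamma=1` trains on the similarity reward alone.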
Privacy-preserving computing aims to protect the personal information of users while capitalizing on the possibilities unlocked by big data. This practical introduction for students, researchers, and industry practitioners is the first cohesive and systematic presentation of the field's advances over four decades. The book shows how to use privacy-preserving computing in real-world problems in data analytics and AI, and includes applications in statistics, database queries, and machine learning. The book begins by introducing cryptographic techniques such as secret sharing, homomorphic encryption, and oblivious transfer, and then broadens its focus to more widely applicable techniques such as differential privacy, trusted execution environment, and federated learning. The book ends with privacy-preserving computing in practice in areas like finance, online advertising, and healthcare, and finally offers a vision for the future of the field.
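One of the cryptographic building blocks mentioned above, secret sharing, can be illustrated in a few lines. This is a generic additive-sharing sketch, not taken from the book; the modulus and party count are illustrative.

```python
# Minimal sketch of additive secret sharing over the integers mod q:
# a secret is split into n random-looking shares that sum to it mod q,
# so no single party learns anything about the secret on its own.
import random

Q = 2**61 - 1  # a large prime modulus (illustrative choice)

def share(secret, n_parties, q=Q):
    """Split `secret` into n_parties additive shares."""
    shares = [random.randrange(q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % q)  # force shares to sum to secret
    return shares

def reconstruct(shares, q=Q):
    """All shares together recover the secret exactly."""
    return sum(shares) % q

shares = share(42, n_parties=3)
recovered = reconstruct(shares)
```

Schemes like this underpin secure multi-party computation: parties can add their shares locally, so sums of private values can be computed without revealing any individual input.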
Morphological re-inflection generation is one of the most challenging tasks in the natural language processing (NLP) domain, especially for morphologically rich, low-resource languages such as Arabic. In this research, we investigate the ability of transformer-based models on the singular-to-plural Arabic noun conversion task. We start by pretraining a Character-BERT model on a masked language modeling task using 1,134,950 Arabic words and then, in the first proposed setting, adopt a fusion technique to transfer the knowledge gained by the pretrained model to a full encoder–decoder transformer model. The second proposed setting fuses the output Character-BERT embeddings directly into the decoder. We then analyze and compare the performance of the two architectures and provide an interpretability section in which we track the model's attention patterns. We perform this interpretation at both the macro and micro levels, providing individual examples. Moreover, we provide a thorough error analysis showing the strengths and weaknesses of the proposed framework. To the best of our knowledge, this is the first effort in the Arabic NLP domain to develop an end-to-end fused-transformer deep learning model for the problem of singular-to-plural conversion.
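The abstract does not detail the fusion operator, so the following is a hypothetical sketch of one common pattern: concatenating the pretrained Character-BERT embedding with the decoder's own token embedding and applying a learned linear projection. All dimensions, names, and weights are illustrative.

```python
# Hypothetical concatenate-and-project fusion of a pretrained embedding
# with a decoder embedding. The actual fusion used in the paper may differ.

def fuse(charbert_vec, decoder_vec, projection):
    """projection: rows of a (d_out x (d1 + d2)) weight matrix,
    mapping the concatenated vector back to the decoder dimension."""
    concatenated = list(charbert_vec) + list(decoder_vec)
    return [sum(w * x for w, x in zip(row, concatenated)) for row in projection]

# 2-dim toy vectors and a 2x4 projection that simply sums corresponding
# components of the two inputs (illustrative):
W = [[1, 0, 1, 0],
     [0, 1, 0, 1]]
fused = fuse([0.2, 0.4], [0.1, 0.3], W)
```

In practice the projection would be trained jointly with the decoder, letting the model decide how much to weight the pretrained character-level knowledge.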
Machine translation technology has increasing applications in health and medical settings that involve communication between people from diverse language and cultural backgrounds. Machine translation tools offer low-cost, accessible solutions that help close the gap in cross-lingual health communication. The risks of machine translation need to be effectively controlled and properly managed to build confidence in this developing technology among health professionals. This study brings the methodological benefits of machine learning to machine translation quality evaluation and, more importantly, to the prediction of clinically relevant machine translation errors based on linguistic features of the English source texts.
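Predicting error risk from source-side linguistic features could look roughly like this. The study's actual feature set, weights, and model are not given in the abstract; the features and linear score below are illustrative stand-ins for a learned classifier.

```python
# Hypothetical sketch: extract simple linguistic features from an English
# source sentence and combine them into an MT error-risk score. A real
# system would learn the weights from sentences labeled with MT errors.
import re

def source_features(text):
    tokens = text.split()
    return {
        "length": len(tokens),  # longer sentences tend to translate worse
        "avg_word_len": sum(map(len, tokens)) / max(len(tokens), 1),
        "has_passive": int(bool(
            re.search(r"\b(?:is|are|was|were|been)\s+\w+ed\b", text))),
        "has_numeral": int(bool(re.search(r"\d", text))),  # e.g. dosage figures
    }

def risk_score(feats, weights=None):
    """Linear combination of features; weights here are made up."""
    weights = weights or {"length": 0.02, "avg_word_len": 0.05,
                          "has_passive": 0.3, "has_numeral": 0.1}
    return sum(weights[k] * v for k, v in feats.items())

feats = source_features(
    "The medication was prescribed in doses of 50 mg twice daily.")
```

Flagging high-risk source sentences before translation lets clinicians route them to human translators instead of the machine.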
Access to health-related information is vital in our society. Official health websites provide essential and beneficial information to the general population. In particular, they can represent a crucial public service when this information is fundamental to fighting a new health threat, such as the COVID-19 pandemic. Yet, for these websites to achieve their informative goal, they need to ensure that their content is accessible to all users, especially people with disabilities. Many of these websites, especially those of institutions operating in multilingual countries, offer their content in several languages, which is by definition an accessibility best practice. However, the level of accessibility achieved might not be the same across all the available language versions. In fact, previous studies of other types of multilingual websites have shown that localized versions are usually less accessible than the original ones. In this chapter, we present a research study that examined seventy-four official multilingual health sites to understand the current situation in terms of accessibility compliance. In particular, the home pages in two languages (English, the original version, and Spanish, the localized version) were checked against two specific success criteria (SC) from the current Web Content Accessibility Guidelines (WCAG) standard, using both automatic and manual evaluation methods. We observed that although overall accessibility scores were similar, the localized pages obtained worse results on the two SC analyzed in depth, namely the language and the title of the page. We contend that this finding could be explained by a lack of accessibility awareness or knowledge among those participating in the localization process. We thus advocate for web professionals with an interdisciplinary background who could create multilingual accessible sites, providing an inclusive web experience for all.
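The two success criteria examined above map to simple automated checks: WCAG SC 3.1.1 (Language of Page) requires a `lang` attribute on the `<html>` element, and SC 2.4.2 (Page Titled) requires a non-empty `<title>`. The sketch below uses only the Python standard library; a real audit would use a full WCAG evaluation toolchain, and this is not the tooling the study itself used.

```python
# Minimal checker for WCAG SC 3.1.1 (Language of Page) and SC 2.4.2
# (Page Titled), using the standard-library HTML parser.
from html.parser import HTMLParser

class PageChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.lang = None        # value of <html lang="...">, if present
        self.title = ""         # accumulated text inside <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            self.lang = dict(attrs).get("lang")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def check_page(html):
    parser = PageChecker()
    parser.feed(html)
    return {
        "SC 3.1.1 language of page": bool(parser.lang),
        "SC 2.4.2 page titled": bool(parser.title.strip()),
    }

result = check_page('<html lang="es"><head><title>Salud</title></head></html>')
```

A localized page that passes both checks declares its actual language (here `es`) and carries a translated, non-empty title, which is exactly where the study found localized versions falling short.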