Text vectorization techniques, which turn text into dense numerical representations, vary widely, having evolved from character bigrams to advanced subword vectorization to combat out-of-vocabulary (OOV) challenges such as adversarial attacks and typos.
This strategy includes subword-level tokenization and the decomposition of unknown words into n-grams so that neural networks can be trained effectively.
Researchers at Google recently announced a new resilient and efficient text vectorizer called RETVec that protects Gmail users from malicious emails and spam.
RETVec
RETVec (Resilient and Efficient Text Vectorizer) is an efficient, multilingual, next-generation text vectorizer with built-in adversarial resilience. It is robust to character-level manipulations such as:
- insertion
- deletion
- typos
- homoglyphs
- LEET substitution
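To make these manipulation classes concrete, here is a small pure-Python sketch that produces one adversarial variant of a word per class. The word and the substitution choices are illustrative examples, not taken from RETVec itself:

```python
# Illustrative examples of the character-level manipulations listed above.
# The word and the substitutions are made up for demonstration purposes.
word = "paypal"

insertion = word[:3] + "x" + word[3:]            # "payxpal"
deletion = word[:2] + word[3:]                   # "papal"
typo = word.replace("y", "t", 1)                 # "patpal" (adjacent-key slip)
homoglyph = word.replace("a", "\u0430")          # Cyrillic 'а' looks like Latin 'a'
leet = word.replace("a", "4").replace("l", "1")  # "p4yp41"

for name, variant in [("insertion", insertion), ("deletion", deletion),
                      ("typo", typo), ("homoglyph", homoglyph), ("leet", leet)]:
    print(f"{name:10s} {variant}  (equal to original: {variant == word})")
```

All five variants read like the original word to a human, yet none compares equal as a string, which is exactly what character-level adversarial attacks exploit.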
The RETVec character encoder consists of two layers, described below:
- an integerizer layer
- a binarizer layer
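The idea behind the two layers can be sketched in a few lines of pure Python: the integerizer maps each character to its Unicode code point, and the binarizer expands each code point into a fixed-width bit vector. The 24-bit width is an assumption chosen here because it covers the full Unicode range (code points stay below 2**24); RETVec's actual encoding layout may differ:

```python
# Sketch of the two-step character-encoding idea described above.
# BITS = 24 is an assumption: Unicode code points go up to 0x10FFFF < 2**24.
BITS = 24

def integerize(text: str) -> list[int]:
    """Integerizer: map each character to its Unicode code point."""
    return [ord(ch) for ch in text]

def binarize(code_points: list[int]) -> list[list[float]]:
    """Binarizer: expand each code point into a fixed-width bit vector (LSB first)."""
    return [[float((cp >> i) & 1) for i in range(BITS)] for cp in code_points]

codes = integerize("héllo")  # works for any UTF-8 text, no vocabulary needed
bits = binarize(codes)
print(codes)                 # [104, 233, 108, 108, 111]
print(len(bits), len(bits[0]))  # 5 characters, 24 bits each
```

Because every possible character maps to a code point, this scheme has no out-of-vocabulary tokens by construction, which is why no lookup table or fixed vocabulary is needed.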
RETVec uses its own character encoder to handle UTF-8 efficiently. It easily supports over 100 languages without lookup tables or fixed vocabularies. And because it is implemented as a layer, it fits seamlessly into any TensorFlow model without additional preprocessing.
On its own, the RETVec binarizer produces a useful but not competitive word representation. The researchers therefore pair it with a small model to increase accuracy, which allows RETVec to outperform other vectorizers.
TensorFlow models can employ RETVec for string vectorization in just one line. Raw strings are handled directly, with preprocessing built in.
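As a sketch of what that one-line integration can look like, the snippet below builds a small Keras classifier over raw strings. The layer and argument names (`RETVecTokenizer`, `model="retvec-v1"`) follow the public retvec package; treat them as assumptions and check the project repository for the current API:

```python
import tensorflow as tf
from tensorflow.keras import layers
from retvec.tf import RETVecTokenizer  # assumed import path from the retvec package

# Raw strings go straight into the model; RETVec handles preprocessing internally.
inputs = layers.Input(shape=(1,), dtype=tf.string)
x = RETVecTokenizer(model="retvec-v1")(inputs)   # the one-line vectorization step
x = layers.Bidirectional(layers.LSTM(64))(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
```

No separate tokenizer artifact or vocabulary file has to be shipped alongside the model, since the vectorization lives inside the graph itself.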
Additionally, the system is fully functional for on-device mobile and web use cases.
Researchers tested RETVec against adversarial content using Google's spam filters. Replacing SentencePiece with RETVec improved spam detection by 38% at a 0.80% false-positive rate and reduced latency by 30%.
This suggests that RETVec is competitive for real-world tasks, increasing confidence in its effectiveness.
How to optimize RETVec to improve multilingual capability, robustness, and model size within large language models (LLMs) is an important open question. For small LLMs, the vocabulary layer can exceed 20% of the parameters, and RETVec eliminates it.
However, using RETVec in a generative model poses a challenge because its 256-dimensional floating-point embeddings do not translate directly to a softmax output. New training methods compatible with text generation are needed.
Experiments with character-by-character decoding and VQ-VAE models have so far yielded inconclusive results. Future work will address these limitations and explore the use of RETVec as a word embedding, replacing word2vec or GloVe, and the use of its character encoder to train a text similarity model.
Installation
You can use pip to install the latest TensorFlow version of RETVec.
RETVec has been tested with TensorFlow 2.6 and later and Python 3.8 and later.
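A minimal install command, assuming the package is published on PyPI under the name retvec (as used by the project's repository):

```shell
pip install retvec
```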