A new kind of large language model, developed by researchers at the Allen Institute for AI (Ai2), makes it possible to control how training data is used even after a model has been built.
Researchers find large language models process diverse types of data, like different languages, audio inputs, images, etc., similarly to how humans reason about complex problems. Like humans, LLMs ...
Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models. Millions of images of passports, credit cards ...