A recent study by Anthropic reveals that language models might acquire hidden traits during the distillation process, a common technique for tailoring models to specific…