Researcher Transforms OpenAI’s GPT-OSS-20B Model into a Less Aligned 'Base' Model with Greater Freedom

OpenAI recently released gpt-oss, a family of powerful open-weights large language models (LLMs), under the Apache 2.0 license. It is the company’s first open-weights release since GPT-2 in 2019, and developers have already begun adapting it.

A notable example comes from Jack Morris, a Cornell Tech PhD student and Meta researcher, who introduced gpt-oss-20b-base: a reworked version that strips out the model’s “reasoning” behavior to restore a base-model state, promising faster, less constrained outputs.

Available on Hugging Face under the MIT License, it is free to use both for further research and for commercial applications.

Understanding Morris’s changes requires differentiating between OpenAI’s release and what AI scholars term a “base model.”


Leading AI providers such as OpenAI, Anthropic, Google, Meta, DeepSeek, and Alibaba’s Qwen team all ship “post-trained” LLMs.

Post-training means the model goes through an additional phase after pre-training in which it is exposed to curated examples of desired behavior.

For instruction-following models, that means presenting many example instructions paired with ideal responses, so the model learns to answer in a helpful, polite, and safe way.
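As a rough sketch of what one such training example can look like (real post-training data is proprietary; the chat-message structure below is a common industry convention, not OpenAI’s published format):

```python
# A minimal, hypothetical supervised fine-tuning (SFT) example.
# Real post-training datasets are proprietary; this structure is a
# common chat-data convention, not OpenAI's actual format.
sft_example = {
    "messages": [
        {"role": "system", "content": "You are a helpful, harmless assistant."},
        {"role": "user", "content": "How do I reverse a list in Python?"},
        {
            "role": "assistant",
            "content": "Use my_list[::-1] for a reversed copy, or "
                       "my_list.reverse() to reverse it in place.",
        },
    ]
}

# During post-training, the model is optimized to reproduce the assistant
# turn given the preceding turns, nudging it toward answers in this style.
```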

OpenAI’s gpt-oss models, released August 5, were optimized for reasoning: trained to follow instructions consistently and safely while working through a structured “chain of thought” before answering.

This approach, which OpenAI popularized with its o1 model in September 2024, has since spread across the industry: models are trained to think through a problem in multiple steps and verify their own work before delivering a considered answer.

That makes models better at coding, solving math problems, and answering factual questions with explanations, but it also steers their responses away from harmful or undesirable content.

A base model is different. It is the raw version of an LLM before reasoning-specific alignment is applied: it simply predicts the next chunk of text, with no guardrails, stylistic preferences, or refusal behaviors layered on top.

Researchers value base models because they produce more varied, unconstrained output and because they help reveal how models store and reproduce knowledge from their training data.
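In practice, querying a base model means feeding it raw text rather than a chat-formatted conversation. A minimal sketch with Hugging Face transformers might look like the following (the repo id jxm/gpt-oss-20b-base is an assumption based on Morris’s Hugging Face release; verify the actual listing before use):

```python
# Minimal sketch: raw next-token completion with the converted base model
# via Hugging Face transformers. The repo id below is an assumption based
# on Morris's release; verify it on Hugging Face before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jxm/gpt-oss-20b-base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 20B parameters; bf16 to reduce memory
    device_map="auto",
)

# No chat template and no system prompt: a base model just continues text.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=64, do_sample=True, temperature=0.8
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Where a post-trained chat model would wrap the prompt in its chat template and answer as an assistant, the base model simply continues the text it is given.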

Morris sought to “reverse” OpenAI’s alignment process and return gpt-oss-20b to something close to its original pre-trained state.

“We basically reversed the alignment part of LLM training, so we have something that produces natural-looking text again,” he commented in an X thread announcing the project. “It doesn’t engage in CoT anymore. It is back to a model that just predicts the next token on generic text.”

Rather than trying to jailbreak the model with clever prompts, Morris took a different approach after speaking with a former OpenAI co-founder and ex-Anthropic researcher.
