Today, we are excited to release LiteCoder-Terminal-Preview, a series of models specialized in terminal-based interactions. This release is part of our recent efforts to develop capable small and medium-sized code agent models.

Notably, LiteCoder achieves competitive results with fewer than 1,000 training samples. Using a fully synthetic pipeline that converts no existing datasets, we obtain significant gains on the challenging Terminal Bench and match the performance of leading open-source models in the same weight class.

Released Artifacts

2025/12/17
Model: LiteCoder-4b-Terminal-preview https://huggingface.co/Lite-Coder/LiteCoder-4b-Terminal-preview
Dataset: LiteCoder-SFT-Terminal-preview https://huggingface.co/datasets/Lite-Coder/LiteCoder-SFT-Terminal-preview
Code: icip-cas/LiteCoder

Data Construction Pipeline

To build a robust terminal agent model, we developed a rigorous data synthesis pipeline consisting of three stages: Task Curation, Environment Preparation, and Trajectory Generation.

Task Sampling

In the first version of our data, we established a taxonomy covering seven core domains of terminal usage: ai_ml, build_tools, data_science, networking, security, system_admin, and version_control.

Based on this taxonomy, we adapt a MAGPIE-like method to synthesize long-horizon agentic tasks. We feed the model a domain-specific system message followed by the standard chat template prefix for a user turn (e.g., <|user|>). The model then "autocompletes" the sequence, generating a plausible, high-quality task tailored to the specified domain.
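
The prompt construction described above can be sketched as follows. This is a minimal illustration rather than our actual pipeline code: the tag strings, the system message wording, and the helper names `build_task_prefix` and `extract_task` are all assumptions made for the example, and real chat templates differ between models.

```python
# Hypothetical chat-template markers (assumption); real models define their own.
SYSTEM_TAG = "<|system|>"
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"

# The seven domains of the taxonomy.
DOMAINS = [
    "ai_ml", "build_tools", "data_science", "networking",
    "security", "system_admin", "version_control",
]


def build_task_prefix(domain: str) -> str:
    """Return a prompt that ends at an open user turn for the given domain."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    # Illustrative wording only; the real system messages are not shown here.
    system_message = (
        "You are a user who needs help with a long-horizon terminal task "
        f"in the {domain} domain."
    )
    # The prefix stops right after the user tag: no user text is supplied,
    # so the model's continuation becomes the task description.
    return f"{SYSTEM_TAG}\n{system_message}\n{USER_TAG}\n"


def extract_task(completion: str, stop: str = ASSISTANT_TAG) -> str:
    """Keep only the generated user turn, cutting at the next role marker."""
    return completion.split(stop, 1)[0].strip()
```

The prefix would be sent to the model as a raw completion request (not a chat request), and `extract_task` trims anything the model generates past the end of the user turn.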

Feasibility Check

To ensure data integrity, we use an LLM-as-a-Judge to validate raw tasks. This stage evaluates each task against criteria that include complexity balance, clarity of specification, and resource availability, and filters out infeasible or ambiguous tasks to maintain a high-quality task set.
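
A rough sketch of this filtering step is shown below. The criterion names come from the description above; everything else is an assumption for illustration, including the `Verdict` type, the rule that a task must pass every criterion, and `stub_judge`, which stands in for the real LLM call with toy heuristics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

CRITERIA = (
    "complexity_balance",
    "clarity_of_specification",
    "resource_availability",
)


@dataclass
class Verdict:
    """One pass/fail flag per criterion (hypothetical judge output format)."""
    scores: Dict[str, bool]

    @property
    def feasible(self) -> bool:
        # Assumption: a task is kept only if it passes every criterion.
        return all(self.scores.get(c, False) for c in CRITERIA)


def filter_tasks(tasks: List[str], judge: Callable[[str], Verdict]) -> List[str]:
    """Keep the tasks that the judge marks as feasible."""
    return [task for task in tasks if judge(task).feasible]


def stub_judge(task: str) -> Verdict:
    """Stand-in for an LLM judge: toy string heuristics, for demonstration only."""
    text = task.lower()
    return Verdict({
        "complexity_balance": len(task.split()) >= 5,
        "clarity_of_specification": "somehow" not in text,
        "resource_availability": "proprietary" not in text,
    })
```

In practice `judge` would wrap a model call that returns a structured verdict; passing it in as a callable keeps the filter logic independent of any particular model or prompt.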

Environment Preparation

Many terminal tasks (e.g., fixing a bug in an existing repo or managing git conflicts) rely on specific starting states. To address this, we use an agent to interactively generate the necessary starting artifacts inside a Docker container. Once setup is complete, we extract the container's final state to serve as the initial environment for the actual task execution.
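
One possible way to snapshot a prepared container is sketched below. It only builds the Docker CLI command lines and does not run them. The function name, the container and image names, and the choice of `docker commit` as the extraction mechanism are assumptions for this example, and the fixed `setup_steps` list stands in for commands that an agent would issue interactively. It also assumes the base image provides `sh` and a `sleep` that accepts `infinity`.

```python
from typing import List


def setup_commands(
    base_image: str,
    container: str,
    setup_steps: List[str],
    snapshot_tag: str,
) -> List[List[str]]:
    """Build Docker CLI invocations that prepare and snapshot a task environment."""
    # Start a long-lived container to work in.
    cmds = [[
        "docker", "run", "-d", "--name", container,
        base_image, "sleep", "infinity",
    ]]
    # Each setup step runs as a shell command inside the container.
    for step in setup_steps:
        cmds.append(["docker", "exec", container, "sh", "-c", step])
    # Save the final filesystem state as an image (assumed extraction method).
    cmds.append(["docker", "commit", container, snapshot_tag])
    # Remove the working container once the snapshot exists.
    cmds.append(["docker", "rm", "-f", container])
    return cmds
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`; the task itself would then start from `snapshot_tag` instead of the bare base image.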

Trajectory Generation