squeez

End-to-end token optimizer for Claude Code, GitHub Copilot CLI, OpenCode, Gemini CLI, and OpenAI Codex CLI. Compresses bash output up to 95%, collapses redundant calls, and injects a terse prompt persona — automatically, with zero new runtime dependencies.

Install

Three methods — all produce the same result (binary at ~/.claude/squeez/bin/squeez, hooks registered).

curl (recommended)

curl -fsSL https://raw.githubusercontent.com/claudioemmanuel/squeez/main/install.sh | sh

Windows: requires Git Bash. Run the command above inside Git Bash — PowerShell/CMD are not supported.

npm / npx

# Install globally
npm install -g squeez

# Or run once without installing
npx squeez

Downloads the correct pre-built binary for your platform (macOS universal, Linux x86_64/aarch64, Windows x86_64). Requires Node ≥ 16.

cargo (build from source)

cargo install squeez

Builds from crates.io. Requires Rust stable. On Windows you also need MSVC C++ Build Tools.

Supported hosts

squeez setup auto-detects every CLI present on disk and registers the hooks. squeez uninstall removes them. Session data and config.ini are preserved so reinstall is lossless.

Host	Memory file	Bash wrap	Session memory	Budget inject (Read/Grep)	Notes
Claude Code	`~/.claude/CLAUDE.md`	✅ native	✅ native	✅ native	Restart Claude Code to pick up hooks
Copilot CLI	`~/.copilot/copilot-instructions.md`	✅ native	✅ native	✅ native	Restart Copilot CLI after setup
OpenCode	`~/.config/opencode/AGENTS.md`	✅ native	✅ native	✅ native	Plugin at `~/.config/opencode/plugins/squeez.js`; MCP tool calls skip hooks (upstream sst/opencode#2319)
Gemini CLI	`~/.gemini/GEMINI.md`	✅ native	✅ native	🟡 soft via `GEMINI.md`	`BeforeTool` rewrite schema pending upstream docs (google-gemini/gemini-cli#25629)
Codex CLI	`~/.codex/AGENTS.md`	✅ native	✅ native	🟡 soft via `AGENTS.md`	`apply_patch` hooks landed in 0.123.0 (#18391); `updatedInput` + `read_file`/`grep` hook surface still pending (openai/codex#18491)
Pi	`~/.pi/agent/skills/squeez/SKILL.md`	✅ native	✅ via skill	✅ native	TypeScript extension at `~/.pi/agent/extensions/squeez/index.ts`; restart Pi after setup

Manage

squeez setup                  # register into every detected host
squeez setup --host=<slug>    # register into one host
squeez uninstall              # remove squeez entries from every detected host
squeez uninstall --host=<slug>

Slugs: claude-code / copilot / opencode / gemini / codex / pi.

After install, restart the CLI you use to pick up the new hooks.

Uninstall

squeez uninstall              # preserves session data + config.ini
bash ~/.claude/squeez/uninstall.sh   # (legacy) full wipe, if the script exists

Self-update

squeez update             # download latest binary + verify SHA256
squeez update --check     # check for update without installing
squeez update --insecure  # skip checksum (not recommended)

What it does

Feature	Description
Bash compression	Intercepts every command via `PreToolUse` hook, applies smart filter → dedup → grouping → truncation. Up to 95% reduction.
Context engine	Cross-call redundancy with two paths: exact-hash match (FNV-1a, fast) and fuzzy trigram-shingle Jaccard ≥0.85 (whitespace, timestamps, single-line edits no longer defeat dedup).
Summarize fallback	Outputs exceeding 500 lines are replaced with a ≤40-line dense summary (top errors, files, test result, tail). Benign outputs get 2× the threshold so successful builds stay verbatim.
Adaptive intensity	Truly adaptive: Full (×0.6 limits) below 80% of token budget, Ultra (×0.3) above. Used to be always-Ultra; now actually responds to session pressure.
MCP server	`squeez mcp` runs a JSON-RPC 2.0 server over stdio exposing 13 read-only tools so any MCP-compatible LLM can query session memory directly. Hand-rolled, no `mcp.server` dependency.
Auto-teach payload	`squeez protocol` (or the `squeez_protocol` MCP tool) prints a 2.4 KB self-describing payload — the LLM learns squeez's markers and protocol on first call.
Caveman persona	Injects an ultra-terse prompt at session start so the model responds with fewer tokens.
Memory-file compression	`squeez compress-md` compresses CLAUDE.md / AGENTS.md / copilot-instructions.md in-place — pure Rust, zero LLM. i18n-aware: set `lang = pt` (or `--lang pt`) for pt-BR article/filler/phrase dropping and Unicode-correct matching.
Session memory	On `SessionStart`, injects a structured summary of the previous session: files investigated, learned facts (errors + git events), completed work (builds, test passes), and next steps (unresolved errors, failing tests). Summaries carry temporal validity (`valid_from`/`valid_to`).
Token tracking	Every `PostToolUse` result (Bash, Read, Grep, Glob, Monitor, SubagentStop) feeds a `SessionContext` so squeez knows what the agent has already seen. Read/Grep/Glob/Monitor outputs are also rewritten via `updatedToolOutput` (Claude Code v2.1.119+) when content is redundant or oversized.
Token economy	Sub-agent cost tracking (~200K tokens/spawn), burn rate prediction (`[budget: ~N calls left]`), session efficiency scoring, tool result size budgets.
Auto-calibration	`squeez calibrate` runs benchmarks on install and generates an optimized `config.ini` (aggressive / balanced / conservative profiles).

How squeez compares

There are now several token-reduction tools targeting AI coding CLIs. They make different bets — the right one depends on what you care about: zero deps, lossless filtering, structural reformatting, or task-conditioned ML.

Tool	Approach	Hosts	Deps	Key wins	Trade-off
squeez (this project)	Hook + 4-stage filter pipeline + context engine (MinHash dedup, summarize, adaptive intensity) + MCP server	Claude Code, Copilot CLI, OpenCode, Gemini CLI, Codex CLI	Zero runtime deps (`libc` only on Unix)	Up to 95% on bash; cross-call dedup; signature-mode for source files; TOON re-encoder for JSON outputs; 14 MCP tools; enterprise (Bedrock/Vertex) USD-saved estimate	Heuristic, not ML — no per-task understanding
rtk-ai/rtk	Hook proxy that rewrites bash commands (`git status` → `rtk git status`), then compresses 100+ command outputs	Claude Code, Cursor	Zero deps (Rust)	60-90% on 100+ commands; `rtk read -l aggressive` for signature mode	rtk#582: aggressive rewriting can increase total cost by 18% because Claude emits +50% more output tokens to compensate for stripped context. squeez ships a guard against this regime.
KRLabsOrg/squeez	Task-conditioned ML (Qwen 2B / ModernBERT 150M) — pipe tool output + task description, get back only relevant lines	Any (CLI tool)	Python, PyTorch / vLLM server	92% compression, F1 0.80; task-aware (same log slices kept differently per query)	Requires running an LLM locally; not zero-dep. Same project name, different design.
ojuschugh1/sqz	CLI context compressor	Any	Python	Single-command compression	Lower coverage than the others.
LLMLingua-2 (Microsoft)	Neural prompt compressor that removes 50-80% of a prompt while preserving meaning	API / library	Python, transformers	Strong on long static prompts	Latency + model dep; not a CLI hook.
TOON	Schema-aware JSON replacement (`users[100]{id,name,role}:`) — ~40% fewer tokens on arrays of uniform objects	Library, not a CLI	TypeScript SDK	Lossless on the right shape; squeez embeds a TOON encoder for `gh`/`kubectl`/`aws`/`gcloud`/`az` JSON outputs	Only helps on uniform JSON shapes.

If you want a CLI hook that just works, never needs a Python runtime, and never silently inflates your output tokens, squeez is the safe default. If you can run an LLM next to your shell and want task-aware filtering, KRLabsOrg/squeez is worth a look as a complement. The two squeez projects share a name but are independent.

Scope & Limits

squeez optimizes what it can reach — the surfaces exposed by each host's hook API. It cannot fix token leaks outside those surfaces.

Coverage table

Surface	How	When	Supported hosts
Bash stdout/stderr	`PreToolUse` wraps command w/ 4-stage pipeline (smart-filter → dedup → grouping → truncation)	Every Bash invocation	all 5
Read / Grep / Glob limits	`PreToolUse` injects `limit` / `head_limit` per `read_max_lines` / `grep_max_results`	Every Read/Grep/Glob call	Claude Code, Copilot, OpenCode (hard); Gemini + Codex soft via GEMINI.md / AGENTS.md
Read / Grep / Glob / Monitor output rewrite	`PostToolUse` runs `squeez compress-output` and returns `updatedToolOutput` when content is redundant or oversized	Claude Code v2.1.119+	Claude Code
Agent / Task prompt	`PreToolUse` compresses `tool_input.prompt` (markdown-aware, via `compress-prompt`)	When prompt > `agent_prompt_max_tokens`	Claude Code (post–v1.8.0)
Sub-agent output	`SubagentStop` hook feeds `last_assistant_message` into SessionContext for cross-call dedup	On every sub-agent completion	Claude Code
Compaction lifecycle	`PreCompact` logs the event; `PostCompact` emits a re-arm reminder so the model knows compression is still active	On context compaction	Claude Code
Session memory	`SessionStart` injects prior session summary + file-access cache	Once per session start	all 5
Markdown viewing	Bash handler routes `.md` reads through `compress-md` when `auto_compress_md=true`	Viewer commands on .md paths	all 5

What squeez CANNOT compress

Agent/Task returned output. No hook API surface exists to rewrite an Agent's return value. PostToolUse updatedToolOutput (Claude Code v2.1.119+) covers built-in tools (Read, Grep, Glob, Monitor) but not the Agent/Task result. Workaround: keep agent prompts compact (squeez compresses at dispatch time via PreToolUse), and use squeez_agent_costs MCP tool to monitor spawn overhead.

Skills & slash-command files. Claude Code loads these into the system prompt before any hook fires. squeez has no visibility into session-start system prompt construction.

User's top-level prompt. squeez runs per tool call, not on user turns.

Tools whose host doesn't expose PreToolUse / BeforeTool. E.g. Codex apply_patch hooks landed in 0.123.0, but updatedInput is explicitly unsupported and read_file/grep still have no hook surface (openai/codex#18491) — so Read/Grep caps for Codex are soft hints in AGENTS.md, not hard injections.

Secondary wins (not compression, but token-saving)

Cross-call redundancy dedup — exact-hash and fuzzy-trigram collapsing across 16 recent calls (see Context engine)
File-access cache — subsequent Bash commands trimmed when re-reading a file squeez has already fingerprinted
Burn-rate warnings — [budget: ~N calls left] nudges so the user changes behavior before context pressure spikes

Reducing overall session cost

squeez cannot automate these, but you can:

Fewer Agent/Task dispatches per session → use squeez_agent_costs to track, then refactor tasks to batch work
Smaller prompts injected into agents → squeez compresses them at dispatch, but smaller is better
Shorter CLAUDE.md / AGENTS.md files → run squeez compress-md --ultra to drop abbreviations and filler

Benchmarks

Measured on macOS (Apple Silicon). Token count = chars / 4 (matches Claude's ~4 chars/token). Run squeez benchmark to reproduce.

Per-scenario results — 28 scenarios × 5 iterations

Scenario	Before	After	Reduction	Latency
`summarize_huge`	82,257 tk	420 tk	-99%	56.0 ms
`repetitive_output`	4,692 tk	37 tk	-99%	215 µs
`xcode_build`	1,881 tk	17 tk	-99%	64 µs
`high_context_adaptive`	4,418 tk	52 tk	-99%	846 µs
`agent_directory_output`	3,348 tk	167 tk	-95%	360 µs
`ps_aux`	40,373 tk	2,338 tk	-94%	4.6 ms
`git_log_200`	2,692 tk	275 tk	-90%	208 µs
`tsc_errors`	731 tk	101 tk	-86%	23 µs
`cargo_build_noisy`	2,106 tk	439 tk	-79%	286 µs
`docker_logs`	665 tk	186 tk	-72%	48 µs
`curl_html_response`	2,181 tk	626 tk	-71%	47 µs
`find_deep`	424 tk	134 tk	-68%	76 µs
`git_status`	50 tk	16 tk	-68%	16 µs
`pytest_failures`	3,402 tk	1,175 tk	-65%	328 µs
`verbose_app_log`	4,957 tk	1,978 tk	-60%	320 µs
`npm_install`	524 tk	218 tk	-58%	48 µs
`crosscall_redundancy_3x`	486 tk	237 tk	-51%	51.8 ms
`ls_la`	1,782 tk	872 tk	-51%	222 µs
`env_dump`	441 tk	287 tk	-35%	27 µs
`git_copilot`	640 tk	421 tk	-34%	109 µs
`agent_heavy`	2,306 tk	1,551 tk	-33%	345 µs
`md_prose`	187 tk	138 tk	-26%	1.0 ms
`md_claude_md`	316 tk	247 tk	-22%	1.2 ms
`claude_md_overhead`	717 tk	635 tk	-11%	25 µs
`git_diff`	502 tk	497 tk	-1%	44 µs
`jest_failures`	451 tk	448 tk	-1%	57 µs
`state_first_simulation`	182 tk	181 tk	-1%	5 µs
`kubectl_pods`	1,513 tk	1,513 tk	-0%	32 µs

Aggregate

Metric	Value
Total token reduction	90.7% — 164,224 tk → 15,206 tk
Bash output	-84.0%
Markdown / context files	-23.5%
Wrap / cross-call engine	-99.2%
Quality (signal terms preserved)	28 / 28 pass
Latency p50 (filter mode)	4.2 ms
Latency p95 (incl. wrap/summarize)	52 ms

Estimated cost savings — Claude Sonnet 4.6 · $3.00 / MTok input

Usage	Baseline / month	Saved / month
100 calls / day	$18.00	$16.33 (91%)
1,000 calls / day	$180.00	$163.33 (91%)
10,000 calls / day	$1800.00	$1633.32 (91%)

Commands

squeez wrap <cmd>                        # compress a command's output end-to-end
squeez filter <hint>                     # compress stdin (piped usage)
squeez compress-md [--ultra] [--dry-run] [--all] <file>...   # compress markdown files
squeez benchmark [--json] [--output <file>] [--scenario <name>] [--iterations <n>]
squeez mcp                               # JSON-RPC 2.0 MCP server over stdin/stdout
squeez protocol                          # print the auto-teach payload (markers + protocol)
squeez update [--check] [--insecure]     # self-update
squeez init [--copilot]                  # session-start hook (called by hook, not manually)
squeez calibrate                         # auto-tune config from benchmarks
squeez budget-params <tool>              # output JSON budget patch for tool
squeez --version

Escape hatch — bypass compression for one command

--no-squeez git log --all --graph

Prefix any command with --no-squeez to run it raw without squeez touching it.

`squeez wrap`

Runs a command, compresses its output, and prints a savings header:

# squeez [git log] 2692→289 tokens (-89%) 0.2ms [adaptive: Ultra]

`squeez filter`

Reads from stdin. Use for manual pipelines:

git log --oneline | squeez filter git
docker logs mycontainer 2>&1 | squeez filter docker

`squeez compress-md`

Pure-Rust, zero-LLM compressor for markdown files. Preserves code blocks, inline code, URLs, headings, file paths, and tables. Compresses prose only. Always writes a backup at <stem>.original.md.

squeez compress-md CLAUDE.md             # Full mode (English default)
squeez compress-md --ultra CLAUDE.md    # + abbreviations (with→w/, fn, cfg, etc.)
squeez compress-md --lang pt CLAUDE.md  # pt-BR locale (articles, fillers, phrases)
squeez compress-md --dry-run CLAUDE.md  # preview, no write
squeez compress-md --all                # compress all known locations automatically

When auto_compress_md = true (default), squeez init runs --all silently on every session start.

`squeez benchmark`

Reproducible measurement of token reduction, cost, latency, and quality across 19 scenarios:

squeez benchmark                          # human-readable report
squeez benchmark --json                   # JSON to stdout
squeez benchmark --output report.json     # save JSON report
squeez benchmark --scenario git           # run only git scenarios
squeez benchmark --iterations 5           # more iterations per scenario
squeez benchmark --list                   # list all scenarios

Quality is scored by checking that signal terms (words from error/warning/failed lines in the baseline) survive compression. 19/19 pass at ≥ 50% threshold.

`squeez mcp`

Runs a Model Context Protocol JSON-RPC 2.0 server over stdin/stdout. Hand-rolled, no mcp.server / fastmcp dependency — keeps the libc-only constraint intact. Wire it into Claude Code:

claude mcp add squeez -- /path/to/squeez mcp

Thirteen read-only tools become available to the LLM:

Tool	Returns
`squeez_recent_calls`	Last N bash invocations with hash + length + cmd snippet — check before re-running
`squeez_seen_files`	Files this session has touched, with access type (Read/Write/Created/Deleted), sorted by recency
`squeez_seen_errors`	Distinct error fingerprints observed this session (FNV-1a hashes of normalized errors)
`squeez_seen_error_details`	Error fingerprints with the first 128 chars of message text — find what the error was
`squeez_session_summary`	Token accounting + call counts (tokens_bash / tokens_read / tokens_other / seen_files / seen_errors / seen_git_refs)
`squeez_session_stats`	Dedup hit counts (exact + fuzzy), summarize triggers, Ultra-mode calls, tokens saved per category
`squeez_agent_costs`	Sub-agent usage: spawn count, cumulative estimated tokens, per-call breakdown
`squeez_session_efficiency`	Session efficiency scores: compression ratio, tool choice, context reuse, budget conservation (basis points)
`squeez_prior_summaries`	Last N finalized prior-session summaries with structured fields: investigated / learned / completed / next_steps
`squeez_search_history`	Full-text search across all session summaries — find when you last saw an error or touched a file
`squeez_file_history`	Sessions where a given file path was touched, with token-savings and commit status
`squeez_session_detail`	Full structured view of a past session by date: calls, files, errors, git events, test summary
`squeez_protocol`	Auto-teach payload — read once per session to learn squeez's markers + memory protocol

All read-only. Backed by SessionContext::load(), memory::read_last_n(), and memory::search_history(). No side effects.

`squeez protocol`

Prints the auto-teach payload — a 2.4 KB self-describing block covering:

The 5-rule memory protocol (what to do with [squeez: ...] markers, when to call the MCP tools)
The output marker spec (# squeez [...], [squeez: identical to ...], [squeez: ~95% similar to ...], squeez:summary, # squeez hint:)

Same content the MCP squeez_protocol tool returns. Pipe it into a system prompt or paste it into a one-shot session that doesn't have the MCP server connected.

Configuration

Optional config file — all fields have defaults, none are required.

Platform	Config path
Claude Code / default	`~/.claude/squeez/config.ini`
Copilot CLI	`~/.copilot/squeez/config.ini`

# ── Compression ────────────────────────────────────────────────
max_lines              = 200     # generic truncation limit
dedup_min              = 3       # collapse lines appearing ≥N times
git_log_max_commits    = 20
git_diff_max_lines     = 150
docker_logs_max_lines  = 100
find_max_results       = 50
bypass                 = docker exec, psql, mysql, ssh   # never compress these

# ── Context engine ─────────────────────────────────────────────
adaptive_intensity         = true    # truly adaptive: Full <80% budget, Ultra ≥80%
context_cache_enabled      = true    # track seen files/errors across calls
redundancy_cache_enabled   = true    # collapse identical OR fuzzy-similar recent outputs
summarize_threshold_lines  = 500     # outputs above this trigger summarize fallback (×2 if benign)
compact_threshold_tokens   = 120000  # session token budget — drives adaptive intensity

# ── Session memory ─────────────────────────────────────────────
memory_retention_days = 30

# ── Output / persona ───────────────────────────────────────────
persona          = ultra    # off | lite | full | ultra
auto_compress_md = true     # run compress-md on every session start
lang             = en       # compress-md locale: en | pt (pt-BR) — more languages extensible

# ── Advanced tuning (rarely needed) ───────────────────────────
max_call_log              = 32    # rolling call log depth (also caps redundancy window)
recent_window             = 16    # how many recent calls are eligible for redundancy lookup
similarity_threshold      = 0.85  # Jaccard threshold for fuzzy dedup (0.0–1.0)
ultra_trigger_pct         = 0.80  # fraction of context budget at which Full → Ultra
mcp_prior_summaries_default = 5   # default n for squeez_prior_summaries
mcp_recent_calls_default    = 10  # default n for squeez_recent_calls

# ── Token economy ─────────────────────────────────────────────
agent_warn_threshold_pct  = 0.50  # warn when agent cost > 50% of budget
burn_rate_warn_calls      = 20    # warn when < 20 calls remaining
agent_spawn_cost          = 200000 # estimated tokens per Agent/Task spawn
read_max_lines            = 0     # max lines injected into Read tool_input (0 = off)
grep_max_results          = 0     # max results injected into Grep tool_input (0 = off)

# ── Auto-curation nudges ──────────────────────────────────────
nudge_enabled              = true   # emit [squeez: hint ...] markers on recurring patterns
nudge_error_threshold      = 3      # fingerprint repeats before a nudge fires
nudge_file_mod_threshold   = 5      # writes/creates to same path before nudge fires
nudge_cmd_repeat_threshold = 4      # expensive-command repeats before nudge fires

# ── Continuous handler calibration ────────────────────────────
handler_stats_enabled      = true   # accumulate per-handler savings across sessions

Adaptive intensity — Full / Ultra split

When adaptive_intensity = true (default), squeez actually adapts to session pressure rather than always running Ultra:

Used / budget	Tier	Scaling
`< 80%`	Full	×0.6 limits, dedup_min ×0.66 (floor 2)
`≥ 80%`	Ultra	×0.3 limits, dedup_min ×0.5 (floor 2)
`adaptive_intensity = false`	Lite	passthrough — no scaling

Floors are enforced so we never reduce to zero: max_lines ≥ 20, git_diff_max_lines ≥ 20, dedup_min ≥ 2, summarize_threshold_lines ≥ 50.

The active level is shown in every bash header: [adaptive: Full] or [adaptive: Ultra].

Pre-0.3 squeez was effectively always-Ultra. The new behavior preserves more verbatim text in the common case (empty / mid-session) and only graduates to aggressive compression when the context budget is genuinely under pressure.

Caveman persona

Three intensity levels (lite, full, ultra) and off. Default is ultra. The persona prompt is injected into:

The Claude Code session banner (printed at SessionStart)
The … block in ~/.copilot/copilot-instructions.md for Copilot CLI

How it works

Compression pipeline

Each bash command passes through four strategies in order:

smart_filter — strips ANSI codes, progress bars, spinner chars, timestamps, and tool-specific noise (npm download lines, stack frame noise, etc.)
dedup — lines appearing ≥ dedup_min times are collapsed to one entry annotated [×N]
grouping — files in the same directory (≥5 siblings) are collapsed to dir/ N modified [squeez grouped]
truncation — Head (keep first N) or Tail (keep last N) depending on handler; truncated portion noted

Supported handlers

Category	Commands
Git	`git`
Docker / containers	`docker`, `docker-compose`, `podman`
Package managers	`npm`, `pnpm`, `bun`, `yarn`
Build systems	`make`, `cmake`, `gradle`, `mvn`, `xcodebuild`, `cargo` (build), `next build/dev/start`
Test runners	`cargo test`, `jest`, `vitest`, `pytest`, `nextest`, `playwright`, `bun test`
TypeScript / linters	`tsc`, `eslint`, `biome`
Cloud CLIs	`kubectl`, `gh`, `aws`, `gcloud`, `az`, `wrangler`
Databases	`psql`, `prisma`, `mysql`, `drizzle-kit`
Filesystem	`find`, `ls`, `du`, `ps`, `env`, `lsof`, `netstat`
JSON / YAML / IaC	`jq`, `yq`, `terraform`, `tofu`, `helm`, `pulumi`
Text processing	`grep`, `rg`, `awk`, `sed`
Network	`curl`, `wget`
Runtimes	`node`, `python`, `ruby`
Generic fallback	everything else

Hooks (Claude Code & Copilot CLI)

Six hooks work together automatically after install on Claude Code (three on Copilot CLI):

PreToolUse — rewrites every Bash call: git status → squeez wrap git status; injects Read/Grep/Glob limits; compresses Agent/Task prompts
SessionStart — runs squeez init: finalizes previous session into a memory summary, injects the persona prompt
PostToolUse — tracks every tool result; rewrites Read/Grep/Glob/Monitor output via updatedToolOutput when content is redundant or oversized (Claude Code v2.1.119+)
SubagentStop (Claude Code only) — feeds last_assistant_message into SessionContext so the parent agent can dedup against what the sub-agent saw
PreCompact (Claude Code only) — logs compaction events for session efficiency metrics; allows compaction to proceed
PostCompact (Claude Code only) — emits a re-arm reminder after compaction so the model knows compression is still active

Cross-call redundancy

Two-path dedup across the last 16 calls:

Exact match — FNV-1a hash of the compressed output. When a subsequent call produces the same bytes, it collapses to:

[squeez: identical to 515ba5b2 at bash#35 — re-run with --no-squeez]

Fuzzy match — bottom-k MinHash over whitespace-token trigrams (k=96, Jaccard ≥ 0.85, length-ratio guard ≥ 0.80). Survives timestamp changes, added/removed blank lines, and single-line edits. Collapses to:

[squeez: ~92% similar to 515ba5b2 at bash#35 — re-run with --no-squeez]

Minimum 6 lines to attempt fuzzy match (below that, exact-only).

Summarize fallback

When raw output exceeds summarize_threshold_lines (default 500), the full pipeline is bypassed and replaced with a ≤40-line dense summary:

squeez:summary cmd=docker logs app
total_lines=5003
top_errors:
  - error: connection refused on tcp://10.0.0.1:5432
top_files:
  - /var/log/app/error.log
test_summary=FAILED: 3 of 248
tail_preserved=20
[last 20 lines verbatim...]

Benign-aware threshold: before summarizing, squeez scans for error markers (error:, panic, traceback, FAILED, EXCEPTION, Fatal). If none are found, the threshold is doubled (1,000 lines default) so successful builds, clean test runs, and uneventful logs stay verbatim unless they are genuinely huge.

Platform notes

OpenCode

Plugin installed at ~/.config/opencode/plugins/squeez.js. OpenCode auto-loads plugins on startup. All Bash commands are automatically compressed via squeez wrap.

GitHub Copilot CLI

Hooks registered in ~/.copilot/settings.json. Session memory written to ~/.copilot/copilot-instructions.md (Copilot CLI reads this automatically). State stored separately at ~/.copilot/squeez/.

Refresh memory manually:

SQUEEZ_DIR=~/.copilot/squeez ~/.claude/squeez/bin/squeez init --copilot

Pi

TypeScript extension installed at ~/.pi/agent/extensions/squeez/index.ts. Pi auto-discovers extensions from that directory — no settings patching needed. Session memory is injected via a skill at ~/.pi/agent/skills/squeez/SKILL.md; Pi includes the skill description in every system prompt and loads full instructions on demand. Unlike other hosts, Pi achieves BUDGET_HARD output compression via the tool_result event (return-patch API), not just soft hints.

Local development

Requires Rust stable. Windows requires Git Bash.

git clone https://github.com/claudioemmanuel/squeez.git
cd squeez

cargo test                  # run all tests (356 tests, 37 suites)
cargo build --release       # build release binary

bash bench/run.sh           # filter-mode benchmark (14 fixtures)
bash bench/run_context.sh   # context-engine benchmark (3 wrap scenarios)
./target/release/squeez benchmark   # full 19-scenario benchmark suite

bash build.sh               # build + install to ~/.claude/squeez/bin/

Contributing

git checkout -b feature/your-change
cargo test
cargo build --release
bash bench/run.sh
git push -u origin feature/your-change
gh pr create --base main --title "Short title" --body "Description"

CI runs cargo test, bench/run.sh, bench/run_context.sh, and squeez benchmark on every push and pull request.

See CONTRIBUTING.md for coding standards.

Similar projects

There is an unrelated project with the same name at KRLabsOrg/squeez. This project (claudioemmanuel/squeez) is a hook-based token compressor for AI coding CLIs.

License

Licensed under the Apache License 2.0 — see LICENSE + NOTICE.

Contributions require a DCO sign-off (git commit -s …) rather than a CLA. You keep copyright on what you contribute; sign-off is a lightweight affirmation that you have the right to submit it under Apache 2.0. See CONTRIBUTING.md for details.

claudioemmanuel/squeez

Details