[2026.05 Week 4] Five Trending Repos of the Week

Five GitHub repositories trending this week

May 24, 2026

TL;DR
Skill and plugin collections led the week. More than a dozen repos were just curated directories of agent skills or plugins, and big vendors are shipping them now, not only hobbyists.
A handful of repos index your codebase for agents, building knowledge graphs and semantic indexes over your code and all selling the same pitch: fewer tokens and fewer tool calls.
Five separate repos exist just to hand agents free model access. They aggregate or rotate free-tier keys across a dozen or more providers, betting nobody enforces the limits.
Heavy compute kept moving on-device. Vector search, speech synthesis, and code search, all running on CPU with no managed service.

Here are this week’s five picks.

turbovec (⭐ 2.7k). A Rust vector index with Python bindings that fits a 10M-vector corpus into a fraction of the RAM.
google/ax (⭐ 0.9k). Google’s distributed runtime for coordinating long-running agent executions.
supertonic (⭐ 10.2k). An on-device text-to-speech system that runs entirely through ONNX across eleven language runtimes.
semble (⭐ 4.1k). A CPU-only code search library that hands agents snippets instead of whole files.
shannon (⭐ 43.6k). An autonomous white-box pentester that reads your source code and attacks the running app.

RyanCodrai/turbovec

⭐ 2.7k · Rust

turbovec is a Rust vector index with Python bindings, built on Google Research’s TurboQuant algorithm. You pip install turbovec, create a TurboQuantIndex, call add and then search, and you have a local quantized index with no managed service involved. A 10-million-document corpus that needs 31 GB as float32 fits in 4 GB, and search runs 12 to 20 percent faster than FAISS on ARM.
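The 31 GB and 4 GB figures are consistent with roughly 4 bits per coordinate. Here is a back-of-envelope check. It assumes 768-dimensional float32 embeddings and one float32 correction scale per vector. Both are assumptions, because the dimension is not stated here.

```python
# Back-of-envelope index size. Assumptions (not from turbovec's docs):
# 768-dimensional embeddings, 4 bits per coordinate, one float32 scale per vector.
N_VECTORS = 10_000_000
DIM = 768
BITS_PER_COORD = 4
SCALE_BYTES = 4  # one float32 correction scale per vector

float32_bytes = N_VECTORS * DIM * 4  # 4 bytes per float32 coordinate
quantized_bytes = N_VECTORS * (DIM * BITS_PER_COORD // 8 + SCALE_BYTES)

print(f"float32:   {float32_bytes / 1e9:.1f} GB")    # float32:   30.7 GB
print(f"quantized: {quantized_bytes / 1e9:.2f} GB")  # quantized: 3.88 GB
print(f"ratio:     {float32_bytes / quantized_bytes:.1f}x")  # ratio:     7.9x
```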

Quantization swaps each 32-bit number in a vector for a few bits that point to a representative value, which is what shrinks 31 GB down to 4 GB while keeping vectors close enough to still rank correctly. Most quantizers learn their codebook (the small table of representative values that compressed codes point back to) from your data, running a training pass before they can compress anything. turbovec never looks at your vectors to build its codebook. A single random orthogonal rotation does that work instead.

encode.rs:29-75

// 1. Extract norms and normalize
for i in 0..n {
    let row = &vectors[i * dim..(i + 1) * dim];
    let norm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
    norms[i] = norm;
    // ... divide row by norm into unit_flat ...
}

// 2. Rotate: rotated = unit @ rotation.T (BLAS-accelerated via ndarray)
let rotated_mat = unit_mat.dot(&rot_mat.t());
// ... flatten rotated_mat into the row-major slice rotated ...

// 3. Quantize: for each boundary, codes += (rotated > boundary)
for b in boundaries {
    for idx in 0..n * dim {
        if rotated[idx] > *b { codes[idx] += 1; }
    }
}

// 4. Per-vector correction scale = ||v|| / <u, x_hat>
for i in 0..n {
    let row_start = i * dim;
    let mut inner = 0.0f64;
    for j in 0..dim {
        inner += rotated[row_start + j] as f64 * centroids[codes[row_start + j] as usize] as f64;
    }
    scales[i] = norms[i] / inner.max(1e-10) as f32;
}

The rotation is why no training is needed. After a random orthogonal rotation, each coordinate of a unit vector follows a known Beta distribution, a fact rotation.rs notes in its opening comment. That distribution depends only on the dimension, not on the data, so the optimal quantization boundaries can be computed once, ahead of time. codebook.rs runs Lloyd-Max on the Beta distribution to produce boundaries and centroids before any vector is seen, so step 3 quantizes by counting how many boundaries each rotated coordinate clears.
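This minimal NumPy sketch makes the offline step concrete by running Lloyd-Max against the coordinate distribution instead of against data. It is not codebook.rs. It assumes the standard density for one coordinate of a random unit vector, proportional to (1 - x^2)^((d - 3)/2) on [-1, 1], which is a Beta((d - 1)/2, (d - 1)/2) rescaled to that interval. It also integrates on a fixed grid. The function names are hypothetical.

```python
import numpy as np

def coordinate_density(dim, grid):
    # One coordinate of a uniformly random unit vector in R^dim has density
    # proportional to (1 - x^2)^((dim - 3) / 2) on [-1, 1]: a Beta(a, a) with
    # a = (dim - 1) / 2, shifted and scaled from [0, 1] to [-1, 1].
    w = np.clip(1.0 - grid**2, 0.0, None) ** ((dim - 3) / 2)
    return w / w.sum()  # discrete probability mass on the grid

def lloyd_max(dim, bits, iters=100, grid_size=200_000):
    """Boundaries and centroids for a 2**bits-level quantizer, computed before any data."""
    grid = np.linspace(-1.0, 1.0, grid_size)
    p = coordinate_density(dim, grid)
    levels = 2**bits
    sigma = 1.0 / np.sqrt(dim)  # std of one coordinate of a unit vector
    centroids = np.linspace(-3 * sigma, 3 * sigma, levels)  # initial guess
    for _ in range(iters):
        # Boundary step: each boundary is the midpoint of neighbouring centroids.
        boundaries = (centroids[:-1] + centroids[1:]) / 2
        # Centroid step: each centroid becomes the mean of the density inside its cell.
        cell = np.searchsorted(boundaries, grid)
        mass = np.bincount(cell, weights=p, minlength=levels)
        moment = np.bincount(cell, weights=p * grid, minlength=levels)
        centroids = np.where(mass > 0, moment / np.maximum(mass, 1e-300), centroids)
    boundaries = (centroids[:-1] + centroids[1:]) / 2
    return boundaries, centroids

def quantize(x, boundaries):
    # Code = number of boundaries the value clears, as in step 3 of encode.rs.
    return (np.asarray(x)[..., None] > boundaries).sum(axis=-1)
```

The result depends only on the dimension and the bit width. The table can therefore be computed once and reused by every index of that shape, with no vectors involved.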

Step 4 keeps recall high. Quantizing a vector shrinks its reconstructed length below the original, biasing the dot product downward. turbovec stores a per-vector scale of ||v|| / <u, x_hat>, where u is the rotated unit vector and x_hat its quantized reconstruction. This length renormalization is borrowed from RaBitQ. The scale is applied at the final multiply so the quantized score stays an unbiased estimate of the real one. Because there is no training pass, new vectors are encoded as they arrive and the index never needs a rebuild as the corpus grows, which is the property a streaming RAG store wants.
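The four encode steps and the corrected score can be sketched end to end in NumPy. This illustrates the idea under stated assumptions and is not turbovec's code. The rotation is drawn by QR-decomposing a Gaussian matrix. A uniform 16-level grid stands in for the Lloyd-Max codebook so the block stays self-contained. The helper names are hypothetical.

```python
import numpy as np

def random_rotation(dim, seed=0):
    # QR of a Gaussian matrix, with a sign fix, gives a uniformly random orthogonal matrix.
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))

def uniform_codebook(dim, bits):
    # Stand-in for the Lloyd-Max table: evenly spaced levels over +-3 coordinate std.
    sigma = 1.0 / np.sqrt(dim)
    centroids = np.linspace(-3 * sigma, 3 * sigma, 2**bits)
    return (centroids[:-1] + centroids[1:]) / 2, centroids

def encode(vectors, rotation, boundaries, centroids):
    norms = np.linalg.norm(vectors, axis=1)                 # step 1: norms
    rotated = (vectors / norms[:, None]) @ rotation.T       # step 2: rotate unit vectors
    codes = (rotated[..., None] > boundaries).sum(axis=-1)  # step 3: count boundaries cleared
    inner = np.einsum("ij,ij->i", rotated, centroids[codes])
    scales = norms / np.maximum(inner, 1e-10)               # step 4: ||v|| / <u, x_hat>
    return codes, scales

def estimate_scores(query, codes, scales, rotation, centroids):
    # <q, v> is estimated as scale * <R q, x_hat>, correcting at the final multiply.
    return scales * (centroids[codes] @ (rotation @ query))

rng = np.random.default_rng(1)
vectors = rng.standard_normal((500, 128))
query = rng.standard_normal(128)
rotation = random_rotation(128)
boundaries, centroids = uniform_codebook(128, bits=4)
codes, scales = encode(vectors, rotation, boundaries, centroids)
approx = estimate_scores(query, codes, scales, rotation, centroids)
exact = vectors @ query
print(np.corrcoef(approx, exact)[0, 1])  # close to 1.0
```

Nothing in encode depends on other vectors in the corpus. Each new vector can be rotated, quantized, and scaled on its own, which is why adds never trigger a rebuild.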

google/ax

⭐ 0.9k · Go

AX, short for Agent eXecutor, is Google’s distributed agent runtime, written in Go and still in early active development. It coordinates agentic loops through a central Controller while tools, skills, and sub-agents run as isolated actors connected over gRPC. Its design rests on a single-writer architecture where one Controller is the only writer to a durable event log, which gives AX automatic recovery and resumption after a crash or a pause.
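Python's built-in sqlite3 can illustrate the single-writer idea. This minimal sketch uses a hypothetical one-table schema and is not AX's Go implementation. Because only one writer appends, the next sequence number is simply the last one plus one. Recovery is then just a matter of replaying the rows in order.

```python
import json
import sqlite3

class EventLog:
    """Append-only event log; hypothetical schema, one row per conversation step."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            " conversation_id TEXT NOT NULL, seq INTEGER NOT NULL, payload TEXT NOT NULL,"
            " PRIMARY KEY (conversation_id, seq))"
        )

    def append(self, conversation_id, payload):
        # Only the controller calls append, so reading MAX(seq) and writing
        # seq + 1 cannot race with a second writer.
        (last,) = self.db.execute(
            "SELECT COALESCE(MAX(seq), 0) FROM events WHERE conversation_id = ?",
            (conversation_id,),
        ).fetchone()
        with self.db:  # commits on success, so the event is durable before returning
            self.db.execute(
                "INSERT INTO events VALUES (?, ?, ?)",
                (conversation_id, last + 1, json.dumps(payload)),
            )
        return last + 1

    def events(self, conversation_id):
        rows = self.db.execute(
            "SELECT seq, payload FROM events WHERE conversation_id = ? ORDER BY seq",
            (conversation_id,),
        )
        return [(seq, json.loads(payload)) for seq, payload in rows]

def resume(log, conversation_id):
    """After a crash or pause, rebuild the agent's history by replaying its events in order."""
    return [payload for _, payload in log.events(conversation_id)]
```

If path points at a file, a fresh EventLog constructed after a restart returns the same resume() result. That is the recovery property in miniature.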

Because every step an agent takes is an ordered entry in that log, you can fork an execution at any point in its history. The CLI exposes this as ax fork --src-conversation X --dest-conversation Y --src-seq 12.

controller.go:281-326

events, err := d.eventLog.Events(ctx, srcConversationID)
// ...
if srcSeq > 0 {
    found := false
    for i, ev := range events {
        if ev.Seq == srcSeq {
            events = events[:i+1]
            found = true
            break
        }
        if ev.Seq > srcSeq {
            break
        }
    }
    if !found {
        return "", fmt.Errorf("src_seq %d not found in conversation %s", srcSeq, srcConversationID)
    }
}

for _, ev := range events {
    // Clone the event to update the conversation ID.
    newEvent := &proto.ConversationEvent{
        ConversationId: destConversationID,
        Seq:            ev.Seq,
        ExecId:         ev.ExecId,
        Messages:       ev.Messages,
        State:          ev.State,
    }
    if _, err := d.eventLog.Append(ctx, newEvent); err != nil {
        return "", fmt.Errorf("failed to append forked event: %w", err)
    }
}

The whole operation is a slice and a replay. Events pulls the source conversation’s events in sequence order from the SQLite-backed log. Given a srcSeq, the loop truncates the slice to everything up to that sequence number, rejecting one that does not exist, then clones each event under the new conversation id and appends it, leaving the original untouched.

AX gets forking almost for free because the event log is the source of truth and sequence numbers are stable addresses into it. You keep the context up to step 12 and explore a different continuation, the same way you would A/B two prompts or recover from a bad tool call without starting over.
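Here is the same slice-and-replay logic, restated as a small Python sketch over an in-memory list. Event is a trimmed, hypothetical stand-in for proto.ConversationEvent. fork mirrors the Go excerpt's semantics: a src_seq of 0 copies everything, and a missing src_seq is an error.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    conversation_id: str
    seq: int
    messages: tuple  # trimmed stand-in for ConversationEvent's fields

def fork(log, src, dest, src_seq=0):
    """Copy src's events up to and including src_seq (0 = all) into dest."""
    events = sorted((e for e in log if e.conversation_id == src), key=lambda e: e.seq)
    if src_seq > 0:
        cut = next((i for i, e in enumerate(events) if e.seq == src_seq), None)
        if cut is None:
            raise KeyError(f"src_seq {src_seq} not found in conversation {src}")
        events = events[: cut + 1]
    # Clone under the new id, keeping seq numbers; the source events are never modified.
    forked = [Event(dest, e.seq, e.messages) for e in events]
    log.extend(forked)
    return forked
```

Calling fork(log, "X", "Y", src_seq=12) gives Y the first twelve steps of X, and X keeps all of its events. This matches the CLI example above.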

supertone-inc/supertonic

⭐ 10.2k · Swift

Supertonic is an on-device text-to-speech system that runs entirely through ONNX Runtime (a cross-platform engine for running trained machine learning models), with no cloud and no API calls. Download the ONNX assets, pick one of eleven runtime implementations (Python, Node, the browser, Swift, Rust, and more), and feed text to get audio back on the CPU. The 66M-parameter model generates speech up to 167 times faster than real time on an M4 Pro, hitting 1263 characters per second against ElevenLabs Flash at 287, and it runs on a Raspberry Pi and an e-reader in airplane mode.

The README lists “natural text handling” as a feature, claiming it reads strings like “$450K” or “Wed. June 23rd” and speaks them correctly with no pre-processing to spell them out first. Most TTS engines need a large rules engine to expand “$450K” into “four hundred fifty thousand dollars” before the model ever sees it. Supertonic carries almost no such rules.
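This deliberately tiny, hypothetical expander for dollar amounts shows what such a rules engine involves. It is not taken from Supertonic or any particular TTS engine. It only shows that each written format needs its own hand-written rule.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()
SCALES = {"K": "thousand", "M": "million"}
MONEY = re.compile(r"\$(\d{1,3})([KM])?\b")  # only "$" + 1-3 digits + optional K/M

def under_1000(n):
    """Spell out 0..999 in English words."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        if n % 10:
            words.append(ONES[n % 10])
    elif n or not words:
        words.append(ONES[n])
    return " ".join(words)

def expand_money(text):
    def spell(match):
        words = under_1000(int(match.group(1)))
        if match.group(2):
            words += " " + SCALES[match.group(2)]
        return words + " dollars"
    return MONEY.sub(spell, text)

print(expand_money("It cost $450K."))  # It cost four hundred fifty thousand dollars.
print(expand_money("Pay $4500 now"))   # Pay $4500 now
```

The second call shows how brittle this is. "$4500" matches no rule and passes through unchanged, and a date like "Wed. June 23rd" would need yet another rule. Supertonic largely does without this layer.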

helper.py:116-132

def _text_to_unicode_values(self, text: str) -> np.ndarray:
    unicode_values = np.array(
        [ord(char) for char in text], dtype=np.uint16
    )  # 2 bytes
    return unicode_values

def __call__(self, text_list: list[str]) -> tuple[np.ndarray, np.ndarray]:
    text_list = [self._preprocess_text(t) for t in text_list]
    text_ids_lengths = np.array([len(text) for text in text_list], dtype=np.int64)
    text_ids = np.zeros((len(text_list), text_ids_lengths.max()), dtype=np.int64)
    for i, text in enumerate(text_list):
        unicode_vals = self._text_to_unicode_values(text)
        text_ids[i, : len(unicode_vals)] = np.array(
            [self.indexer[val] for val in unicode_vals], dtype=np.int64
        )
    text_mask = self._get_text_mask(text_ids_lengths)
    return text_ids, text_mask

The _preprocess_text step that runs first is cosmetic. It strips emoji, swaps fancy quotes for ASCII, and expands tokens like “e.g.,” and “@”, with comments carrying a TODO for a better normalizer and a FIXME admitting it breaks on non-English text. Then _text_to_unicode_values turns the string into raw Unicode codepoints with ord(char), which map straight through an indexer to token ids. The pipeline has no number expander, no date parser, and no grapheme-to-phoneme stage.

So the model reads characters directly and learned to pronounce “Wed. June 23rd” from training data, not from a rule someone wrote. The upside is a tiny, language-agnostic pipeline, part of why the same ONNX assets drop into eleven runtimes unchanged. The downside is that same property inverted. A mispronunciation has no rule to patch, so fixing it means retraining the weights. The messy-text handling lives in the model, not the code.
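Here is a sketch of the cosmetic pass, using rules taken from the description above. The specific regex ranges and expansion strings are illustrative and not Supertonic's. The interesting part is what the pass leaves alone: "$450K" survives intact and reaches the indexer as plain codepoints.

```python
import re

# Hypothetical rules modeled on the description of _preprocess_text, not the real code.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
FANCY_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
EXPANSIONS = {"e.g.,": "for example,", "@": " at "}

def preprocess(text):
    text = EMOJI.sub("", text)           # strip emoji
    text = text.translate(FANCY_QUOTES)  # curly quotes -> ASCII
    for short, full in EXPANSIONS.items():
        text = text.replace(short, full)
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace

def to_codepoints(text):
    # What reaches the indexer: one integer per character, digits and "$" included.
    return [ord(ch) for ch in text]
```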

MinishLab/semble

⭐ 4.1k · Python

Semble is a code search library built for agents. You run it as an MCP server or call semble search from the shell. An agent asks a natural-language question like “how is authentication handled” and gets back the exact snippets instead of grepping and reading whole files.

It claims roughly 98 percent fewer tokens than grep plus read, with indexing about 200 times faster and queries about 10 times faster than a code-specialized transformer, at 99 percent of that transformer’s retrieval quality, all on CPU with no GPU or API keys.

Two ideas make that possible, and they pull against each other. The embedding model, potion-code-16M, runs no transformer forward pass. A normal embedder pushes every token through the network’s layers; this one just reads a precomputed vector per token from a table and averages them, so a query takes about a millisecond on CPU. But those vectors never change with context, which is why static embeddings are weak on their own and the search pipeline has to buy back the quality.
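A static embedding is simple enough to sketch in a few lines of NumPy. The sketch makes three assumptions. A toy whitespace vocabulary stands in for potion-code-16M's tokenizer, a random table stands in for its learned vectors, and the mean is unweighted.

```python
import numpy as np

# Hypothetical vocabulary and random table standing in for potion-code-16M's
# learned static vectors; the real model uses its own subword tokenizer.
rng = np.random.default_rng(0)
VOCAB = {w: i for i, w in enumerate("how is auth handled login user session token".split())}
TABLE = rng.standard_normal((len(VOCAB), 64)).astype(np.float32)

def embed(text):
    """Static embedding: look up one fixed vector per token and average. No forward pass."""
    ids = [VOCAB[t] for t in text.lower().split() if t in VOCAB]
    if not ids:
        return np.zeros(TABLE.shape[1], dtype=np.float32)
    v = TABLE[ids].mean(axis=0)
    return v / np.linalg.norm(v)  # unit length, so a dot product is cosine similarity
```

Each token's vector is fixed, so "user session token" and "token user session" embed identically. The search pipeline has to compensate for that blindness to context.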

search.py:11-117

_RRF_K = 60

def _rrf_scores(scores: dict[Chunk, float]) -> dict[Chunk, float]:
    """Convert raw scores to RRF scores 1/(k + rank); higher raw score → rank 1."""
    if not scores:
        return scores
    ranked = sorted(scores, key=lambda c: -scores[c])
    return {chunk: 1.0 / (_RRF_K + rank) for rank, chunk in enumerate(ranked, 1)}

# ... inside search(), run static-embedding and BM25 search, over-fetching top_k * 5 ...

normalized_semantic = _rrf_scores(semantic_scores)
normalized_bm25 = _rrf_scores(bm25_scores)

combined_scores: dict[Chunk, float] = {
    chunk: alpha_weight * normalized_semantic.get(chunk, 0.0)
    + (1.0 - alpha_weight) * normalized_bm25.get(chunk, 0.0)
    for chunk in all_candidates
}

The pipeline runs two searches and merges them. The first is static-embedding search, which matches on meaning, so “how is auth handled” finds the right code even when it never says “auth.” The second is BM25, the classic keyword search that scores chunks on literal term matches and weights rarer words more heavily, which catches exact symbol names the embeddings miss. Each search pulls back five times as many candidates as the query asked for, so a good match that one search ranks low still survives into the merge.
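For reference, here is a compact version of the textbook Okapi BM25 score. It uses a Lucene-style idf, the common defaults k1 = 1.5 and b = 0.75, and whitespace tokenization. Semble's own tokenizer and parameters may differ.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each doc for the query (lowercased whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))  # docs containing term
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rarer terms (low df) get a larger idf weight.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for chunk length.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

docs = ["def verify_token token", "def load_config path", "def save_config path"]
print(bm25_scores("def verify_token", docs))
```

The rare symbol verify_token contributes far more to the first chunk's score than def, which appears in every chunk.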

The two can’t just be added, since their scores live on different scales. Reciprocal rank fusion fixes that by ignoring the raw scores and keeping only each result’s position in its list, turning rank into 1/(60 + rank). A first-place hit is worth the same from either search, so one side’s runaway score cannot swamp the other. An alpha_weight that shifts by query type then blends the two ranks, leaning on keywords for symbol lookups and on meaning for descriptions. A final rerank nudges the order with code-aware signals, boosting definitions and demoting test and legacy files.
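A worked example with made-up scores shows what rank fusion buys. The file names and numbers are hypothetical. rrf mirrors _rrf_scores above, fuse applies the alpha blend, and naive_sum is the raw-score addition that fusion avoids.

```python
RRF_K = 60

def rrf(scores):
    """Replace raw scores with 1 / (k + rank); rank 1 is the highest raw score."""
    ranked = sorted(scores, key=lambda c: -scores[c])
    return {c: 1.0 / (RRF_K + rank) for rank, c in enumerate(ranked, 1)}

def fuse(semantic, bm25, alpha):
    sem, kw = rrf(semantic), rrf(bm25)
    chunks = set(sem) | set(kw)
    return sorted(chunks, key=lambda c: -(alpha * sem.get(c, 0.0) + (1 - alpha) * kw.get(c, 0.0)))

def naive_sum(semantic, bm25):
    chunks = set(semantic) | set(bm25)
    return sorted(chunks, key=lambda c: -(semantic.get(c, 0.0) + bm25.get(c, 0.0)))

semantic = {"auth.py": 0.83, "login.py": 0.80, "README.md": 0.41}  # cosine similarities
bm25 = {"login.py": 42.0, "test_auth.py": 9.5}                     # raw BM25 scores

print(naive_sum(semantic, bm25))  # ['login.py', 'test_auth.py', 'auth.py', 'README.md']
print(fuse(semantic, bm25, 0.5))  # ['login.py', 'auth.py', 'test_auth.py', 'README.md']
```

Summing raw scores lets BM25's large numbers push test_auth.py above auth.py, the best semantic match. After fusion, each list contributes at most 1/61 per chunk, so auth.py keeps second place.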

KeygraphHQ/shannon

⭐ 43.6k · TypeScript

Shannon is an autonomous, white-box AI pentester for web applications and APIs, built on the Claude Agent SDK. You point it at a running app and its source with ./shannon start -u <url> -r my-repo, and a pipeline of thirteen agents reads the code, maps the attack surface, and hunts five vulnerability classes in parallel. Every issue it reports ships with working exploit code, so the question worth asking is how it decides a line is actually exploitable instead of merely suspicious.

A normal scanner sees user input wrapped in a sanitizer before a SQL query and marks the line safe. Shannon’s injection agent instead traces each tainted input to where it lands, labels what kind of slot it fills, and checks whether that specific sanitizer is the right defense for that slot.

vuln-injection.txt:143-148

- **4) Match sanitization to sink context**
    - **SQL:** Binds for val/like/num; whitelist for enum/ident. Mismatch: concat, regex, wrong slot defense
    - **Command:** Array args (`shell=False`) OR `shlex.quote()`. Mismatch: concat, blacklist, `shell=True`
    - **File/Path:** Whitelist paths OR `resolve()` + boundary check. Mismatch: concat, `../` blacklist, no protocol check
    - **SSTI:** Sandboxed context + autoescape; no user input in expressions. Mismatch: concat, weak sandbox
    - **Deserialize:** Trusted sources only; safe formats + HMAC. Mismatch: untrusted input, pickle/unserialize

The same SQL query has more than one kind of slot. A user value, like a WHERE filter, is safe once it is bound as a query parameter, the standard placeholder fix. A sort column in ORDER BY is an identifier, not a value, so it cannot be bound that way. The code has to splice it into the SQL text, and only a whitelist of allowed column names protects it. Shannon flags a query that binds its values but still drops tainted input into ORDER BY, the mismatch a signature scanner waves through. It also disregards a sanitizer when the code concatenates the string again afterward, since that puts the tainted input right back.
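In Python's sqlite3, a query that gives each slot its matching defense might look like the sketch below. The users table, its columns, and the ALLOWED_SORT whitelist are hypothetical.

```python
import sqlite3

ALLOWED_SORT = {"name", "created_at"}  # whitelist for the identifier slot

def list_users(db, name_filter, sort_by):
    # Identifier slot: a bound placeholder here would be a constant value, not a
    # column name, so only a known-good name from the whitelist is spliced in.
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    sql = f"SELECT name FROM users WHERE name LIKE ? ORDER BY {sort_by}"
    # Value slot: bound as a parameter, so the input is never parsed as SQL.
    return [row[0] for row in db.execute(sql, (name_filter,))]
```

Deleting the whitelist check leaves exactly the mismatch described above: values still bound, sort column concatenated.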

Two design choices keep this honest. The vuln agent never reads source files itself; it hands every trace to a sub-agent and spends its own context on the verdict. And each finding stays a typed JSON object until a separate exploit agent works through the list and breaks the running app for real. That drives false positives toward zero, but it cuts both ways. Shannon stays silent on anything it cannot actively exploit, such as a vulnerable dependency, and because it attacks the live target as it works, point it only at systems you own.

Code Pointer
