[2026.05 Week 5] Five Trending Repos of the Week

Five GitHub repositories trending this week.

May 31, 2026

TL;DR
AI agent tooling dominated. Most of the trending repos were coding agents, agent runtimes, or add-ons for them.
Skill packs were everywhere. Curated bundles of agent skills and plugins kept trending, and OpenAI, Anthropic, Cursor, and Google all shipped their own official collections.
Some tools pushed against AI slop. Governance frameworks and “stop the generic output” skills trended too, built to rein in what the agents produce.
Self-hosting stayed strong. Run-it-yourself, own-your-data tools kept trending with no managed cloud behind them.

Here are this week’s five picks.

Handy (⭐ 22.8k). A desktop app that turns speech into text fully offline, with no account and no cloud.
Twenty (⭐ 48.7k). An open-source CRM you self-host, where a custom field you add in the UI becomes a real Postgres column.
Syncthing (⭐ 84.8k). Continuous file synchronization that keeps folders identical across all your devices.
Frigate (⭐ 33.4k). A self-hosted NVR that runs realtime object detection on your IP cameras, entirely on local hardware.
MarkItDown (⭐ 132.8k). A Microsoft tool that turns PDFs, Office files, and images into clean Markdown for language models.

cjpais/Handy

⭐ 22.8k · Rust

Handy is a free, open-source speech-to-text app that runs fully offline. Most of it is Rust, handling audio capture and model inference, with a small Tauri webview for the UI. You hold a hotkey, speak, release, and the transcribed text lands in whatever app you were typing in. Handy ships eight different speech recognition backends you can switch between, from OpenAI’s Whisper to Nvidia’s Parakeet, and the lighter ones transcribe in near real time on a CPU that is years old.

Most of what you record is dead air, the pause before your first word and the silence after your last. Handy trims it before the audio reaches the model. The naive way to do that mangles speech. By the time a detector is sure it hears “voice,” the first syllable is already gone, and the first pause between words ends the recording early. Handy runs the raw detector through a small state machine that fixes both ends.

smoothed.rs:41-96

fn push_frame<'a>(&'a mut self, frame: &'a [f32]) -> Result<VadFrame<'a>> {
    self.frame_buffer.push_back(frame.to_vec()); // always buffer recent frames
    while self.frame_buffer.len() > self.prefill_frames + 1 {
        self.frame_buffer.pop_front();
    }
    let is_voice = self.inner_vad.is_voice(frame)?; // the raw yes/no detector

    match (self.in_speech, is_voice) {
        (false, true) => { // speech might be starting
            self.onset_counter += 1;
            if self.onset_counter >= self.onset_frames {
                self.in_speech = true;
                self.onset_counter = 0; // reset the streak for the next utterance
                self.hangover_counter = self.hangover_frames;
                // flush the buffered pre-roll so the opening syllable survives
                self.temp_out.clear();
                for buf in &self.frame_buffer { self.temp_out.extend(buf); }
                Ok(VadFrame::Speech(&self.temp_out))
            } else {
                Ok(VadFrame::Noise) // not convinced yet
            }
        }
        (true, true) => {  // mid-speech, refill hangover
            self.hangover_counter = self.hangover_frames;
            Ok(VadFrame::Speech(frame))
        }
        (true, false) => { // voice stopped, coast on hangover
            if self.hangover_counter > 0 {
                self.hangover_counter -= 1;
                Ok(VadFrame::Speech(frame))
            } else {
                self.in_speech = false;
                Ok(VadFrame::Noise)
            }
        }
        (false, false) => { // silence, reset the streak
            self.onset_counter = 0;
            Ok(VadFrame::Noise)
        }
    }
}

The wrapped inner_vad only answers yes or no for one 30 ms frame at a time, and SmoothedVad turns that stream of verdicts into clean speech boundaries with three parameters. With onset_frames set to 2, a lone noisy frame, a keyboard click or a cough, does not flip the recorder into speech, since it takes two voiced frames in a row. prefill_frames fixes the clipped first word. Every frame goes into a ring buffer whether it counts as voice or not, and the instant onset fires the whole buffer, current frame included, flushes at once, so the recording reaches back roughly 450 ms before the detector was sure. hangover_frames covers the tail, keeping speech alive for 15 more frames after voice drops and resetting on every voiced frame, so a pause shorter than about 450 ms does not split one utterance in two.
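To see the three parameters interact, here is a minimal Python sketch of the same state machine. It is not Handy's code: the class layout, the is_voice callback, and the record helper are assumptions made for illustration, and the defaults of 15, 15, and 2 simply mirror the numbers quoted above.

```python
from collections import deque
from typing import Callable, List, Optional


class SmoothedVad:
    """Wraps a per-frame yes/no detector with onset, pre-roll, and hangover."""

    def __init__(self, is_voice: Callable[[List[float]], bool],
                 prefill_frames: int = 15, hangover_frames: int = 15,
                 onset_frames: int = 2) -> None:
        self.is_voice = is_voice  # hypothetical stand-in for the raw detector
        self.hangover_frames = hangover_frames
        self.onset_frames = onset_frames
        # Ring buffer holding the pre-roll frames plus the current one.
        self.buffer: deque = deque(maxlen=prefill_frames + 1)
        self.in_speech = False
        self.onset_counter = 0
        self.hangover_counter = 0

    def push_frame(self, frame: List[float]) -> Optional[List[float]]:
        """Return the samples to keep for this frame, or None for noise."""
        self.buffer.append(list(frame))
        voiced = self.is_voice(frame)

        if not self.in_speech and voiced:
            self.onset_counter += 1
            if self.onset_counter < self.onset_frames:
                return None  # not convinced yet
            self.in_speech = True
            self.onset_counter = 0
            self.hangover_counter = self.hangover_frames
            # Flush the pre-roll so the opening syllable survives.
            return [sample for buf in self.buffer for sample in buf]
        if self.in_speech and voiced:
            self.hangover_counter = self.hangover_frames  # refill the hangover
            return list(frame)
        if self.in_speech:  # voice stopped, coast on the hangover
            if self.hangover_counter > 0:
                self.hangover_counter -= 1
                return list(frame)
            self.in_speech = False
            return None
        self.onset_counter = 0  # silence resets the onset streak
        return None


def record(frames, vad: SmoothedVad) -> List[float]:
    """Keep only what the VAD marks as speech, as a recorder would."""
    out: List[float] = []
    for frame in frames:
        kept = vad.push_frame(frame)
        if kept is not None:
            out.extend(kept)
    return out
```

Feeding it ten one-sample frames where only frames 3, 4, 5, and 8 are voiced, with a pre-roll of 2 and a hangover of 1, keeps frames 2 through 6: the pre-roll recovers frame 2, the hangover keeps frame 6, and the lone voiced frame 8 is dropped.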

Downstream, the recorder just trusts the verdict, appending each VadFrame::Speech buffer to the recording and throwing VadFrame::Noise away, so silence never reaches Whisper or Parakeet. What lands in the model is your voice with the dead air trimmed, the first word intact and the last one kept. The same recording from a naive threshold would drop your opening syllable and chop the end of every sentence.

twentyhq/twenty

⭐ 48.7k · TypeScript

Twenty is an open-source CRM you run yourself, pitched as an alternative to Salesforce. You deploy it with Docker against a Postgres database and get contacts, companies, deals, and pipelines in a UI that feels part spreadsheet, part Notion.

Adding a custom field is where Twenty diverges from other CRMs. Most CRMs store a custom field as a row in a key-value table or a key inside a JSON blob. Twenty makes it a real Postgres column.

Every workspace gets its own Postgres schema, named workspace_ followed by the workspace ID in base 36. Your custom objects and fields live as actual tables and columns inside that schema, so creating a field is a schema migration rather than a row insert.
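As a rough illustration of that naming scheme, the sketch below derives a schema name from a workspace ID. It assumes the ID is a UUID read as one 128-bit integer before the base-36 conversion, and both function names are made up for this example.

```python
import uuid

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Encode a non-negative integer in base 36 (digits, then a-z)."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def workspace_schema_name(workspace_id: str) -> str:
    # Assumption: the workspace ID is a UUID, treated as one 128-bit integer.
    return "workspace_" + to_base36(uuid.UUID(workspace_id).int)
```

Under that assumption the encoded ID is at most 25 characters with no hyphens, which leaves the full name comfortably inside Postgres's 63-byte identifier limit.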

workspace-schema-column-manager.service.ts:8-27

async addColumns({ queryRunner, schemaName, tableName, columnDefinitions }) {
    if (columnDefinitions.length === 0) return;

    const addColumnClauses = columnDefinitions.map(
      (column) => `ADD COLUMN ${buildSqlColumnDefinition(column)}`,
    );
    // ALTER TABLE workspace_<id>.<table> ADD COLUMN ...
    const sql = `ALTER TABLE ${escapeIdentifier(schemaName)}.${escapeIdentifier(tableName)} ${addColumnClauses.join(', ')}`;

    await queryRunner.query(sql);
}

Creating a field in the UI calls addColumns, which emits one ALTER TABLE against the workspace schema. buildSqlColumnDefinition builds each clause, mapping the field type to a Postgres type through a fixed enum, and every identifier is escaped, so a field name cannot smuggle SQL into the statement.
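The two defenses described here, quoted identifiers and an allowlist of types, can be sketched in a few lines of Python. This is an illustration rather than Twenty's implementation: PG_TYPES, escape_identifier, and build_add_columns_sql are hypothetical names, and the type map is a small made-up subset.

```python
from typing import Dict, List

# Hypothetical allowlist of field types for this sketch.
PG_TYPES: Dict[str, str] = {
    "TEXT": "text",
    "NUMBER": "double precision",
    "BOOLEAN": "boolean",
    "DATE_TIME": "timestamptz",
}


def escape_identifier(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    if "\x00" in name:
        raise ValueError("identifier contains a NUL byte")
    return '"' + name.replace('"', '""') + '"'


def build_add_columns_sql(schema: str, table: str, columns: List[dict]) -> str:
    """Build one ALTER TABLE statement that adds every requested column."""
    clauses = []
    for col in columns:
        pg_type = PG_TYPES[col["type"]]  # unknown types raise KeyError
        clauses.append(f"ADD COLUMN {escape_identifier(col['name'])} {pg_type}")
    target = f"{escape_identifier(schema)}.{escape_identifier(table)}"
    return f"ALTER TABLE {target} {', '.join(clauses)}"
```

Because the column type only ever comes from the map and the name is always wrapped in doubled quotes, a hostile field name ends up as an odd but harmless column name instead of a second statement.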

The result is a real column with the right type, defaults, and constraints, in a schema owned by one workspace. A “Favorite color” field you invent is queried as its own column, not dug out of a shared blob and cast at read time. That is more work than an entity-attribute-value (EAV) table, but it keeps your CRM a real relational database you can index, join, and back up.

syncthing/syncthing

⭐ 84.8k · Go

Syncthing is open-source continuous file synchronization. It keeps a set of folders identical across two or more of your machines. You install it on each device, exchange device IDs once, and an edit on your laptop shows up on your desktop a moment later. No central server stores your files or brokers the connection.

For two devices to sync they first have to find each other’s IP and port, which can change on any network. On a local network, Syncthing solves this by having every device shout into the void on a timer and listen for everyone else doing the same.

local.go:148-225

func (c *localClient) sendLocalAnnouncements(ctx context.Context) error {
    var msg []byte
    var ok bool
    instanceID := rand.Int63()
    for {
        if msg, ok = c.announcementPkt(instanceID, msg[:0]); ok {
            c.beacon.Send(msg) // UDP broadcast / multicast
        }
        select {
        case <-c.localBcastTick:  // periodic, every 30s
        case <-c.forcedBcastTick: // forced when a new device appears
        case <-ctx.Done():
            return ctx.Err()
        }
    }
}

func (c *localClient) recvAnnouncements(ctx context.Context) error {
    for {
        buf, addr := c.beacon.Recv()
        // ... check magic bytes, unmarshal the protobuf announcement into pkt ...
        if !bytes.Equal(pkt.Id, c.myID[:]) {
            if newDevice := c.registerDevice(addr, &pkt); newDevice {
                c.forcedBcastTick <- time.Now() // answer right away
            }
        }
    }
}

Every device runs both loops over a beacon, a UDP broadcast or multicast socket. sendLocalAnnouncements blasts a small packet with the device’s ID and addresses every 30 seconds. recvAnnouncements listens, ignores its own packets, and registers any new device along with the address it came from. On meeting someone new it forces an immediate re-announce, so a fresh peer learns the network within a second instead of waiting out the next tick.
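The receive side boils down to three checks, which the following Python sketch walks through without any sockets. It is a simplified model, not Syncthing's wire format: the magic value is made up, JSON stands in for the protobuf message, and treating a changed instance ID as a "new" device is an assumption about how restarts are handled.

```python
import json
import struct
from typing import Dict, List, Tuple

MAGIC = 0xC0FFEE01  # made-up marker for this sketch


def encode_announcement(device_id: bytes, addresses: List[str],
                        instance_id: int) -> bytes:
    """Pack a magic prefix plus a small body (JSON here, protobuf in Syncthing)."""
    body = json.dumps({"id": device_id.hex(), "addresses": addresses,
                       "instance": instance_id}).encode("utf-8")
    return struct.pack(">I", MAGIC) + body


class LocalRegistry:
    def __init__(self, my_id: bytes) -> None:
        self.my_id = my_id
        # device ID -> (instance ID, source IP, announced addresses)
        self.seen: Dict[bytes, Tuple[int, str, List[str]]] = {}

    def handle_packet(self, packet: bytes, source_ip: str) -> bool:
        """Register the sender; return True when a re-announce should be forced."""
        if len(packet) < 4 or struct.unpack(">I", packet[:4])[0] != MAGIC:
            return False  # not an announcement at all
        msg = json.loads(packet[4:])
        device_id = bytes.fromhex(msg["id"])
        if device_id == self.my_id:
            return False  # our own broadcast echoed back
        previous = self.seen.get(device_id)
        self.seen[device_id] = (msg["instance"], source_ip, msg["addresses"])
        # New device, or a known one that restarted with a fresh instance ID.
        return previous is None or previous[0] != msg["instance"]
```

A True return is the point where the real loop pushes onto forcedBcastTick, so the newcomer hears back without waiting for the next periodic tick.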

That handles the LAN with no infrastructure. Devices on separate networks fall back to a global discovery server keyed by device ID and to relays for the data, but the local path needs none of it. The device ID is a SHA-256 hash of the device’s TLS certificate, so a forged announcement cannot point you at the wrong machine and still pass the handshake. Your phone and laptop find each other through two short loops over a broadcast socket.
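Why a certificate-derived ID defeats a forged announcement is easier to see in code. The sketch below hashes the certificate bytes with SHA-256 and base32-encodes the digest. It is simplified on purpose: Syncthing's displayed IDs also carry check characters and dash grouping, which are left out, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac


def device_id_from_cert(cert_der: bytes) -> str:
    """Derive an ID from the raw certificate bytes (simplified format)."""
    digest = hashlib.sha256(cert_der).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=")


def peer_is_expected(expected_id: str, presented_cert_der: bytes) -> bool:
    """After the TLS handshake, compare the peer's certificate to the ID we trust."""
    return hmac.compare_digest(expected_id, device_id_from_cert(presented_cert_der))
```

An attacker can broadcast your laptop's ID with their own address, but the certificate they present in the handshake hashes to a different ID, so the check fails and the connection is dropped.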

blakeblackshear/frigate

⭐ 33.4k · Python

Frigate is a self-hosted network video recorder that runs realtime object detection on your IP cameras, all on your own hardware with nothing sent to the cloud. You list your cameras’ RTSP streams in a config file and run it in Docker. It records clips and fires events when it sees a person or a car, usually wired into Home Assistant. It runs on such cheap hardware that the docs now tell new users to skip the Coral AI accelerator and use a modest Intel mini PC instead. It gets away with that because it barely runs the neural detector at all.

Running an object detector on every frame of several 1080p streams would cook a small machine. Frigate sidesteps that with basic motion detection that decides when the expensive model is even worth waking.

improved_motion.py:124-154

# compare the current frame to a running average of recent frames
frameDelta = cv2.absdiff(resized_frame, cv2.convertScaleAbs(self.avg_frame))

# threshold the difference, then dilate to fill in holes
thresh = cv2.threshold(frameDelta, self.config.threshold, 255, cv2.THRESH_BINARY)[1]
thresh_dilated = cv2.dilate(thresh, None, iterations=1)
contours = grab_cv2_contours(
    cv2.findContours(thresh_dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
)

motion_boxes = []
for c in contours:
    # only count a contour large enough to matter
    if cv2.contourArea(c) > self.config.contour_area:
        x, y, w, h = cv2.boundingRect(c)
        motion_boxes.append(
            (int(x * self.resize_factor), int(y * self.resize_factor),
             int((x + w) * self.resize_factor), int((y + h) * self.resize_factor))
        )

Frigate shrinks each frame to a small grayscale image, subtracts it from a running average of recent frames, thresholds the difference into a black-and-white mask, and dilates to close gaps. Any contour in that mask bigger than a configured area becomes a motion box. It is just OpenCV on a tiny image, no model and no GPU, so the cost per frame is close to nothing.
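To show how little work that is, here is a dependency-free Python sketch of the same idea on a tiny grid of pixel values. It simplifies in ways Frigate does not: there is no dilation, one bounding box covers all changed pixels instead of one box per contour, a pixel count stands in for contour area, and the alpha blending weight for the running average is an assumed parameter.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]        # small grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def detect_motion(frame: Frame, avg: Frame, threshold: float = 30.0,
                  min_pixels: int = 2, alpha: float = 0.1) -> Optional[Box]:
    """Return a box around pixels that differ from the running average."""
    changed = [
        (x, y)
        for y, row in enumerate(frame)
        for x, value in enumerate(row)
        if abs(value - avg[y][x]) > threshold
    ]
    # Fold the new frame into the running average (updates avg in place).
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            avg[y][x] = (1 - alpha) * avg[y][x] + alpha * value
    if len(changed) < min_pixels:
        return None  # too little change to count as motion
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```

A None result is the cheap path: nothing moved, so there is no region to hand to a detector. Because the average keeps absorbing each frame, a change that persists gradually becomes background and stops registering as motion.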

Those boxes gate everything downstream. They are clustered with the boxes of already-tracked objects into a handful of regions, and the detector runs only on those, never the whole frame. A frame with nothing moving sends nothing to the model, and objects that have gone still are remembered and skipped until motion overlaps them again, so a parked car is not re-detected hundreds of times. The “AI camera” spends most of its compute on plain background subtraction and saves the neural network for the few patches of pixels that actually changed. That inversion is why it runs on hardware cheaper than one of the cameras feeding it.

microsoft/markitdown

⭐ 132.8k · Python

MarkItDown is a Python tool from Microsoft that turns PDFs, Office files, images, audio, and HTML into Markdown. Its README is upfront that the output is built for language models to read rather than people, since LLMs handle Markdown well and it keeps the token count down.

Most converters pick how to parse a file from its extension, but extensions lie. A .txt can actually be a PowerPoint, and an exported email often has no extension at all. So MarkItDown ignores the name and looks at the file’s actual bytes first.

_markitdown.py:721-788

# Call magika to guess the type from the stream content
result = self._magika.identify_stream(file_stream)
if result.status == "ok" and result.prediction.output.label != "unknown":
    # ... guess charset and extension from the content ...

    # Is the content-based guess compatible with what the filename claimed?
    compatible = True
    if base_guess.extension is not None and (
        base_guess.extension.lstrip(".") not in result.prediction.output.extensions
    ):
        compatible = False

    if compatible:
        guesses.append(StreamInfo(mimetype=..., extension=..., charset=...))
    else:
        # the name and the bytes disagree, so keep BOTH as candidates
        guesses.append(enhanced_guess)  # from the filename
        guesses.append(StreamInfo(
            mimetype=result.prediction.output.mime_type,
            extension=guessed_extension, charset=charset,
        ))  # from the content

identify_stream runs Magika, Google’s small neural file-type classifier, over the real bytes. When the detected type disagrees with what the extension claimed, MarkItDown does not pick a winner. It appends both the filename-based guess and the content-based guess to a list of candidates.

From there, _convert tries each candidate against every converter in priority order, rewinding the stream between attempts, and the first that accepts and succeeds wins. Specific converters register at priority 0 and generic ones like plain text at 10, so a PowerPoint mislabeled .txt reaches the PPTX converter before the text fallback. MarkItDown treats the extension as a hint and the bytes as the truth, then brute-forces the right converter. That unglamorous machinery is what “convert anything to Markdown” actually takes.
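The dispatch loop described above fits in a short Python sketch. Everything here is illustrative rather than MarkItDown's API: Guess, Converter, convert_stream, and the two toy converters are hypothetical, and only the shape of the loop (candidates outside, converters sorted by priority inside, rewind between attempts) follows the description.

```python
import io
from typing import Callable, List, NamedTuple


class Guess(NamedTuple):
    extension: str  # one candidate type, e.g. ".txt" or ".pptx"


class Converter(NamedTuple):
    name: str
    priority: float  # lower runs first: 0 for specific, 10 for generic
    accepts: Callable[[io.BytesIO, Guess], bool]
    convert: Callable[[io.BytesIO, Guess], str]


def convert_stream(stream: io.BytesIO, guesses: List[Guess],
                   converters: List[Converter]) -> str:
    """Try every guess against every converter; first success wins."""
    ordered = sorted(converters, key=lambda c: c.priority)
    start = stream.tell()
    for guess in guesses:
        for conv in ordered:
            stream.seek(start)  # rewind before each attempt
            if not conv.accepts(stream, guess):
                continue
            stream.seek(start)
            try:
                return conv.convert(stream, guess)
            except Exception:
                continue  # accepted but failed, so keep trying
    raise ValueError("no converter could handle this stream")


# Toy converters for the sketch: PPTX files are zip archives, which start with "PK".
pptx = Converter(
    "pptx", 0.0,
    accepts=lambda s, g: g.extension == ".pptx" and s.read(2) == b"PK",
    convert=lambda s, g: "# Slide 1",
)
text = Converter(
    "text", 10.0,
    accepts=lambda s, g: g.extension == ".txt",
    convert=lambda s, g: s.read().decode("utf-8"),
)
```

In this toy setup, a binary stream offered with both a ".txt" and a ".pptx" guess ends up at the pptx converter: the strict text converter accepts the ".txt" guess but fails to decode the bytes, so the loop moves on to the next candidate instead of giving up.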

Code Pointer
