[2026.05 Week 3] Five Trending Repos of the Week

Five GitHub repositories trending this week.

May 18, 2026

TL;DR
“Agent-native” was the marketing phrase of the week. Half the top of trending was some flavor of CLI, runtime, or wrapper framed as a thing for AI agents to drive instead of humans.
A new open model launch reshaped the top of trending. New TUIs, runtimes, and wrappers all trended off the release within days.
A fresh wave of LLM gateways and proxies landed on trending. Each promises some flavor of “route your coding agent through a unified endpoint” and claims to be the last router you will need.
Companion apps for coding agents showed up in numbers. Desktop dashboards, messaging bridges, and team management UIs trended together, all billed as “your AI coding agent, but with a real interface.”

Here are this week’s five picks.

ds4 (⭐ 10.4k). Antirez’s C inference engine for DeepSeek V4 Flash, targeting Apple Metal and CUDA.
CloakBrowser (⭐ 14.0k). A drop-in Playwright replacement that swaps Chromium for a build with 49 C++ fingerprint patches compiled in.
floci (⭐ 12.0k). A Quarkus-native AWS local emulator marketed as a free LocalStack alternative, with a 24ms cold start.
coral (⭐ 2.3k). A Rust SQL runtime that lets agents query APIs, files, and live sources through a single DataFusion-backed interface.
witr (⭐ 17.0k). A Go CLI that answers “why is this process running?” by walking the ancestry chain and identifying which supervisor spawned it.

antirez/ds4

⭐ 10.4k · C

ds4 (DwarfStar 4) is a native inference engine built for one model, DeepSeek V4 Flash. It is not a generic GGUF runner or a wrapper around llama.cpp, but a self-contained codebase covering loading, prompt rendering, tool calls, an on-disk KV cache, and an HTTP server. The bet is that one model with one tightly tuned engine beats a generic runtime that has to handle every architecture.

The engine is three files. ds4.c is the CPU path at 18k lines, ds4_cuda.cu packs ~100 hand-written kernels into 11k lines, and ds4_metal.m is the Metal equivalent at 15k lines. Antirez framed it on X as iteration speed mattering more than reusing what llama.cpp already has. Here is the YaRN-extended RoPE kernel that rotates the tail of every Q and K vector after attention projection.

ds4_cuda.cu:2340-2393

__global__ static void rope_tail_kernel(
        float *x,
        uint32_t n_tok, uint32_t n_head, uint32_t head_dim,
        uint32_t n_rot, uint32_t pos0, uint32_t pos_stride,
        uint32_t n_ctx_orig, int inverse,
        float freq_base, float freq_scale,
        float ext_factor, float attn_factor,
        float beta_fast, float beta_slow) {
    uint32_t gid = blockIdx.x * blockDim.x + threadIdx.x;
    uint32_t pairs = n_tok * n_head * (n_rot / 2);
    if (gid >= pairs) return;
    uint32_t pair = gid % (n_rot / 2);
    uint32_t tmp = gid / (n_rot / 2);
    uint32_t h = tmp % n_head;
    uint32_t t = tmp / n_head;
    uint32_t n_nope = head_dim - n_rot;
    uint32_t i = pair * 2;

    float corr0 = 0.0f, corr1 = 0.0f;
    if (ext_factor != 0.0f) {
        float denom = 2.0f * logf(freq_base);
        corr0 = floorf((float)n_rot * logf((float)n_ctx_orig / (beta_fast * 2.0f * (float)M_PI)) / denom);
        corr1 = ceilf((float)n_rot * logf((float)n_ctx_orig / (beta_slow * 2.0f * (float)M_PI)) / denom);
        corr0 = fmaxf(0.0f, corr0);
        corr1 = fminf((float)(n_rot - 1), corr1);
    }

    float theta_extrap = (float)(pos0 + t * pos_stride) * powf(freq_base, -((float)i) / (float)n_rot);
    float theta_interp = freq_scale * theta_extrap;
    float theta = theta_interp;
    float mscale = attn_factor;
    if (ext_factor != 0.0f) {
        float ramp_mix = rope_yarn_ramp_dev(corr0, corr1, (int)i) * ext_factor;
        theta = theta_interp * (1.0f - ramp_mix) + theta_extrap * ramp_mix;
        mscale *= 1.0f + 0.1f * logf(1.0f / freq_scale);
    }
    float c = cosf(theta) * mscale;
    float s = sinf(theta) * mscale;
    if (inverse) s = -s;

    float *tail = x + ((uint64_t)t * n_head + h) * head_dim + n_nope;
    float x0 = tail[i];
    float x1 = tail[i + 1];
    tail[i] = x0 * c - x1 * s;
    tail[i + 1] = x0 * s + x1 * c;
}

The kernel does one job. It rotates pairs of values inside each token’s Q and K vectors by an angle tied to where the token sits in the sequence. That rotation is RoPE, how the model encodes position, and YaRN is the variant that lets it stretch past the training context. Most engines wrap this in launch helpers and tensor structs. ds4 takes fourteen scalars and unpacks the thread id by hand in the first nine lines.

The last five lines are why the kernel exists. A pointer is computed straight into the activation buffer, two floats are read, the 2D rotation is applied, and two floats are written back. No tensor wrapper sits between the kernel and the bytes. The same shape repeats across the other ~100 kernels in ds4_cuda.cu. ds4 trades the safety net of an abstraction for kernels that fit on one screen and read like the math they implement.
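To make the index math and the rotation concrete, here is a minimal CPU sketch in Python that mirrors the kernel above, one loop iteration per CUDA thread. It covers only the ext_factor == 0 path, so the YaRN ramp (rope_yarn_ramp_dev, which is not shown in the excerpt) is deliberately left out. The function name rope_tail and its default arguments are assumptions for illustration, not part of ds4.

```python
import math


def rope_tail(x, n_tok, n_head, head_dim, n_rot, pos0=0, pos_stride=1,
              freq_base=10000.0, freq_scale=1.0, attn_factor=1.0,
              inverse=False):
    """Rotate the last n_rot values of every head in place.

    Hypothetical reference for the kernel's ext_factor == 0 path only;
    x is a flat list laid out as [token][head][head_dim].
    """
    half = n_rot // 2
    n_nope = head_dim - n_rot          # leading values that get no rotation
    for gid in range(n_tok * n_head * half):
        # Same hand-unpacking as the kernel: gid -> (token, head, pair).
        pair = gid % half
        tmp = gid // half
        h = tmp % n_head
        t = tmp // n_head
        i = pair * 2

        # Angle depends on the token position and on the pair's frequency.
        theta = freq_scale * (pos0 + t * pos_stride) * freq_base ** (-i / n_rot)
        c = math.cos(theta) * attn_factor
        s = math.sin(theta) * attn_factor
        if inverse:
            s = -s

        # Offset straight into the flat buffer, then a plain 2D rotation.
        base = (t * n_head + h) * head_dim + n_nope
        x0, x1 = x[base + i], x[base + i + 1]
        x[base + i] = x0 * c - x1 * s
        x[base + i + 1] = x0 * s + x1 * c
    return x
```

With attn_factor left at 1.0, calling the function again with inverse=True undoes the rotation, which is a quick way to sanity-check the indexing.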

CloakHQ/CloakBrowser

⭐ 14.0k · Python

CloakBrowser is a drop-in Playwright and Puppeteer replacement that swaps in a custom Chromium binary built to pass bot detection. You pip install cloakbrowser, change one import, and your existing scraper runs against a browser that scores 0.9 on reCAPTCHA v3 and clears Cloudflare Turnstile, FingerprintJS, and BrowserScan. The fingerprint patches live in C++ inside the binary, and the Python wrapper layers human-like input on top.

The 49 C++ patches cover canvas, WebGL, audio, and font fingerprinting. The Python repo handles what comes next, once the static fingerprint passes and the detector starts watching cursor movement.

cloakbrowser/human/mouse.py:58-99

def human_move(
    raw: RawMouse,
    start_x: float, start_y: float,
    end_x: float, end_y: float,
    cfg: HumanConfig,
) -> None:
    dist = math.hypot(end_x - start_x, end_y - start_y)
    if dist < 1:
        return

    steps = max(cfg.mouse_min_steps, min(cfg.mouse_max_steps, round(dist / cfg.mouse_steps_divisor)))
    start = Point(start_x, start_y)
    end = Point(end_x, end_y)
    cp1, cp2 = _random_control_points(start, end)

    burst_counter = 0
    burst_size = rand_int_range(cfg.mouse_burst_size)

    for i in range(steps + 1):
        progress = i / steps
        eased_t = _ease_in_out(progress)
        pt = _bezier(start, cp1, cp2, end, eased_t)

        wobble_amp = math.sin(math.pi * progress) * cfg.mouse_wobble_max
        wx = pt.x + (random.random() - 0.5) * 2 * wobble_amp
        wy = pt.y + (random.random() - 0.5) * 2 * wobble_amp

        raw.move(round(wx), round(wy))

        burst_counter += 1
        if burst_counter >= burst_size and i < steps:
            sleep_ms(rand_range(cfg.mouse_burst_pause))
            burst_counter = 0

    if random.random() < cfg.mouse_overshoot_chance:
        overshoot_dist = rand_range(cfg.mouse_overshoot_px)
        angle = math.atan2(end_y - start_y, end_x - start_x)
        raw.move(round(end_x + math.cos(angle) * overshoot_dist),
                 round(end_y + math.sin(angle) * overshoot_dist))

Four things happen at once, each defeating a different detector. The cubic Bezier with randomized perpendicular control points arcs the path instead of running straight (_random_control_points at mouse.py:44 offsets midpoints by up to 30%). _ease_in_out accelerates from rest and decelerates near the target, while a sinusoidal wobble peaks mid-path and decays at both ends, mimicking the involuntary tremor that grows with cursor speed.

The burst-pause loop is subtler. Real cursor motion arrives in micro-bursts with tiny pauses while the human re-checks aim, so CloakBrowser emits a few points, sleeps a few milliseconds, then continues. A final probabilistic overshoot mimics how people drift past targets and correct back. The whole function compresses four pages of fingerprint-evasion research into 42 lines.
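The path-shaping part of the function can be sketched on its own. The helpers below are stand-ins, not CloakBrowser's code: the excerpt does not show _bezier, _ease_in_out, or _random_control_points, so the smoothstep easing curve and the placement of control points at one third and two thirds of the segment are assumptions. Only the sinusoidal wobble envelope is taken directly from the excerpt.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


def ease_in_out(t: float) -> float:
    """Smoothstep easing (assumed curve): slow start, slow finish."""
    return t * t * (3.0 - 2.0 * t)


def bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    a, b, c, d = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    return Point(a * p0.x + b * p1.x + c * p2.x + d * p3.x,
                 a * p0.y + b * p1.y + c * p2.y + d * p3.y)


def control_points(start: Point, end: Point, rng: random.Random,
                   max_offset: float = 0.3) -> tuple[Point, Point]:
    """Two control points pushed sideways by up to max_offset * distance."""
    dx, dy = end.x - start.x, end.y - start.y

    def shifted(frac: float) -> Point:
        k = rng.uniform(-max_offset, max_offset)
        # (-dy, dx) is perpendicular to the segment and has its length.
        return Point(start.x + dx * frac - dy * k,
                     start.y + dy * frac + dx * k)

    return shifted(1 / 3), shifted(2 / 3)


def mouse_path(start: Point, end: Point, steps: int, wobble_max: float,
               rng: random.Random) -> list[Point]:
    """Sample an eased, curved path with wobble that vanishes at both ends."""
    cp1, cp2 = control_points(start, end, rng)
    points = []
    for i in range(steps + 1):
        progress = i / steps
        pt = bezier(start, cp1, cp2, end, ease_in_out(progress))
        amp = math.sin(math.pi * progress) * wobble_max
        points.append(Point(pt.x + rng.uniform(-amp, amp),
                            pt.y + rng.uniform(-amp, amp)))
    return points
```

Passing a seeded random.Random makes a path reproducible for testing, while production code would presumably want fresh randomness on every move.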

floci-io/floci

⭐ 12.0k · Java

Floci is a free, open-source local AWS emulator. You run docker compose up, point your SDK at localhost, and one container handles 47 AWS services without an auth token or feature gate. The image is ~90 MB, the idle process holds ~13 MiB, and the whole thing ships as a single GraalVM native binary.

The 24ms cold start versus LocalStack’s 3.3 seconds comes from Quarkus native compilation. But GraalVM AOT only compiles what it sees at build time, and the docker client needs Jackson reflecting into a few hundred POJOs at runtime to deserialize daemon responses. Floci’s fix is to list every one of them.

DockerJavaNativeSupport.java:1-40

package io.github.hectorvent.floci.core.common.docker;

import io.quarkus.runtime.annotations.RegisterForReflection;

/**
 * Registers all docker-java classes for GraalVM native image reflection.
 * Jackson needs reflective access to model classes when deserializing Docker API responses.
 */
@RegisterForReflection(classNames = {
    "com.github.dockerjava.api.DockerClient",
    "com.github.dockerjava.api.DockerClientDelegate",
    "com.github.dockerjava.api.async.ResultCallback",
    "com.github.dockerjava.api.async.ResultCallback$Adapter",
    "com.github.dockerjava.api.async.ResultCallbackTemplate",
    "com.github.dockerjava.api.command.AsyncDockerCmd",
    "com.github.dockerjava.api.command.AttachContainerCmd",
    "com.github.dockerjava.api.command.AttachContainerCmd$Exec",
    "com.github.dockerjava.api.command.AuthCmd",
    "com.github.dockerjava.api.command.AuthCmd$Exec",
    "com.github.dockerjava.api.command.BuildImageCmd",
    "com.github.dockerjava.api.command.BuildImageCmd$Exec",
    "com.github.dockerjava.api.command.BuildImageResultCallback",
    "com.github.dockerjava.api.command.CommitCmd",
    "com.github.dockerjava.api.command.CommitCmd$Exec",
    "com.github.dockerjava.api.command.ConnectToNetworkCmd",
    "com.github.dockerjava.api.command.ConnectToNetworkCmd$Exec",
    "com.github.dockerjava.api.command.ContainerDiffCmd",
    "com.github.dockerjava.api.command.ContainerDiffCmd$Exec",
    "com.github.dockerjava.api.command.CopyArchiveFromContainerCmd",
    "com.github.dockerjava.api.command.CopyArchiveFromContainerCmd$Exec",
    "com.github.dockerjava.api.command.CopyArchiveToContainerCmd",
    "com.github.dockerjava.api.command.CopyArchiveToContainerCmd$Exec",
    "com.github.dockerjava.api.command.CopyFileFromContainerCmd",
    "com.github.dockerjava.api.command.CopyFileFromContainerCmd$Exec",
    "com.github.dockerjava.api.command.CreateConfigCmd",
    "com.github.dockerjava.api.command.CreateConfigCmd$Exec",
    "com.github.dockerjava.api.command.CreateConfigResponse",
    "com.github.dockerjava.api.command.CreateContainerCmd",
    "com.github.dockerjava.api.command.CreateContainerCmd$Exec",
    // ... 642 more class names ...
})
public class DockerJavaNativeSupport {}

The file is 683 lines and registers 672 docker-java classes by string name: every command, response POJO, result callback, and $Exec inner class. Without this list, native-image would strip them as dead code and Jackson would throw the moment a daemon response landed. With the classes pinned, the binary boots without a JVM or classloading, and AWS handlers that spawn containers (RDS launching a real postgres, Lambda starting a runtime image) are ready inside the same 24ms window.

The cost is that every docker-java version bump becomes a chore, since new command classes need new entries. The rest of core/common/docker/ is the ten files that actually use the registered types like ContainerBuilder, ContainerLifecycleManager, and PortAllocator. This one file is just the dictionary the native image consults so the others can run.

withcoral/coral

⭐ 2.3k · Rust

Coral is a local-first SQL runtime that turns APIs, files, and other data sources into queryable tables. Your agent writes one SQL query, and Coral translates it into the underlying API calls or file reads, returning a single result set. You can run it from the CLI yourself or expose it over MCP so agents skip the bespoke tool glue.

Coral’s launch post claims that Claude with Coral is 20% more accurate, 2x as cost-efficient, and 42% lower in latency than Claude with direct MCP servers. The tagline is “SQL over your tools,” and the trick is doing it all at once. An agent can join GitHub issues against Linear tickets by author email in one query, with no MCP round-trips and no client-side merge. Something like this.

SELECT g.title, g.url, l.title AS ticket, l.status
FROM github.issues AS g
JOIN linear.tickets AS l
  ON g.author_email = l.author_email
WHERE g.state = 'open';

register_sources is what makes that legal, hooking every backend into a single DataFusion session under its own schema.

crates/coral-engine/src/runtime/registry.rs:76-119

pub(crate) async fn register_sources(
    ctx: &SessionContext,
    sources: Vec<SourceRegistrationCandidate>,
    source_decorators: &mut [Box<dyn SourceDecorator>],
) -> std::result::Result<SourceRegistrationResult, CoreError> {
    let catalog = ctx.catalog("datafusion").ok_or_else(|| {
        let plan_err = DataFusionError::Plan("catalog 'datafusion' not found".to_string());
        datafusion_to_core(&plan_err, &[])
    })?;

    // ... decorator prep, result/seen-schemas init ...

    for source in sources {
        match source {
            SourceRegistrationCandidate::Compiled(selected_source) => {
                let query_source = &selected_source.source;
                let compiled_source = selected_source.compiled;

                match register_source(ctx, &mut seen_schemas, compiled_source.as_ref()).await {
                    Ok(registration) => {
                        let BackendRegistration {
                            tables,
                            table_functions,
                            source: registered_source,
                        } = registration;
                        let decorated_tables =
                            decorate_source_tables(source_decorators, query_source, tables)?;
                        match catalog.register_schema(
                            compiled_source.schema_name(),
                            Arc::new(StaticSchemaProvider::new(decorated_tables)),
                        ) {
                            Ok(_) => {
                                register_table_functions(ctx, table_functions);
                                result.active_sources.push(registered_source);
                            }

The DataFusion session has one catalog, and each source (HTTP, JSONL, Parquet) becomes a schema inside it through register_schema. After this loop, github.issues and linear.tickets are siblings in the same SQL namespace. The tables are not lazy proxies but real TableProvider instances with declared schemas and pushdown methods, so DataFusion’s planner can choose which side of a join to build versus probe and push WHERE clauses down into each backend. Cross-source joins become one optimized plan instead of several MCP round-trips.
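The "sibling schemas in one namespace" idea can be tried without Coral at all. The sketch below is an analogy, not Coral's API: it uses Python's built-in sqlite3 module and ATTACH DATABASE to create two schemas named github and linear inside one connection, fills them with made-up rows, and runs the same join as the example query. The table columns and sample data are assumptions.

```python
import sqlite3

QUERY = """
SELECT g.title, g.url, l.title AS ticket, l.status
FROM github.issues AS g
JOIN linear.tickets AS l
  ON g.author_email = l.author_email
WHERE g.state = 'open'
"""


def build_session() -> sqlite3.Connection:
    """One connection, two attached schemas standing in for two sources."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate in-memory database under its own name.
    conn.execute("ATTACH DATABASE ':memory:' AS github")
    conn.execute("ATTACH DATABASE ':memory:' AS linear")
    conn.execute(
        "CREATE TABLE github.issues "
        "(title TEXT, url TEXT, state TEXT, author_email TEXT)"
    )
    conn.execute(
        "CREATE TABLE linear.tickets "
        "(title TEXT, status TEXT, author_email TEXT)"
    )
    # Made-up rows purely for illustration.
    conn.executemany(
        "INSERT INTO github.issues VALUES (?, ?, ?, ?)",
        [
            ("Crash on start", "https://example.com/1", "open", "ann@example.com"),
            ("Typo in docs", "https://example.com/2", "closed", "bob@example.com"),
        ],
    )
    conn.executemany(
        "INSERT INTO linear.tickets VALUES (?, ?, ?)",
        [
            ("Fix crash", "in_progress", "ann@example.com"),
            ("Docs pass", "todo", "bob@example.com"),
        ],
    )
    return conn


def open_issue_tickets(conn: sqlite3.Connection) -> list[tuple]:
    """Run the cross-schema join as a single query."""
    return conn.execute(QUERY).fetchall()
```

The difference in Coral's case is that the tables are backed by live APIs and files rather than local storage, but the query author sees the same thing: two schema-qualified tables and one plan.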

The three backend types live under crates/coral-engine/src/backends/{http,parquet,jsonl}/ and the registry loop above is where they meet. When DataFusion later scans github.issues, the actual network call happens in HttpSourceClient::fetch at backends/http/client.rs:164, which plugs the pushed-down WHERE clauses into the manifest’s request template and fires the request. From the planner’s perspective, some schemas in the catalog just happen to talk to the network.
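As a rough illustration of what "plugging pushed-down WHERE clauses into a request template" can mean, the sketch below splits equality filters into those an API can apply itself and those the query engine must still evaluate locally. Everything here is hypothetical: build_request_url, the pushable set, and the example URL are not taken from Coral's manifest format or from HttpSourceClient::fetch.

```python
from urllib.parse import urlencode


def build_request_url(template: str, filters: dict[str, str],
                      pushable: set[str]) -> tuple[str, dict[str, str]]:
    """Return the request URL plus the filters left for the engine.

    Filters on pushable columns become query parameters; the rest are
    returned as a residual that must be applied after the fetch.
    """
    pushed = {k: v for k, v in filters.items() if k in pushable}
    residual = {k: v for k, v in filters.items() if k not in pushable}
    url = template
    if pushed:
        # Sort for a stable URL regardless of filter order.
        url += "?" + urlencode(sorted(pushed.items()))
    return url, residual
```

For example, with filters on state and author_email and only state marked pushable, the API receives state=open and the engine filters on author_email after the rows come back.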

pranshuparmar/witr

⭐ 17.0k · Go

witr is a single static Go binary that answers one question: why is this thing running? It explains where a process came from, how it was started, and which supervisor in the chain is responsible for keeping it alive.

The author’s Medium post frames witr as the answer to “I ran lsof -i :8080, found a PID, then needed two more commands to figure out what spawned it.” witr is a process explainer, not another ps. The CLI takes a port, a PID, a process name, or a path, and returns a sentence. Four stages run in order. Input resolution turns the raw argument into a PID, ancestry walks the parent chain to init, source detection figures out which supervisor in that chain owns the process, and output rendering composes the explanation. AnalyzePID is the orchestrator that wires the middle two and packages their output for the fourth.

internal/pipeline/analyze.go:19-97

func AnalyzePID(cfg AnalyzeConfig) (model.Result, error) {
	ancestry, err := procpkg.ResolveAncestry(cfg.PID)
	if err != nil {
		return model.Result{}, err
	}

	src := source.Detect(ancestry)

	var proc model.Process
	resolvedTarget := "unknown"
	if len(ancestry) > 0 {
		proc = ancestry[len(ancestry)-1]
		resolvedTarget = proc.Command
	}

	// ... verbose-mode enrichment: child PIDs, memory, FDs, threads ...

	res := model.Result{
		Target:          cfg.Target,
		ResolvedTarget:  resolvedTarget,
		Process:         proc,
		RestartCount:    restartCount,
		Ancestry:        ancestry,
		Source:          src,
		Warnings:        source.Warnings(ancestry, src.Type),
		ResourceContext: resCtx,
		FileContext:     fileCtx,
		Children:        childProcesses,
	}

	return res, nil
}

Two calls do the load-bearing work. ResolveAncestry(cfg.PID) walks /proc (or its macOS/Windows equivalents) from the PID up to the root; the returned chain ends with the target process, which is why AnalyzePID reads ancestry[len(ancestry)-1]. source.Detect(ancestry) runs that chain through the ordered detectors (container, SSH, shell, systemd, launchd, supervisor, cron) and returns whichever owns the process. Stage one lives one level up in internal/target/resolve.go, where Resolve turns a PID, Port, Name, or File target into PIDs that AnalyzePID fans out over, which is why witr 8080, witr nginx, and witr 14233 all reach the same pipeline.

Stage four starts from the returned model.Result, which carries the target name, source verdict, warnings, optional resource and file context, and the process tree. A separate internal/output/ package renders it as text, JSON, or a tree. Noisy details stay inside the stages, and the struct becomes the contract between “find out why” and “tell the user why.”
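The shape of that pipeline is small enough to sketch in Python against a made-up process table. This is not witr's implementation: the table stands in for /proc, the detector list is a subset of the order named above, and matching supervisors by command name is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    pid: int
    ppid: int
    command: str


@dataclass
class Result:
    target: str
    resolved_target: str
    ancestry: list[Process]
    source: str


# Hypothetical process table standing in for /proc.
TABLE = {
    1: Process(1, 0, "systemd"),
    812: Process(812, 1, "sshd"),
    900: Process(900, 812, "bash"),
    14233: Process(14233, 900, "nginx"),
}

# Ordered detectors: the first one that matches anything in the chain wins.
DETECTORS = [
    ("ssh", {"sshd"}),
    ("shell", {"bash", "zsh", "sh"}),
    ("systemd", {"systemd"}),
    ("cron", {"cron", "crond"}),
]


def resolve_ancestry(pid: int, table: dict[int, Process]) -> list[Process]:
    """Walk parent links up to the root; return the chain root first."""
    chain, seen = [], set()
    while pid in table and pid not in seen:   # seen guards against cycles
        seen.add(pid)
        proc = table[pid]
        chain.append(proc)
        pid = proc.ppid
    chain.reverse()                           # target ends up last
    return chain


def detect(ancestry: list[Process]) -> str:
    """Return the first source type whose names appear in the chain."""
    for source_type, names in DETECTORS:
        if any(p.command in names for p in ancestry):
            return source_type
    return "unknown"


def analyze_pid(target: str, pid: int, table: dict[int, Process]) -> Result:
    """Wire ancestry and detection into one result struct."""
    ancestry = resolve_ancestry(pid, table)
    resolved = ancestry[-1].command if ancestry else "unknown"
    return Result(target, resolved, ancestry, detect(ancestry))
```

Rendering is left out on purpose. Whatever formats the Result, whether text, JSON, or a tree, only needs the dataclass, which is the same separation the article describes.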
