[2026.03 Week 2] Five Trending Repos of the Week
Five GitHub repositories trending this week.
OpenViking (⭐ 10.4k). Context database for AI agents using a file system metaphor instead of flat vector tables.
Hermes Agent (⭐ 7.1k). Multi-platform AI agent with persistent memory and automatic context compression.
Heretic (⭐ 13.6k). Removes safety alignment from transformer models using directional ablation via LoRA adapters.
OpenRAG (⭐ 2.7k). RAG platform built on Langflow, Docling, and OpenSearch with multi-model embedding support.
Lightpanda Browser (⭐ 17k). Headless browser in Zig, built for AI agents. 11x faster page loads, 9x lower memory than Chrome.
volcengine/OpenViking
⭐ 10.4k · Python
OpenViking is a context database for AI agents that organizes memory, resources, and skills using a file system metaphor instead of flat vector tables.
Most RAG systems run a single flat k-NN query across all chunks. OpenViking treats retrieval as a best-first traversal over a directory tree, where parent directories propagate scores down to their children and a convergence check stops the search early when top-k results stabilize.
openviking/retrieve/hierarchical_retriever.py:L267-L298
collected_by_uri: Dict[str, Dict[str, Any]] = {}
dir_queue: List[tuple] = []  # Priority queue: (-score, uri)
visited: set = set()
prev_topk_uris: set = set()
convergence_rounds = 0
alpha = self.SCORE_PROPAGATION_ALPHA

# Initialize: process starting points
for uri, score in starting_points:
    heapq.heappush(dir_queue, (-score, uri))

while dir_queue:
    temp_score, current_uri = heapq.heappop(dir_queue)
    current_score = -temp_score
    if current_uri in visited:
        continue
    visited.add(current_uri)
    # ...
    results = await self.vector_store.search_children_in_tenant(
        ctx=ctx,
        parent_uri=current_uri,
        # ...
    )

The priority queue ranks directories by score, and results from child nodes are blended with their parent’s score via SCORE_PROPAGATION_ALPHA (default 0.5). Each child’s score comes from vector similarity against the query (or a reranker model in thinking mode). Context inherited from the directory structure influences ranking directly.
The traversal includes convergence detection: after each directory is explored, it checks whether the top-k results changed. If the top-k stays the same for 3 consecutive iterations, the search stops early. Any change resets the counter. This avoids exhaustive search on large, hierarchically organized corpora.
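The score-propagation and convergence logic can be condensed into a self-contained sketch. This is a simplified model, not OpenViking's implementation; the tree, scores, and parameter values below are made up for illustration:

```python
import heapq

def best_first_search(tree, scores, root, k=2, alpha=0.5, patience=3):
    """Best-first traversal of a directory tree with score propagation.

    tree: dir uri -> list of child uris; scores: uri -> similarity score.
    A child's effective score blends its own similarity with its parent's
    (alpha * parent + (1 - alpha) * child). The search stops early once
    the top-k result set is unchanged for `patience` consecutive pops.
    """
    queue = [(-scores[root], root)]          # max-heap via negated scores
    visited, results = set(), {}
    prev_topk, stable_rounds = set(), 0
    while queue:
        neg, uri = heapq.heappop(queue)
        if uri in visited:
            continue
        visited.add(uri)
        parent_score = -neg
        for child in tree.get(uri, []):
            blended = alpha * parent_score + (1 - alpha) * scores[child]
            results[child] = blended
            heapq.heappush(queue, (-blended, child))
        topk = set(sorted(results, key=results.get, reverse=True)[:k])
        stable_rounds = stable_rounds + 1 if topk == prev_topk else 0
        prev_topk = topk
        if stable_rounds >= patience:        # convergence: top-k stabilized
            break
    return sorted(results.items(), key=lambda kv: -kv[1])[:k]

tree = {"/": ["/a", "/b"], "/a": ["/a/x", "/a/y"], "/b": ["/b/z"]}
scores = {"/": 0.9, "/a": 0.8, "/b": 0.3,
          "/a/x": 0.7, "/a/y": 0.2, "/b/z": 0.1}
top = best_first_search(tree, scores, "/")
```

On this toy tree the search settles on a stable top-2 and breaks out before visiting every node, which is the whole point on a large corpus.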
The three-level context model (ABSTRACT=0, OVERVIEW=1, DETAIL=2) makes directory-based retrieval practical. L0 and L1 entries act as summaries that guide the search downward; only L2 entries contain the actual content.
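The level scheme in miniature (level names come from the text above; the entries and filtering logic are illustrative, not OpenViking's API):

```python
from enum import IntEnum

class ContextLevel(IntEnum):
    ABSTRACT = 0   # one-line summary, used to rank directories
    OVERVIEW = 1   # section-level summary, guides descent
    DETAIL = 2     # the actual content returned to the agent

entries = [
    {"uri": "/notes", "level": ContextLevel.ABSTRACT, "text": "meeting notes"},
    {"uri": "/notes/2026-03-09", "level": ContextLevel.OVERVIEW, "text": "weekly sync"},
    {"uri": "/notes/2026-03-09/decisions", "level": ContextLevel.DETAIL, "text": "ship v2"},
]

# L0/L1 summaries steer the traversal; only DETAIL entries surface as content
content = [e for e in entries if e["level"] == ContextLevel.DETAIL]
```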
NousResearch/hermes-agent
⭐ 7.1k · Python
Hermes Agent is a multi-platform AI agent (Telegram, Discord, Slack, CLI) that maintains persistent memory across sessions and compresses context automatically.
Instead of truncating old messages, it summarizes the middle of a conversation while preserving the boundaries, archives the old session, and links a new one.
agent/context_compressor.py:L246-L310
def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None) -> List[Dict[str, Any]]:
    """Compress conversation messages by summarizing middle turns.

    Keeps first N + last N turns, summarizes everything in between.
    After compression, orphaned tool_call / tool_result pairs are cleaned
    up so the API never receives mismatched IDs.
    """
    n_messages = len(messages)
    if n_messages <= self.protect_first_n + self.protect_last_n + 1:
        return messages
    compress_start = self.protect_first_n
    compress_end = n_messages - self.protect_last_n
    # ... (last_head_role, used below, is set in code elided here)
    # Adjust boundaries to avoid splitting tool_call/result groups.
    compress_start = self._align_boundary_forward(messages, compress_start)
    compress_end = self._align_boundary_backward(messages, compress_end)
    if compress_start >= compress_end:
        return messages
    turns_to_summarize = messages[compress_start:compress_end]
    summary = self._generate_summary(turns_to_summarize)
    compressed = []
    for i in range(compress_start):
        msg = messages[i].copy()
        compressed.append(msg)
    if summary:
        summary_role = "user" if last_head_role in ("assistant", "tool") else "assistant"
        compressed.append({"role": summary_role, "content": summary})
    for i in range(compress_end, n_messages):
        compressed.append(messages[i].copy())
    self.compression_count += 1
    compressed = self._sanitize_tool_pairs(compressed)
    return compressed

Boundary alignment prevents orphaned tool calls. Slicing a conversation at an assistant turn with tool_calls produces orphaned tool call/result pairs that LLM APIs reject. _align_boundary_forward and _align_boundary_backward shift the compression window to respect these pairs. After compression, _sanitize_tool_pairs does a second pass to catch remaining mismatches.
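The boundary rule can be illustrated with a standalone helper. This is hypothetical code, not Hermes's actual _align_boundary_forward:

```python
def align_boundary_forward(messages, idx):
    """Move a compression boundary forward past any tool results.

    Cutting between an assistant message carrying tool_calls and its
    tool results leaves orphaned IDs that chat-completion APIs reject,
    so the boundary skips ahead until it no longer lands on a tool turn.
    """
    while idx < len(messages) and messages[idx].get("role") == "tool":
        idx += 1
    return idx

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "tool_calls": [{"id": "t1"}]},
    {"role": "tool", "tool_call_id": "t1", "content": "ok"},
    {"role": "assistant", "content": "done"},
]
# A boundary at index 2 would orphan tool_call t1; it shifts to 3
aligned = align_boundary_forward(msgs, 2)
```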
During compression, the agent splits the session. The old session is archived with end_session("compression"), and a new session is created referencing the parent_session_id. This builds a linked chain of sessions searchable via SQLite FTS5, allowing the agent to recall context without retaining the full token history.
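A minimal model of the session chain, using Python's built-in sqlite3. Table and column names here are illustrative, not Hermes's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (
        id INTEGER PRIMARY KEY,
        parent_session_id INTEGER,   -- links a new session to the archived one
        status TEXT
    );
    -- Full-text index over archived transcripts
    CREATE VIRTUAL TABLE session_fts USING fts5(session_id, transcript);
""")
# Archive the old session, then start a new one that points back to it
conn.execute("INSERT INTO sessions VALUES (1, NULL, 'archived')")
conn.execute("INSERT INTO session_fts VALUES ('1', 'user asked about zig allocators')")
conn.execute("INSERT INTO sessions VALUES (2, 1, 'active')")

# Later, the agent recalls context by searching the chain instead of
# carrying the full token history in the prompt
hits = conn.execute(
    "SELECT session_id FROM session_fts WHERE session_fts MATCH 'allocators'"
).fetchall()
```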
p-e-w/heretic
⭐ 13.6k · Python
Heretic removes safety alignment from transformer language models using directional ablation via LoRA (Low-Rank Adaptation) adapters.
When a language model is fine-tuned to refuse requests, that refusal behavior is largely encoded as a single direction in the model’s activation space. Abliteration identifies this refusal direction and projects it out of the model’s weight matrices so the model no longer produces refusal responses.
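Difference-of-means extraction, the standard way such a direction is found, in toy form (synthetic activations standing in for real residual streams; this is not Heretic's extraction code):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
# Synthetic residual-stream activations: "harmful" prompts are shifted
# along a planted direction that stands in for the refusal feature.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
harmless = rng.normal(size=(100, d))
harmful = rng.normal(size=(100, d)) + 5.0 * true_dir

# Difference of means, normalized to a unit vector
v = harmful.mean(axis=0) - harmless.mean(axis=0)
v /= np.linalg.norm(v)

# v recovers the planted direction almost exactly
cosine = abs(v @ true_dir)
```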
Naive abliteration modifies the model weights directly. Heretic instead expresses the ablation as swappable LoRA factors.
LoRA applies a low-rank update on top of a weight matrix W, leaving the original weights untouched. The effect can be toggled by loading or unloading the adapter, stacked with other LoRA adapters, or shared separately.
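The identity that makes this work: the rank-1 ablation W' = W - lambda * v * (v^T W) is exactly a LoRA update with B = -lambda * v and A = v^T W. A quick numerical check (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, lam = 8, 16, 0.7
W = rng.normal(size=(d_out, d_in))
v = rng.normal(size=d_out)
v /= np.linalg.norm(v)

# Direct ablation: subtract the projection of W's outputs onto v
W_ablated = W - lam * np.outer(v, v @ W)

# The same update expressed as LoRA factors
lora_B = (-lam * v).reshape(-1, 1)   # (d_out, 1)
lora_A = (v @ W).reshape(1, -1)      # (1, d_in)
W_lora = W + lora_B @ lora_A

assert np.allclose(W_ablated, W_lora)
```

Because the base matrix W is untouched, unloading the adapter restores the original model exactly.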
src/heretic/model.py:L437-L485
# LoRA abliteration: delta W = -lambda * v * (v^T W)
# lora_B = -lambda * v
# lora_A = v^T W
v = layer_refusal_direction.to(module.weight.device)
base_weight = cast(Tensor, module.base_layer.weight)
quant_state = getattr(base_weight, "quant_state", None)
if quant_state is None:
    W = base_weight.to(torch.float32)
else:
    W = cast(Tensor,
             bnb.functional.dequantize_4bit(base_weight.data, quant_state)
             .to(torch.float32))
W = W.view(W.shape[0], -1)
# Calculate lora_A = v^T W
lora_A = (v @ W).view(1, -1)
# Calculate lora_B = -weight * v, where `weight` is the ablation strength lambda
lora_B = (-weight * v).view(-1, 1)

The formula delta W = -lambda * v * (v^T W) is a rank-1 update that projects the refusal direction out of the weight matrix. v is the refusal direction vector, computed as the difference of means between residual-stream activations on “harmful” vs “harmless” prompts. Because the update is expressed as LoRA factors (lora_A, lora_B), the adapter can be swapped in and out to compare abliterated vs. original behavior.
Heretic uses Optuna with TPE sampling to jointly minimize refusal count and KL divergence from the original model. The weight lambda follows a parametrized kernel controlled by max_weight, max_weight_position, and min_weight_distance, allowing the optimizer to apply different ablation intensities across layers. Optional row-norm preservation keeps the weight matrix from drifting too far from its original scale.
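Heretic's exact kernel is not reproduced here; one plausible shape under those parameter names, purely as an illustration (this is NOT Heretic's formula), peaks at max_weight_position and decays with distance:

```python
def ablation_weight(layer, n_layers, max_weight, max_weight_position, min_weight_distance):
    """Hypothetical per-layer ablation intensity.

    Peak of `max_weight` at `max_weight_position` (a fraction of model
    depth), with a linear falloff reaching zero `min_weight_distance`
    layers away. Illustrates how three scalars can give the optimizer
    layer-dependent control over ablation strength.
    """
    peak = max_weight_position * (n_layers - 1)
    dist = abs(layer - peak)
    return max_weight * max(0.0, 1.0 - dist / min_weight_distance)
```

Optuna then searches over such scalars (plus the direction index) against the two objectives, refusal count and KL divergence.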
langflow-ai/openrag
⭐ 2.7k · Python
OpenRAG is a RAG platform built on Langflow, Docling, and OpenSearch. It detects which embedding models were used to index existing documents and queries all of them at search time, allowing embedding provider switches without re-indexing.
src/services/search_service.py:L100-L132
# Build aggregation query to detect available embedding models
agg_query = {
    "size": 0,
    "aggs": {
        "embedding_models": {
            "terms": {
                "field": "embedding_model",
                "size": 10
            }
        }
    }
}
if filter_clauses:
    agg_query["query"] = {
        "bool": {
            "filter": filter_clauses
        }
    }
agg_result = await opensearch_client.search(
    index=get_index_name(), body=agg_query,
    params={"terminate_after": 0}
)
buckets = agg_result.get("aggregations", {}).get(
    "embedding_models", {}
).get("buckets", [])
available_models = [b["key"] for b in buckets if b["key"]]

Before running any vector search, OpenRAG discovers which embedding models exist in the corpus via a terms aggregation, then generates query embeddings for all of them in parallel using asyncio.gather.
Provider routing is automatic. Model names containing a colon get an ollama/ prefix, names on a known watsonx list get watsonx/, and everything else routes to OpenAI.
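That routing rule reads as a small dispatch function. This is a paraphrase of the described behavior, not OpenRAG's code, and the watsonx model list is a placeholder:

```python
WATSONX_MODELS = {"ibm/granite-embedding-107m-multilingual"}  # placeholder list

def route_embedding_model(name: str) -> str:
    """Prefix a bare model name with its provider for routing."""
    if ":" in name:                 # Ollama tag syntax, e.g. "nomic-embed-text:latest"
        return f"ollama/{name}"
    if name in WATSONX_MODELS:
        return f"watsonx/{name}"
    return name                     # default: treated as an OpenAI model
```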
Every search query generates N embeddings instead of one, where N is the number of distinct models in the corpus. This design prioritizes operational flexibility over raw query speed.
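The fan-out itself is a plain asyncio.gather; here is a sketch with a stubbed embedding call (function names are hypothetical):

```python
import asyncio

async def embed(model: str, query: str) -> list[float]:
    # Stub standing in for a real embedding API call
    await asyncio.sleep(0)
    return [float(len(query))]

async def embed_for_all_models(query: str, available_models: list[str]) -> dict:
    # One embedding request per model detected in the corpus, run in parallel
    vectors = await asyncio.gather(*(embed(m, query) for m in available_models))
    return dict(zip(available_models, vectors))

vecs = asyncio.run(embed_for_all_models("what is rag?", ["model-a", "model-b"]))
```

Each model's vector then queries the subset of chunks indexed with that model, and the hits are merged.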
lightpanda-io/browser
⭐ 17k · Zig
Lightpanda is a headless browser written in Zig, designed for AI agents and web automation. It claims 11x faster page loads and 9x lower memory than Chrome.
The memory savings come from lazy allocation. DOM properties like style, classList, and event listeners only get allocated when JavaScript actually touches them.
src/browser/Page.zig:L107-L120
// Lazily-created style, classList, and dataset objects. Only stored for elements
// that actually access these features via JavaScript, saving 24 bytes per element.
_element_styles: Element.StyleLookup = .empty,
_element_datasets: Element.DatasetLookup = .empty,
_element_class_lists: Element.ClassListLookup = .empty,
_element_rel_lists: Element.RelListLookup = .empty,
_element_shadow_roots: Element.ShadowRootLookup = .empty,
_node_owner_documents: Node.OwnerDocumentLookup = .empty,
_element_assigned_slots: Element.AssignedSlotLookup = .empty,
_element_scroll_positions: Element.ScrollPositionLookup = .empty,
_element_namespace_uris: Element.NamespaceUriLookup = .empty,
/// Lazily-created inline event listeners (or listeners provided as attributes).
/// Avoids bloating all elements with extra function fields for rare usage.
_event_target_attr_listeners: GlobalEventHandlorsLookup = .empty,

In a traditional browser, every DOM element carries fields for style, classList, dataset, and event listeners. Lightpanda moves all of these into page-level lookup maps initialized to .empty.
An element only gets an entry in the map when JavaScript actually reads or writes that property. On a typical page, the vast majority of elements never have their classList or dataset touched by JS, so on pages with thousands of nodes, those 24+ bytes per element stay unallocated.
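The same pattern rendered in Python terms: per-feature side tables on the page object, populated only on first access (a conceptual model, not the Zig code):

```python
class Page:
    """Keeps rarely-used element state in page-level dicts instead of
    fields on every element, so untouched elements cost nothing."""

    def __init__(self):
        self.element_class_lists: dict[int, list[str]] = {}

    def class_list(self, element_id: int) -> list[str]:
        # Allocated lazily, on first access to element.classList
        return self.element_class_lists.setdefault(element_id, [])

page = Page()
page.class_list(42).append("active")
# Only the touched element occupies an entry in the side table,
# no matter how many thousands of elements the page contains.
```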
Lightpanda exposes AI-specific output modes through its fetch function in lightpanda.zig, producing a semantic_tree (a structured JSON representation of the DOM) or a markdown dump of the page content, sparing the consuming agent from parsing raw HTML itself.

