[2026.02 Week 4] Five Trending Repos of the Week
Five GitHub repositories trending this week.
World Monitor (⭐ 16.5k). A self-hosted geopolitical intelligence dashboard that cross-correlates military, economic, and conflict data feeds to surface anomalies.
Scrapling (⭐ 17.3k). An adaptive web scraping framework that fingerprints DOM elements and survives anti-bot protections and layout changes between runs.
PentAGI (⭐ 8.3k). An autonomous penetration testing agent system that orchestrates specialized AI sub-agents inside sandboxed Kali Linux containers.
get-shit-done (⭐ 21.2k). A spec-driven project management layer for Claude Code that structures work into parallel execution waves tracked entirely in markdown.
GitNexus (⭐ 5.8k). A client-side code knowledge graph engine that parses repositories with Tree-sitter and runs graph-based RAG entirely in the browser via WASM.
worldmonitor
⭐ 16.5k · TypeScript
A self-hosted geopolitical intelligence dashboard that ingests live feeds (ADS-B military flights, AIS vessel tracking, ACLED conflict events, GDELT news, USGS earthquakes, prediction markets, economic data from FRED/BIS/World Bank, and more) and cross-correlates them to surface anomalies. Closer to an open-source Bloomberg Terminal for situational awareness than anything else in the space.
Under the Hood: Silent Divergence Detection
The core function analyzeCorrelationsCore takes clustered news events, prediction market prices, and market data, then runs them against a previous snapshot to detect divergences. The most revealing section is the market-news correlator: it tries to explain every significant market move with entity-matched news, and when it can’t, it flags a “silent divergence.”
src/services/analysis-core.ts#L587-L642
// Detect market moves with entity-aware news correlation
for (const market of markets) {
  const change = Math.abs(market.change ?? 0);
  if (change < MARKET_MOVE_THRESHOLD) continue;
  const entity = entityIndex.byId.get(market.symbol);
  const relatedNews = findNewsForMarketSymbol(market.symbol, newsEntityContexts);
  if (relatedNews.length > 0) {
    const topNews = relatedNews[0]!;
    const dedupeKey = generateDedupeKey('explained_market_move', market.symbol, change);
    if (!isRecentDuplicate(dedupeKey)) {
      markSignalSeen(dedupeKey);
      const direction = market.change! > 0 ? '+' : '';
      signals.push({
        id: generateSignalId(),
        type: 'explained_market_move',
        title: 'Market Move Explained',
        description: `${market.name} ${direction}${market.change!.toFixed(2)}% correlates with: "${topNews.title.slice(0, 60)}..."`,
        confidence: Math.min(0.9, 0.5 + (relatedNews.length * 0.1) + (change / 20)),
        timestamp: new Date(),
        data: {
          marketChange: market.change!,
          newsVelocity: relatedNews.length,
          correlatedEntities: [market.symbol],
          correlatedNews: relatedNews.map(n => n.clusterId),
          explanation: `${relatedNews.length} related news item${relatedNews.length > 1 ? 's' : ''} found`,
        },
      });
    }
  } else {
    const oldRelatedNews = Array.from(newsTopics.entries())
      .filter(([k]) => market.name.toLowerCase().includes(k) || k.includes(market.symbol.toLowerCase()))
      .reduce((sum, [, v]) => sum + v, 0);
    const dedupeKey = generateDedupeKey('silent_divergence', market.symbol, change);
    if (oldRelatedNews < 2 && !isRecentDuplicate(dedupeKey)) {
      markSignalSeen(dedupeKey);
      const searchedTerms = entity
        ? [market.symbol, market.name, ...(entity.keywords?.slice(0, 2) ?? [])].join(', ')
        : market.symbol;
      signals.push({
        id: generateSignalId(),
        type: 'silent_divergence',
        title: 'Silent Divergence',
        description: `${market.name} moved ${market.change! > 0 ? '+' : ''}${market.change!.toFixed(2)}% - no news found for: ${searchedTerms}`,
        confidence: Math.min(0.8, 0.4 + change / 10),
        timestamp: new Date(),
        data: {
          marketChange: market.change!,
          newsVelocity: oldRelatedNews,
          explanation: `Searched: ${searchedTerms}`,
        },
      });
    }
  }
}
For every market move above the threshold, the system matches it against clustered news using entity symbols, names, and keywords. A match produces an explained_market_move signal; no match triggers a silent_divergence (something moved the price, but the system can't explain why). Confidence scales with article count and move magnitude. Both paths deduplicate per symbol on a 6-hour window, and signals below 0.6 confidence are filtered out. The same snapshot-diff architecture drives parallel detectors for prediction-leads-news, velocity spikes, and flow-price divergence.
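The 6-hour dedupe window can be sketched in a few lines. This is a hypothetical reconstruction of the repo's isRecentDuplicate/markSignalSeen helpers, assuming an in-memory map from dedupe key to last-seen timestamp; the actual implementation may differ.

```typescript
// Hypothetical sketch of the per-symbol dedupe window: each signal key
// maps to the time it was last emitted.
const seenSignals = new Map<string, number>(); // dedupeKey -> last-seen (ms)
const DEDUPE_WINDOW_MS = 6 * 60 * 60 * 1000;   // the 6-hour window from the text

function isRecentDuplicate(key: string, now: number = Date.now()): boolean {
  const last = seenSignals.get(key);
  return last !== undefined && now - last < DEDUPE_WINDOW_MS;
}

function markSignalSeen(key: string, now: number = Date.now()): void {
  seenSignals.set(key, now);
}

// A signal fires once, then stays suppressed for six hours:
const key = 'silent_divergence:GOLD:2.10'; // hypothetical dedupe-key shape
markSignalSeen(key, 0);
const suppressed = isRecentDuplicate(key, 3 * 60 * 60 * 1000);   // within window
const allowedAgain = isRecentDuplicate(key, 7 * 60 * 60 * 1000); // window expired
```

The window is keyed per signal type and symbol, so a fresh divergence on a different symbol is never suppressed by an earlier one.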
Scrapling
⭐ 17.3k · Python
An adaptive web scraping framework that handles anti-bot protections and survives DOM restructuring between runs. It fingerprints elements to a SQLite store and relocates them on subsequent visits even when the page layout changes.
Under the Hood: Spoofing Native Getters
Scrapling ships JavaScript bypass scripts that are injected into every browser context before any page loads. The most critical one defeats the single most common bot-detection check: navigator.webdriver.
D4Vinci/Scrapling:scrapling/engines/toolbelt/bypasses/webdriver_fully.js:L1-L27
// Create a function that looks like a native getter
// (a plain function expression; the `get webdriver` name is attached below)
const nativeGetter = function () {
  return false;
};
// Copy over native function properties
Object.defineProperties(nativeGetter, {
  name: { value: 'get webdriver', configurable: true },
  length: { value: 0, configurable: true },
  toString: {
    value: function () {
      return `function get webdriver() { [native code] }`;
    },
    configurable: true
  }
});
// Make it look native
Object.setPrototypeOf(nativeGetter, Function.prototype);
// Apply the modified descriptor
Object.defineProperty(Navigator.prototype, 'webdriver', {
  get: nativeGetter,
  set: undefined,
  enumerable: true,
  configurable: true
});
Automated browsers set navigator.webdriver to true, and every major anti-bot service checks it. Simply reassigning the property isn't enough because detection scripts also inspect the getter's toString() output for the [native code] signature. This bypass replaces the property descriptor with a crafted getter that returns false and spoofs its toString, passing both the value check and the introspection check. It's one of six scripts injected via Playwright's add_init_script before any page JavaScript runs, covering everything from navigator properties to plugin arrays to screen dimensions.
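The two checks a detection script runs can be reproduced against the spoofed getter in isolation. This is a simplified standalone sketch (no browser, no Navigator patching); note the caveat in the last comment, which is a general limitation of per-object toString overrides rather than a claim about Scrapling's full bypass set.

```typescript
// Sketch: the spoofed getter on its own, plus the two checks a detection
// script would run against it.
const fakeGetter = function () {
  return false;
};
Object.defineProperties(fakeGetter, {
  name: { value: 'get webdriver', configurable: true },
  length: { value: 0, configurable: true },
  toString: {
    value: () => 'function get webdriver() { [native code] }',
    configurable: true,
  },
});

// Check 1: the value check — does the getter report a non-automated browser?
const valuePasses = fakeGetter() === false;

// Check 2: the introspection check — does the getter claim to be native code?
const looksNative = fakeGetter.toString().includes('[native code]');

// Caveat: Function.prototype.toString.call(fakeGetter) would still reveal the
// real source, which is why hardened bypasses also patch that code path.
```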
PentAGI
⭐ 8.3k · Go
An autonomous AI agent system for penetration testing. It runs specialized sub-agents (adviser, coder, searcher, installer) inside Docker-sandboxed Kali Linux containers, orchestrated by a Go backend with a React frontend.
Under the Hood: Self-Detecting ID Formats
Before any real work begins, PentAGI asks the AI model to reverse-engineer its own tool call ID format.
vxcontrol/pentagi:backend/pkg/providers/provider/agents.go:L87-L141
// Step 1: Collect 5 sample tool call IDs in parallel
samples, err := collectToolCallIDSamples(ctx, provider, opt, prompter)
// ...
// Step 2-4: Try to detect pattern using AI with retry logic
var previousAttempts []attemptRecord
for attempt := range maxRetries {
template, newSample, err := detectPatternWithAI(
ctx, provider, opt, prompter, samples, previousAttempts)
// ...
validationErr := templates.ValidatePattern(template, allSamples)
if validationErr == nil {
storeInCache(provider, template)
return wrapEndAgentSpan(template, "validated", nil)
}
previousAttempts = append(previousAttempts, attemptRecord{
Template: template, Error: validationErr.Error(),
})
}
template := fallbackHeuristicDetection(samples)Every LLM provider generates tool call IDs in a different format (Anthropic: toolu_01AbCdEf..., OpenAI: call_aBcDeFgH...). PentAGI fires dummy tool calls to collect samples, feeds them back to the model to infer a regex-like template, validates the result, and retries with failure context if the pattern doesn’t match. Only after exhausting AI retries does it fall back to character-level heuristics. The payoff: stored conversation histories can be resumed on a different LLM provider by rewriting all embedded IDs to match the new format.
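The validate-then-retry core reduces to a simple loop invariant: a detected template is accepted only if it matches every collected sample. A hypothetical sketch (the repo's templates.ValidatePattern is a Go function; the regex-style template format and sample IDs here are assumptions for illustration):

```typescript
// Hypothetical sketch: treat a detected template as a regex and accept it
// only if it matches every collected sample ID.
function validatePattern(template: string, samples: string[]): string | null {
  const re = new RegExp(`^${template}$`);
  for (const s of samples) {
    if (!re.test(s)) return `sample "${s}" does not match ${template}`;
  }
  return null; // null = validated: cache the template and stop retrying
}

// Anthropic-style sample shapes (exact IDs are made up):
const samples = ['toolu_01AbCdEfGh', 'toolu_01ZyXwVu12'];

const good = validatePattern('toolu_01[A-Za-z0-9]{8}', samples); // accepted
const bad = validatePattern('call_[a-z]{8}', samples);           // error string
```

On failure, the error string is exactly what gets appended to previousAttempts, giving the model concrete feedback for the next detection round.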
get-shit-done
⭐ 21.2k · JavaScript
A spec-driven development system for Claude Code. It imposes a project management layer (milestones, phases, parallel-wave plans) on top of an AI coding agent, tracking all state in a .planning/ directory of markdown files.
Under the Hood: Wave-Based Plan Indexing
Work is split into numbered waves. Tasks in the same wave run in parallel; the next wave doesn’t start until the current one finishes. Each task is a markdown file with a YAML header at the top declaring which wave it belongs to, whether it can run without human review, and which files it will touch. The indexer reads all these files and builds an execution schedule from them.
gsd-build/get-shit-done:get-shit-done/bin/lib/phase.cjs:L239-L291
for (const planFile of planFiles) {
  const planId = planFile.replace('-PLAN.md', '').replace('PLAN.md', '');
  const content = fs.readFileSync(planPath, 'utf-8');
  const fm = extractFrontmatter(content);
  // Count tasks (## Task N patterns)
  const taskMatches = content.match(/##\s*Task\s*\d+/gi) || [];
  const taskCount = taskMatches.length;
  const wave = parseInt(fm.wave, 10) || 1;
  let autonomous = true;
  if (fm.autonomous !== undefined) {
    autonomous = fm.autonomous === 'true' || fm.autonomous === true;
  }
  if (!autonomous) { hasCheckpoints = true; }
  let filesModified = [];
  if (fm['files-modified']) {
    filesModified = Array.isArray(fm['files-modified'])
      ? fm['files-modified'] : [fm['files-modified']];
  }
  const hasSummary = completedPlanIds.has(planId);
  if (!hasSummary) { incomplete.push(planId); }
  plans.push({
    id: planId, wave, autonomous,
    objective: fm.objective || null,
    files_modified: filesModified,
    task_count: taskCount,
    has_summary: hasSummary,
  });
  // Group by wave
  if (!waves[String(wave)]) { waves[String(wave)] = []; }
  waves[String(wave)].push(planId);
}
The autonomous flag marks whether a plan can run unattended or needs a human checkpoint. files_modified enables conflict detection: no two plans in the same wave should touch the same file. Completion is tracked by the presence of a matching SUMMARY.md file, not by mutating the plan itself.
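The conflict check this metadata enables is straightforward. A hypothetical sketch (the Plan shape mirrors the indexer's output above, but findWaveConflicts is an illustrative helper, not a repo function):

```typescript
// Hypothetical sketch: flag any two plans in the same wave that both declare
// the same file in files_modified.
interface Plan {
  id: string;
  wave: number;
  files_modified: string[];
}

function findWaveConflicts(plans: Plan[]): string[] {
  const conflicts: string[] = [];
  const seen = new Map<string, string>(); // "wave:file" -> plan id
  for (const p of plans) {
    for (const f of p.files_modified) {
      const key = `${p.wave}:${f}`;
      const other = seen.get(key);
      if (other !== undefined) {
        conflicts.push(`wave ${p.wave}: ${other} and ${p.id} both touch ${f}`);
      } else {
        seen.set(key, p.id);
      }
    }
  }
  return conflicts;
}

const conflicts = findWaveConflicts([
  { id: '01-api', wave: 1, files_modified: ['src/routes.ts'] },
  { id: '02-auth', wave: 1, files_modified: ['src/routes.ts', 'src/auth.ts'] },
  { id: '03-docs', wave: 2, files_modified: ['src/routes.ts'] }, // fine: later wave
]);
// conflicts: one entry, for src/routes.ts shared within wave 1
```

Because waves are sequential, the same file appearing in different waves is safe; only intra-wave overlap matters.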
The entire project state lives in version-controllable markdown. No database, no lock files, no server. STATE.md acts as the state machine, advancing from plan to plan and flipping status to “Phase complete” when the last plan finishes. It’s a build system for project management, where the build artifacts are working software and the dependency graph is declared in YAML front matter.
GitNexus
⭐ 5.8k · TypeScript
A client-side code knowledge graph engine. Drop in a GitHub repo or ZIP, and it builds an interactive graph (using Tree-sitter for parsing, KuzuDB for storage, Leiden algorithm for clustering) with a built-in Graph RAG agent. Runs entirely in the browser via WASM.
Under the Hood: Tiered Call Resolution
The hardest problem in static code knowledge graphs is call resolution: turning foo() in source code into a typed, directional edge between two nodes. GitNexus uses a three-tier confidence strategy so every CALLS edge carries a score that downstream systems can filter on.
abhigyanpatwari/GitNexus:gitnexus/src/core/ingestion/call-processor.ts:L237-L283
interface ResolveResult {
  nodeId: string;
  confidence: number; // 0-1: how sure are we?
  reason: string;     // 'import-resolved' | 'same-file' | 'fuzzy-global'
}
/**
 * Resolve a function call to its target node ID using priority strategy:
 *   A. Check imported files first (highest confidence)
 *   B. Check local file definitions
 *   C. Fuzzy global search (lowest confidence)
 *
 * Returns confidence score so agents know what to trust.
 */
const resolveCallTarget = (
  calledName: string,
  currentFile: string,
  symbolTable: SymbolTable,
  importMap: ImportMap
): ResolveResult | null => {
  // Strategy B first (cheapest — single map lookup): Check local file
  const localNodeId = symbolTable.lookupExact(currentFile, calledName);
  if (localNodeId) {
    return { nodeId: localNodeId, confidence: 0.85, reason: 'same-file' };
  }
  // Strategy A: Check if any definition of calledName is in an imported file
  const allDefs = symbolTable.lookupFuzzy(calledName);
  if (allDefs.length > 0) {
    const importedFiles = importMap.get(currentFile);
    if (importedFiles) {
      for (const def of allDefs) {
        if (importedFiles.has(def.filePath)) {
          return { nodeId: def.nodeId, confidence: 0.9, reason: 'import-resolved' };
        }
      }
    }
    // Strategy C: Fuzzy global (no import match found)
    const confidence = allDefs.length === 1 ? 0.5 : 0.3;
    return { nodeId: allDefs[0].nodeId, confidence, reason: 'fuzzy-global' };
  }
  return null;
};
The strategy is ordered by cost, not confidence. Same-file lookup runs first because it's a single hash-map hit. Import-resolved scores highest at 0.9 since the call site explicitly imported the target. Fuzzy global is the fallback.
These scores govern every downstream feature. Leiden community detection and the BFS process tracer only use edges above 0.5, so ambiguous matches never pollute the call graph. One number, set at resolution time, controls what the rest of the system trusts.
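The downstream gate is just a threshold over edge confidence. A minimal sketch (the 0.5 cutoff and the reason/confidence pairings come from the text above; the CallEdge shape and sample data are illustrative):

```typescript
// Sketch: only edges above 0.5 confidence reach community detection and the
// process tracer; fuzzy matches (0.5 or 0.3) are dropped at the gate.
interface CallEdge {
  from: string;
  to: string;
  confidence: number;
  reason: 'import-resolved' | 'same-file' | 'fuzzy-global';
}

const edges: CallEdge[] = [
  { from: 'app.main', to: 'db.connect', confidence: 0.9, reason: 'import-resolved' },
  { from: 'app.main', to: 'app.helper', confidence: 0.85, reason: 'same-file' },
  { from: 'app.main', to: 'utils.log', confidence: 0.3, reason: 'fuzzy-global' },
];

const trusted = edges.filter(e => e.confidence > 0.5);
// trusted keeps the two resolved edges; the ambiguous fuzzy match is excluded
```

The fuzzy edges are still stored in the graph; they are simply invisible to any consumer that filters at 0.5.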


Security note on Scrapling: unsafe checkpoint deserialization (pickle.loads) can execute code
- Code: scrapling/spiders/checkpoint.py:74
- Risk: if crawldir is user-controlled or shared, a malicious checkpoint.pkl yields arbitrary code execution on resume.