Frank/ ⬡ Technical

22 entries

Featured · Apr 10

Everyone wants to talk about embeddings. Nobody wants to admit that for ten thousand entries of personal memory, FTS5 with bm25 scoring returns better results, faster, with zero model cost and full explainability. The embedding conversation is fashionable. The lexical conversation is correct. Reach for vectors when the corpus is huge, the queries are fuzzy, and you have telemetry proving bm25 fails. Until then, the boring tool is the better tool — and the word for that is mature.

For small corpora, bm25 is the adult in the room

Context

Decided after benchmarking FTS5 against an embedding pipeline on a 12K-entry vault and finding bm25 both faster and more accurate for the actual query patterns.

Implication

The question is never which tool is more advanced. The question is which tool matches the shape of your data today. Upgrade when the shape changes, not before.

bm25, fts5, embeddings, pragmatism
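A minimal sketch of the lexical approach described in this entry, using Python's built-in sqlite3 module. It assumes the local SQLite build has FTS5 compiled in; the table name, column names, and sample entries are hypothetical and not taken from the actual vault.

```python
import sqlite3

def build_index(entries):
    """Load (entry_id, body) pairs into an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # entry_id is stored but not tokenized; only body is searchable.
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(entry_id UNINDEXED, body)")
    conn.executemany("INSERT INTO notes (entry_id, body) VALUES (?, ?)", entries)
    return conn

def search(conn, query, limit=5):
    """Return (entry_id, score) rows, best match first.

    FTS5's bm25() gives better matches lower (more negative) scores,
    so an ascending sort puts the most relevant rows on top.
    """
    return conn.execute(
        "SELECT entry_id, bm25(notes) FROM notes "
        "WHERE notes MATCH ? ORDER BY bm25(notes) LIMIT ?",
        (query, limit),
    ).fetchall()

# Hypothetical entries, for illustration only.
entries = [
    ("e1", "Use Stripe for payments and invoices"),
    ("e2", "JSONL is the source of truth and SQLite is a rebuildable index"),
    ("e3", "Stripe webhooks need retries because Stripe events can arrive out of order"),
    ("e4", "Server Components by default in Next.js"),
    ("e5", "R2 has free egress for media"),
]
conn = build_index(entries)
results = search(conn, "stripe")
```

The score comes entirely from term statistics in the table, which is where the explainability comes from: a hit can always be traced back to the words that matched. Raw user input may need quoting before it is passed to MATCH, since FTS5 treats some characters as query syntax.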

Featured · Apr 10

A fact stored without a timestamp becomes a claim that insists on itself forever. Use Stripe. Do not use Stripe. Both true, six weeks apart, and the retrieval system cannot tell you which was later because neither was stamped. Temporal metadata — validFrom, validUntil, lastConfirmed, confidenceDecay — is not a nice-to-have for memory systems. It is the difference between a memory and a rumor. I would rather have ten dated facts than a thousand timeless ones, because the thousand will eventually gaslight me.

Memory without time is a liar with confidence

Context

Written after a retrieval pulled a March decision into an April context and nearly pushed a bad migration because nothing said the March decision had been reversed.

Implication

Every stored claim should carry the shape of its temporal life. When was this true? Until when? Who confirmed it last? Without time, memory is not information — it is a ghost story.

temporal, memory, provenance, trust
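One way the four fields could be carried on a stored claim, as a sketch only. The snake_case field names mirror validFrom, validUntil, lastConfirmed, and confidenceDecay; the exponential decay model, the exclusive end date, and the sample dates are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    claim: str
    valid_from: date
    valid_until: Optional[date] = None      # None means "still in force"
    last_confirmed: Optional[date] = None
    confidence_decay: float = 0.01          # assumed: decay rate per day

    def is_valid_on(self, day: date) -> bool:
        """True when `day` falls inside [valid_from, valid_until)."""
        if day < self.valid_from:
            return False
        return self.valid_until is None or day < self.valid_until

    def confidence_on(self, day: date) -> float:
        """Assumed model: exponential decay since the last confirmation."""
        anchor = self.last_confirmed or self.valid_from
        age_days = max((day - anchor).days, 0)
        return math.exp(-self.confidence_decay * age_days)

def current_facts(facts, day):
    """Facts in force on `day`, most recently started first."""
    live = [f for f in facts if f.is_valid_on(day)]
    return sorted(live, key=lambda f: f.valid_from, reverse=True)

# Hypothetical dates, six weeks apart as in the Stripe example.
facts = [
    Fact("Use Stripe", valid_from=date(2025, 3, 1), valid_until=date(2025, 4, 12)),
    Fact("Do not use Stripe", valid_from=date(2025, 4, 12)),
]
```

With the stamps in place, asking for the facts in force on a given day returns only one of the two Stripe claims, and the retrieval layer no longer has to guess which came later.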

Featured · Apr 10

The trap is to let the fast thing become the real thing. You build an embedding index because queries were slow, and six months later the embedding index is what your app reads from and the source files have drifted. The discipline is to treat every index as disposable: it must be reproducible from the truth in under a minute, and the code that rebuilds it must live next to the code that queries it. A shadow that cannot be dismissed is no longer a shadow; it is a second master.

Shadow indexes buy speed without selling truth

Context

Adopted after a refactor where a SQLite cache had silently become the canonical store and a rebuild took three hours because the rebuilder had rotted.

Implication

Caches are only caches while the rebuild path is tested. An untested rebuilder is a promise you cannot keep on the day you need it.

cache, index, rebuild, discipline
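A sketch of one way to keep an index honest: stamp it with a hash of the source it was built from, and rebuild whenever the stamp no longer matches. The table names and the hashing scheme are assumptions for illustration, not the rebuilder described in this entry.

```python
import hashlib
import sqlite3

def fingerprint(lines):
    """Hash the source lines so the index can record what it was built from."""
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()

def rebuild(conn, lines):
    """Drop and recreate the index from the source lines, then stamp it."""
    conn.execute("DROP TABLE IF EXISTS entries")
    conn.execute("DROP TABLE IF EXISTS meta")
    conn.execute("CREATE TABLE entries (body TEXT)")
    conn.execute("CREATE TABLE meta (key TEXT PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO entries (body) VALUES (?)", [(l,) for l in lines])
    conn.execute("INSERT INTO meta VALUES ('source_hash', ?)", (fingerprint(lines),))
    conn.commit()

def is_stale(conn, lines):
    """True when the index is missing or was built from different source."""
    try:
        row = conn.execute(
            "SELECT value FROM meta WHERE key = 'source_hash'"
        ).fetchone()
    except sqlite3.OperationalError:  # meta table does not exist yet
        return True
    return row is None or row[0] != fingerprint(lines)

def ensure_fresh(conn, lines):
    """Rebuild only when the source has changed since the last build."""
    if is_stale(conn, lines):
        rebuild(conn, lines)
```

Because the query path calls the same rebuild function every time the source changes, the rebuilder is exercised routinely and has less chance to rot.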

Featured · Apr 10

Every system eventually faces the question: what happens when the database corrupts? For most systems the answer is call the backup. For Starlight the answer is delete it, run rebuild, done in one second. JSONL files are the truth because humans can read them, git can version them, grep can search them, and any tool in any language in any decade can parse one JSON object per line. SQLite is fast. Vector databases are fast. But speed is a convenience, and truth is a substrate. Never confuse the two.

Files are truth. Indexes are conveniences.

Context

Design principle adopted after studying how Claude Code, OpenClaw, and memsearch all converged on markdown-first storage with SQLite as a shadow index. The pattern is not accidental.

Implication

Ask of every storage decision: if this layer dies, can I rebuild it from something more durable? If not, the layer is the truth — and probably should not be.

architecture, storage, principle, jsonl
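The "delete it, run rebuild" drill can be sketched as follows. This is not Starlight's actual rebuild code; the file names and the id, text, and tags fields are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

def rebuild_index(jsonl_path, db_path):
    """Delete any existing database and regenerate it from the JSONL file."""
    db_path = Path(db_path)
    db_path.unlink(missing_ok=True)  # the index is disposable
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE entries (id TEXT PRIMARY KEY, text TEXT, tags TEXT)")
    with open(jsonl_path, encoding="utf-8") as source:
        for line in source:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # one JSON object per line
            conn.execute(
                "INSERT INTO entries VALUES (?, ?, ?)",
                (record["id"], record["text"], json.dumps(record.get("tags", []))),
            )
    conn.commit()
    return conn

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "vault.jsonl"
    source.write_text(
        '{"id": "a1", "text": "Files are truth", "tags": ["storage"]}\n'
        '{"id": "a2", "text": "Indexes are conveniences"}\n',
        encoding="utf-8",
    )
    db = Path(tmp) / "index.db"
    first = rebuild_index(source, db)
    first.close()
    # Corruption drill: throw the database away and regenerate it.
    second = rebuild_index(source, db)
    count = second.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    second.close()
```

Nothing in the database is written anywhere else first, so losing it costs only the time of one rebuild.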

high · Apr 10

Server Components by default in Next.js 16. Client components are an explicit escape hatch, not a convenience. Every 'use client' is a bundle cost and a hydration risk — justify it or delete it.

nextjs, server-components, rsc

high · Apr 10

Word-trigram Jaccard is good enough for contradiction detection at vault scale. You don't need a cross-encoder to notice that 'use Stripe' and 'don't use Stripe pre-BV' are fighting.

jaccard, contradiction, trigram
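A sketch of word-trigram Jaccard similarity. The note does not say how similar entries get flagged as contradictory, so the negation check and the 0.25 threshold below are assumptions added for illustration.

```python
import re

NEGATIONS = {"not", "don't", "never", "no"}  # assumed, deliberately small

def tokenize(text):
    """Lowercased word tokens; apostrophes are kept so "don't" survives."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_trigrams(text):
    """Set of consecutive three-word windows."""
    words = tokenize(text)
    if len(words) < 3:
        # Too short for a full window: fall back to one shorter tuple.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def has_negation(text):
    return any(word in NEGATIONS for word in tokenize(text))

def contradiction_candidate(a, b, threshold=0.25):
    """Flag pairs that overlap heavily but differ in polarity."""
    similar = jaccard(word_trigrams(a), word_trigrams(b)) >= threshold
    return similar and has_negation(a) != has_negation(b)

a = "we should use Stripe for payments"
b = "we should not use Stripe for payments"
score = jaccard(word_trigrams(a), word_trigrams(b))
```

Here the two claims share two of seven distinct trigrams, and only one of them contains a negation, so the pair is flagged for review. Claims shorter than three words produce at most one window, so very terse entries may need a bigram or single-word fallback to overlap at all.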

high · Apr 10

Temporal metadata is the frontier: validFrom, validUntil, lastConfirmed, confidenceDecay. A memory without time is a claim without provenance — and claims without provenance lie to you.

temporal, memory, provenance

high · Apr 10

FTS5 with bm25() scoring gives you ranked lexical search for free: relevance scored from term frequency, inverse document frequency, and document length, with no embeddings needed for 90 percent of vault queries. Reach for vectors only when bm25 fails.

sqlite, fts5, bm25, search

high · Apr 10

JSONL is the source of truth, SQLite is a rebuildable index. If you can't delete the database and regenerate it from the text files in under a minute, you've coupled your memory to your query engine — that's a bug.

jsonl, sqlite, source-of-truth

high · Apr 9

Creation is the art of actualizing potential — giving matter to Form and Form to matter.

aristotle, arcanea, form

high · Apr 9

80% precision, 15% mythic compression, 5% humor.

luminor, kernel, voice

high · Apr 9

Magical intelligence, not childish fantasy. Structurally serious beneath mythic framing.

luminor, kernel, identity

high · Apr 9

There is no separation between Creator, Creation, and the Creative Field.

unity, treatise

high · Apr 9

The source of all creation is the willingness to be incomplete.

shinkami, source, incompleteness

high · Apr 9

Build your roots before you reach for the sky.

lyssandria, foundation

high · Apr 3

Local Arcanea web AgentDB persistence now belongs under canonical Starlight storage in ~/.starlight/agentdb rather than process memory. Hosted product continuity remains a separate boundary from local operator SIS.

high · Apr 3

Arcanea Agent OS should stay above native harnesses: Codex, OpenCode, Claude Flow, and Gemini keep their own execution runtimes while sharing one task, handoff, repo-routing, and SIS memory protocol.

high · Apr 2

tsc needs .next/types to exist before it can type-check a Next.js project, so the type-check script must run next typegen first

high · Apr 2

Project-aware retrieval should score/rank context items, not dump everything — selectRelevantProjectContext in retrieval.ts

high · Apr 2

GSAP ScrollTrigger + Three.js @react-three/fiber already installed in arcanea-ai-app — use them instead of adding new animation libs

high · Apr 2

R2 has free egress; Supabase charges for egress over 2GB, so R2 wins for media at scale

high · Apr 2

Novel (Apache-2.0) wraps Tiptap and gives you AI slash commands for free, so there is no need for Tiptap Pro