Codalog Validation API

Experimental code quality API for agents, bots, and pipelines.

Codalog is an API-first layer that compares submitted code against curated good-code exemplars and stores samples for later retrieval. Today it scores structure and maintainability. It also emits fingerprints that you can feed into a larger exemplar memory as stronger semantic retrieval is added.

Codalog is not a definitive human-vs-AI detector. It estimates how closely a snippet aligns with curated good-code examples and highlights places that should be reviewed more carefully.

Current focus: internal validation of ranking usefulness. The signal is still experimental and should be used as review input, not as a standalone quality judgment.

What V1 Does

Codalog is intentionally thin right now: API endpoints, stable response shapes, a starter corpus of curated good-code exemplars, and an internal validation loop. There is no dashboard, no sample management console, and no automatic mirroring of third-party datasets.

Analysis Response Shape

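Every analysis response includes `summary`, `heuristics`, `fingerprint`, `signals`, `similarity`, `exemplar_alignment`, and `storage_hints`, alongside the top-level `analysis_kind` and `disclaimer` fields.

`storage_hints` carries the default namespace (`namespace`), normalized tags (`tags`), text query terms (`search_query`), and an STM-ready fingerprint (`stm_ready_fingerprint`).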
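A consumer may want to confirm that a response carries the documented sections before reading nested fields. The following is a minimal sketch of such a check. The helper name `missing_sections` is hypothetical and not part of Codalog.

```python
# Sections the documentation says every analysis response includes.
REQUIRED_SECTIONS = (
    "summary",
    "heuristics",
    "fingerprint",
    "signals",
    "similarity",
    "exemplar_alignment",
    "storage_hints",
)


def missing_sections(response: dict) -> list[str]:
    """Return the documented sections absent from a parsed response (hypothetical helper)."""
    return [name for name in REQUIRED_SECTIONS if name not in response]


# A truncated response, used only to illustrate the check.
partial_response = {"summary": {}, "heuristics": {}, "fingerprint": {}}
print(missing_sections(partial_response))
```

An empty list means every documented section is present. The sketch does not validate the contents of each section.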

Persistence Model

Samples are stored in existing owner-scoped memory. Default namespaces follow `codalog:samples:<language>`.

Ownerless self-serve agent tokens must use private visibility when storing samples.
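The two persistence rules above can be mirrored on the client before a sample is sent. This is a sketch under assumptions: the function names and the `token_has_owner` flag are hypothetical, and lowercasing the language is a guess based on the `codalog:samples:typescript` namespace in the example response.

```python
def default_namespace(language: str) -> str:
    """Build the documented default namespace, codalog:samples:<language>.

    Lowercasing and trimming are assumptions, not confirmed server behavior.
    """
    return f"codalog:samples:{language.strip().lower()}"


def check_visibility(visibility: str, token_has_owner: bool) -> None:
    """Reject non-private visibility for ownerless tokens (hypothetical client-side guard)."""
    if not token_has_owner and visibility != "private":
        raise ValueError("ownerless agent tokens must store samples with private visibility")


print(default_namespace("TypeScript"))
check_visibility("private", token_has_owner=False)  # passes silently
```

The server remains the authority on both rules. A local guard only saves a round trip when a request would be rejected anyway.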

Curated Exemplar Path

V1 defaults to a curated good-code corpus by language. Public-source expansion stays a backend concern, not a landing-page feature.

The emitted fingerprint is designed so semantic tensor memory or vector ranking can be layered in later without redesigning the external API.
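To illustrate how a vector ranking could sit on top of the fingerprint, the sketch below turns a few numeric fingerprint fields into a vector and ranks stored fingerprints by cosine similarity. This is not how Codalog ranks exemplars. The feature selection, the function names, and the sample values are all assumptions made for the illustration.

```python
import math

# Numeric fingerprint fields chosen for illustration; the selection is an assumption.
FEATURES = (
    "average_line_length",
    "blank_line_ratio",
    "comment_density",
    "duplicate_line_ratio",
    "function_count",
    "import_count",
    "max_indent",
)


def to_vector(fingerprint: dict) -> list[float]:
    """Project a fingerprint onto the chosen features, defaulting missing fields to 0."""
    return [float(fingerprint.get(name, 0.0)) for name in FEATURES]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: dict, stored: dict[str, dict]) -> list[tuple[str, float]]:
    """Return (sample id, similarity) pairs sorted from most to least similar."""
    query_vector = to_vector(query)
    scored = [(sid, cosine(query_vector, to_vector(fp))) for sid, fp in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up fingerprints, not real Codalog output.
query_fp = {"average_line_length": 30, "function_count": 1}
stored_fps = {
    "sample-a": {"average_line_length": 30, "function_count": 1},
    "sample-b": {"average_line_length": 1, "function_count": 10},
}
print(rank(query_fp, stored_fps))
```

Unscaled features let large-magnitude fields such as `average_line_length` dominate the score, so a real ranking layer would likely normalize each feature first.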

Example Requests

These examples are enough to wire Codalog into CI, a repo bot, a browser extension, or another agent.

Inspect Coverage Metadata

curl -s https://mnemolog.com/api/codalog/catalog

Analyze Submitted Code

curl -s https://mnemolog.com/api/codalog/analyze \
  -H 'Content-Type: application/json' \
  -d '{
    "language": "typescript",
    "filename": "collect.ts",
    "code": "export function collectVisibleItems(items: Item[]): Item[] {\n  return items.filter((item) => item.isVisible);\n}"
  }'
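Hand-escaping source code inside a JSON string is easy to get wrong: a line break in `code` must be serialized as the single escape `\n`, and embedded double quotes must be escaped as well. A minimal sketch of building the same analyze body with a JSON serializer follows. The helper name `build_analyze_body` is hypothetical, and the field names are taken from the curl example.

```python
import json

# The snippet to analyze, written with real line breaks.
source = (
    "export function collectVisibleItems(items: Item[]): Item[] {\n"
    "  return items.filter((item) => item.isVisible);\n"
    "}"
)


def build_analyze_body(language: str, filename: str, code: str) -> bytes:
    """Serialize the analyze payload; json.dumps handles newline and quote escaping."""
    payload = {"language": language, "filename": filename, "code": code}
    return json.dumps(payload).encode("utf-8")


body = build_analyze_body("typescript", "collect.ts", source)
print(body.decode("utf-8"))
```

Decoding the body with `json.loads` returns the original three-line snippet, which is a quick way to confirm that the escaping survived the round trip.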

Store a Private Sample

curl -s https://mnemolog.com/api/codalog/samples \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $MNA_TOKEN" \
  -d '{
    "visibility": "private",
    "language": "python",
    "filename": "chunked.py",
    "code": "def chunked(items, size):\n    if size <= 0:\n        raise ValueError(\"size must be positive\")\n    return [items[i:i + size] for i in range(0, len(items), size)]"
  }'
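For agents that call the API from Python rather than curl, the store request can be assembled with the standard library. This is a sketch, not an official client. The helper name `build_sample_request` is hypothetical, and the URL, headers, and field names are copied from the curl example. The request is built but not sent.

```python
import json
import os
import urllib.request

SAMPLES_URL = "https://mnemolog.com/api/codalog/samples"

# The Python snippet from the curl example, with real line breaks and quotes.
CHUNKED_SOURCE = (
    "def chunked(items, size):\n"
    "    if size <= 0:\n"
    '        raise ValueError("size must be positive")\n'
    "    return [items[i:i + size] for i in range(0, len(items), size)]"
)


def build_sample_request(token: str, language: str, filename: str, code: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that stores a private sample."""
    payload = {
        "visibility": "private",
        "language": language,
        "filename": filename,
        "code": code,
    }
    return urllib.request.Request(
        SAMPLES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# The fallback value is a placeholder, not a working credential.
token = os.environ.get("MNA_TOKEN", "placeholder-token")
request = build_sample_request(token, "python", "chunked.py", CHUNKED_SOURCE)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. That step is left out so the sketch runs without network access or a valid token.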

Example Response

Codalog returns a stable shape intended for machines first and humans second.

{
  "analysis_kind": "style_quality_inference",
  "disclaimer": "Codalog scores code quality and style consistency. It does not prove human authorship or AI authorship.",
  "summary": {
    "language": "typescript",
    "filename": "collect.ts",
    "classification": "clean_human_leaning",
    "clean_human_code_score": 86,
    "confidence": 0.82
  },
  "heuristics": {
    "readability": 88,
    "consistency": 84,
    "maintainability": 85,
    "human_likeness": 84
  },
  "fingerprint": {
    "line_count": 4,
    "non_empty_line_count": 4,
    "average_line_length": 34.5,
    "blank_line_ratio": 0,
    "comment_density": 0,
    "long_line_count": 0,
    "duplicate_line_ratio": 0,
    "function_count": 1,
    "import_count": 0,
    "indent_style": "spaces",
    "mixed_indent_lines": 0,
    "max_indent": 2,
    "identifier_stats": {
      "unique_count": 6,
      "total_count": 9,
      "snake_case": 0,
      "camel_case": 2,
      "pascal_case": 1,
      "single_letter": 0,
      "top_identifiers": [
        { "identifier": "items", "count": 2 },
        { "identifier": "collectVisibleItems", "count": 1 }
      ]
    },
    "first_signal_line": "export function collectVisibleItems(items: Item[]): Item[] {"
  },
  "signals": {
    "strengths": [
      "Identifier naming is mostly consistent.",
      "Low repeated-line ratio suggests the code is not heavily copy-pasted."
    ],
    "concerns": []
  },
  "similarity": {
    "compared_reference_count": 3,
    "reference_strategy": "curated_good_corpus",
    "nearest_references": [
      {
        "id": "good-ts-format-latency",
        "label": "Typed formatting helper",
        "source_kind": "curated",
        "similarity": 0.412,
        "token_similarity": 0.391,
        "structure_similarity": 0.462
      }
    ]
  },
  "exemplar_alignment": {
    "pool_size": 3,
    "compared_exemplar_count": 3,
    "coverage_confidence": "low",
    "alignment_score": 0.412,
    "verdict": "partially_aligned",
    "nearest_good_examples": [
      {
        "id": "good-ts-format-latency",
        "label": "Typed formatting helper",
        "source_kind": "curated",
        "similarity": 0.412,
        "token_similarity": 0.391,
        "structure_similarity": 0.462
      }
    ]
  },
  "storage_hints": {
    "namespace": "codalog:samples:typescript",
    "tags": ["codalog", "language:typescript", "quality:clean_human_leaning", "alignment:partially_aligned"],
    "search_query": "items collectvisibleitems",
    "stm_ready_fingerprint": {
      "language": "typescript",
      "identifiers": ["items", "collectvisibleitems"],
      "line_shapes": ["A A(A: A[]): A[] {", "A A.A((A) => A.A);", "}"]
    }
  }
}
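Because the signal is meant as review input, a pipeline might use the response to decide whether to ask for a human look rather than to pass or fail a change. The sketch below collects reasons for review from the fields shown in the example response. The function name and the `min_alignment` threshold of 0.5 are assumptions, not values Codalog recommends.

```python
def review_reasons(response: dict, min_alignment: float = 0.5) -> list[str]:
    """List reasons to request human review (hypothetical helper, illustrative threshold)."""
    reasons = []
    alignment = response.get("exemplar_alignment", {})
    concerns = response.get("signals", {}).get("concerns", [])

    if concerns:
        reasons.append(f"{len(concerns)} concern(s) reported")
    if alignment.get("alignment_score", 0.0) < min_alignment:
        reasons.append("alignment score below threshold")
    if alignment.get("coverage_confidence") == "low":
        reasons.append("exemplar coverage confidence is low")
    return reasons


# Only the fields the helper reads, with values copied from the example response.
example = {
    "signals": {"concerns": []},
    "exemplar_alignment": {"alignment_score": 0.412, "coverage_confidence": "low"},
}
print(review_reasons(example))
```

With the example values the helper reports two reasons: the alignment score of 0.412 is under the illustrative threshold, and coverage confidence is low. Neither means the snippet is bad, only that a reviewer should look.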

Curated Exemplar Coverage

The public product surface is centered on the curated good-code pool Codalog actually uses today, not on every public dataset that might be useful later.

Starter Language Coverage

The initial curated pool covers JavaScript, TypeScript, Python, Go, Rust, Java, and C#. Weak alignment in those languages means review is recommended, not that a snippet is automatically bad.
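A client can note up front whether a language falls inside the starter pool, so that alignment scores for languages outside it are read with extra caution. In this sketch only `typescript` and `python` appear as identifiers in the request examples. The other identifiers, including `csharp` for C#, are guesses, and the catalog endpoint is the better source for the actual values.

```python
# Identifiers for the seven starter languages. Only "typescript" and "python"
# are confirmed by the request examples; the rest are assumed spellings.
STARTER_LANGUAGES = frozenset(
    {"javascript", "typescript", "python", "go", "rust", "java", "csharp"}
)


def in_starter_pool(language: str) -> bool:
    """Return True when the language is in the assumed starter pool (hypothetical helper)."""
    return language.strip().lower() in STARTER_LANGUAGES


for name in ("typescript", "Python", "kotlin"):
    print(name, in_starter_pool(name))
```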

Coverage Metadata Only

The catalog endpoint exists for lightweight corpus coverage checks. The real product surface is `POST /api/codalog/analyze`, not a public dataset browser.