2026-05-25
i did not touch any generative ai or agentic llm tools prior to february 2026. i had tried them many times over the years, and i always kept running into the same frustrations. this vehemently fueled my contrarian attitudes towards llms. while i still largely hold these attitudes, it is increasingly more difficult to deny the utility of these tools.
all written opinions are my own.
for the past few weeks, i have been doing a little r&d on a maths harness that uses claude code.
its maiden voyage was in a review of Go's new floating point parsing and printing algorithm where i tested its ability to function as an aide when reading reading Russ' proofs. in doing so, i uncovered a material edge case that diverged from the proof.
over the weekend, a friend showed me a tweet about AlphaProof's recent publications. my brain lit up; i asked my friend to just show me one of the titles. this was a great opportunity to benchmark my little project with only one clear bias: i knew that whatever it was, it was almost certainly provable.
one program i have been tinkering on recently is a computational framework in Go that allows an agent to guess at how difficult a proof may be. the classifications are ternary:
the only problem i had formally pointed it at was an erdos problem that was deemed 'hard' by the framework; after running it on a051293, it deemed the problem 'elementary,' which is a bit funny.
i do not have the ability to concisely and neatly present the topics in my own voice; therefore, i have marked the sections [gen]erated by claude from the lean proof.
if i have time to review and understand these topics
more deeply, i will revise this in a follow up. my
understanding is that the premise of the proof is
expressed correctly, and the lack of any axiom or
sorry in the lean proof indicate that it is
computationally verified.
my process was largely hands-off; i spent a majority of active time thinking about the meta strategy and documenting frictions the harness experienced and was incapable of self-documenting. i did interject a few key details that are otherwise not noteworthy when compared to how claude materialized them.
in 2002, Cloitre conjectured that the number of nonempty
subsets of {1,...,n} with integer mean satisfies
for all M, where F(k) are exactly the ordered
Bell Numbers:
the proof decomposes into three independent modules sketched below for brevity.
the first task is counting subsets of {1,...,k} whose
elements sum to . this is a discrete fourier
transform on cyclic groups.
let be a primitive k-th root of unity. character orthogonality gives
applying this to indicator-sum over all subsets :
b(k):
zeta = primitive k-th root of unity
total = 0
for r in {0, ..., k-1}:
total += product over m in {1,...,k} of (1 + zeta^(r*m))
return total / k
the product expands
via Finset.prod_add into exactly
.
averaging over r applies the orthogonality filter,
extracting the count of subsets with
.
for a given r, the values for cycle through the q-th roots of unity with multiplicity , where . the product reduces to
where is a primitive q-th root of unity. evaluating at gives . factoring out :
for each divisor of , exactly values of satisfy . the corresponding contributes when is odd (i.e., is an odd divisor of ) and 0 otherwise. summing:
this is stanley's exercise 1.105. the lean proof uses
IsPrimitiveRoot, Polynomial.nthRootsFinset, and
Nat.totient.
this module proves the per-k identity connecting two counts:
the claim is . this was observed without proof by Zumkeller (2006, oeis a082550) and Wiseman (2019, oeis a063776). as of may 2026, no proof exists in the published literature. to the best of my knowledge, the proof below is the first.
work in (representing shifted by 1). for a nonempty subset , define
where .
both and count solutions to a linear congruence of the form over . the solution count is:
count_solutions(b, a, m):
g = gcd(a, m)
if g divides b: return g
else: return 0
has ; has . since and the divisibility condition on is the same, for every nonempty .
define . rotation preserves cardinality and is a bijection on the powerset of .
the divisibility condition rewrites as: iff , where the sum uses the -shift convention. so counts exactly how many of 's rotations land in the set .
swapping summation order --- sum over first, then over --- each rotation is a bijection on nonempty subsets, so each contributes . but the set counted by excludes (which always satisfies the divisibility), so:
admits a host-element decomposition: rewrite it as the number of elements such that , where .
define . swapping the sum over with the sum over :
the key structural fact is that rotation by 1 bijects onto . the forward direction (rotation preserves membership and divisibility, shifting the host element) is proved by case analysis on whether (wrapping) or (no wrapping). the backward direction uses rotation by . since rotation gives in both directions, equality holds:
finally, a reflection bijection establishes . reflection maps element 0 to element (the representation of in the shifted indexing) and preserves the divisibility condition since .
therefore:
term-by-term implies . dividing by :
the d = 1 term in contributes . every odd divisor contributes at most , since and . summing this tail over gives , which is for any . so
substituting :
the factor ensures the sum is dominated by small .
the standard identity for a geometric partial sum gives
substituting and exchanging the finite sums ( first, then ):
the remainder term is after splitting the -sum at : the low half is bounded by the polylogarithmic tsum, the high half decays as .
the inner partial sum converges to the polylogarithmic series:
proved by strong induction on . the base case is the geometric series
for the inductive step, the shift identity
is valid for , and when combined with the binomial expansion
and the fubini recurrence
the value pins.
| source | bound | absorbed by |
|---|---|---|
| non-dominant divisors | ||
| geometric remainder | after | |
| partial-sum truncation | per coefficient | after |
all three are .
a051293_counting.lean links the three modules:
// module A: the DFT formula matches the combinatorial count
b_comb_eq_b(k): b_comb(k) == b(k) for all k >= 1
// module B: per-k identity via Zumkeller
zumkeller_identity(k): b_comb(k) - 1 == |maxKIntMeanSubsets(k)|
// summation: partition integer-mean subsets by max element
a_comb_eq_sum(n): a_comb(n) == sum_{k=1}^{n} (b_comb(k) - 1)
// bridge: combinatorial count equals analytic formula
a_comb_eq_a(n): a_comb(n) == a(n) for all n >= 1
// module C: asymptotic expansion of a(n)
asymptotic_expansion(M): a(n) = fubini_expansion + o(2^n/n^{M+1})
composing a_comb_eq_a with asymptotic_expansion
yields the final theorem:
cloitre_conjecture(M):
a_comb(n) = (2^{n+1}/n) * sum_{i=0}^{M} F(i)/n^i
+ o(2^n / n^{M+1})
from start to finish, it took approximately 6 hours and a few fresh sessions to grok the proof. i suspect that much of this time can be greatly reduced with better tooling which i am actively working on.
as far as the results, module a and module c are what
i believe would be called "standard analysis."
module b is new as far as i am aware. the identity
was observed empirically by Zumkeller in 2006 and
restated by Wiseman in 2019. no proof, formal or
informal, appears anywhere in the published literature
or obvious websites. the lean formalization i "wrote"
is the first to my knowledge.
additionally, these results appear to be fundamentally different in at least two ways to those published by AlphaProof Nexus.
the lean source code for the proof can be found on my github.