What we’re solving

A binary SVM trained on $\{(x_i, y_i)\}_{i=1}^N$ with labels $y_i \in \{-1, +1\}$ has a dual problem in the Lagrange multipliers $\alpha \in \mathbb{R}^N$:

$$\max_\alpha\ \sum_i \alpha_i \;-\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, k(x_i, x_j) \quad\text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_i \alpha_i y_i = 0.$$

That’s the formula in every textbook. Now read it again, slowly, and notice three things:

  1. The objective is quadratic in $\alpha$.
  2. The box constraints $0 \le \alpha_i \le C$ are linear inequalities.
  3. The single equality $\sum_i \alpha_i y_i = 0$ is two linear inequalities sandwiched together.

So the SVM dual is literally a QP. If we flip the sign to make it a minimization and stack everything into matrices, we can hand it straight to solve_qp.
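The sign flip and matrix stacking are easy to sanity-check before trusting them. The snippet below (a standalone NumPy check, independent of snn_opt) confirms that the negated summation-form objective equals the matrix form $\tfrac{1}{2}\alpha^\top A \alpha + b^\top \alpha$ with $A = (yy^\top) \odot K$ and $b = -\mathbf{1}$ on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
y = rng.choice([-1.0, 1.0], size=N)
G = rng.normal(size=(N, N))
K = G @ G.T                                   # a PSD "kernel" matrix
a = rng.uniform(0.0, 1.0, size=N)             # arbitrary multipliers

# Negated summation form of the dual objective.
summation = -(a.sum() - 0.5 * sum(a[i] * a[j] * y[i] * y[j] * K[i, j]
                                  for i in range(N) for j in range(N)))

# Matrix form after stacking.
A = np.outer(y, y) * K
b = -np.ones(N)
matrix_form = 0.5 * a @ A @ a + b @ a

assert np.isclose(summation, matrix_form)
```

If the two numbers agree for arbitrary $\alpha$, the translation is just bookkeeping.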

Translation

Let $K \in \mathbb{R}^{N \times N}$ be the kernel matrix with entries $K_{ij} = k(x_i, x_j)$, and $y \in \mathbb{R}^N$ the label vector. Then with $A = (y y^\top) \odot K$ (Hadamard product) and $b = -\mathbf{1}$, the dual becomes

$$\min_\alpha\ \tfrac{1}{2}\,\alpha^\top A \alpha + b^\top \alpha \quad\text{s.t.}\quad 0 \le \alpha \le C,\ \ y^\top \alpha = 0.$$
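One detail worth convincing yourself of: $(yy^\top) \odot K = \mathrm{diag}(y)\,K\,\mathrm{diag}(y)$ is a congruence transform of $K$, so $A$ inherits positive semidefiniteness and the minimization is convex. A quick numerical confirmation (NumPy only):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
y = rng.choice([-1.0, 1.0], size=N)
G = rng.normal(size=(N, N))
K = G @ G.T                                   # PSD by construction (Gram matrix)

A = np.outer(y, y) * K                        # (y y^T) ⊙ K
assert np.allclose(A, np.diag(y) @ K @ np.diag(y))   # same as the congruence form
assert np.linalg.eigvalsh(A).min() > -1e-8           # A stays PSD (up to fp noise)
```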

The box constraint is handled by lower_bound=0.0, upper_bound=C in SolverConfig; snn_opt clips the iterates in place rather than treating the bounds as inequality constraints, which is more stable. The equality $y^\top \alpha = 0$ is encoded as the inequality pair $\pm y^\top \alpha \le \varepsilon$ for a small slack $\varepsilon$.
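To make that encoding concrete (assuming the $C\alpha \le d$ stacked-inequality convention the code below uses): an $\alpha$ with $y^\top \alpha = 0$ satisfies both rows of the pair with room to spare, while any $\alpha$ with $|y^\top \alpha| > \varepsilon$ violates one of them:

```python
import numpy as np

y = np.array([1.0, -1.0, 1.0, -1.0])
eps = 1e-4

# Encode y^T alpha = 0 as the pair:  y^T alpha <= eps  and  -y^T alpha <= eps.
C = np.vstack([y, -y])
d = eps * np.ones(2)

alpha = np.array([0.3, 0.1, 0.2, 0.4])         # y^T alpha = 0.0
assert np.all(C @ alpha <= d)                   # feasible within the slack

alpha_bad = np.array([0.5, 0.0, 0.5, 0.0])      # y^T alpha = 1.0
assert not np.all(C @ alpha_bad <= d)           # one row of the pair is violated
```

Note the sign: the right-hand side must be $+\varepsilon$ on both rows; with $-\varepsilon$ the pair demands $y^\top\alpha \le -\varepsilon$ and $y^\top\alpha \ge \varepsilon$ simultaneously, which is infeasible.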

End-to-end code

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

from snn_opt import OptimizationProblem, SNNSolver, SolverConfig

# 1. Generate a small binary problem.
X, y = make_classification(n_samples=80, n_features=10,
                           n_informative=4, random_state=0)
y = 2 * y - 1                                 # {0,1} → {-1,+1}

# 2. Build the kernel matrix (RBF, gamma = 1/n_features).
gamma = 1.0 / X.shape[1]
sq = np.sum(X**2, axis=1)
K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# 3. Translate to (A, b, C, d) form.
A = (np.outer(y, y)) * K + 1e-6 * np.eye(len(y))   # tiny ridge for PSD
b = -np.ones_like(y, dtype=float)

eps = 1e-4
C_eq = np.vstack([y, -y]).astype(float)            # ±yᵀα ≤ eps
d_eq = eps * np.ones(2)                            # +eps (with -eps the pair is infeasible)
C_box, d_box = C_eq, d_eq                          # only the equality; the box goes to the solver bounds

# 4. Solve.
problem = OptimizationProblem(A=A, b=b, C=C_box, d=d_box)
config  = SolverConfig(max_iterations=5000,
                       lower_bound=0.0, upper_bound=1.0)   # box 0 ≤ α ≤ C, C = 1.0 to match SVC below
alpha   = SNNSolver(problem, config).solve(np.zeros_like(y, dtype=float)).final_x

# 5. Compare with sklearn.
clf = SVC(kernel="precomputed", C=1.0).fit(K, y)
sk_alpha = np.zeros_like(y, dtype=float)
sk_alpha[clf.support_] = clf.dual_coef_[0] * y[clf.support_]   # signed → unsigned

print(f"snn_opt objective: {0.5 * alpha @ A @ alpha + b @ alpha:.4f}")
print(f"sklearn objective: {0.5 * sk_alpha @ A @ sk_alpha + b @ sk_alpha:.4f}")

On any sensible dataset the two objectives match to 3-4 decimal places. The snn_opt solution will have a slightly different active set — the spike raster will tell you which support vectors fired and how often.
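If you want predictions rather than just the objective, the intercept falls out of the KKT conditions: for any free support vector ($0 < \alpha_i < C$), $b = y_i - \sum_j \alpha_j y_j K_{ij}$. The sketch below applies that recipe; it uses sklearn's own dual solution and sklearn's rbf_kernel (numerically equivalent to the hand-rolled kernel above) so it runs standalone without snn_opt, but the same two lines work on the snn_opt alpha:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Same toy data as above.
X, y = make_classification(n_samples=80, n_features=10,
                           n_informative=4, random_state=0)
y = 2 * y - 1
K = rbf_kernel(X, gamma=1.0 / X.shape[1])

clf = SVC(kernel="precomputed", C=1.0).fit(K, y)
alpha = np.zeros(len(y))
alpha[clf.support_] = clf.dual_coef_[0] * y[clf.support_]   # signed → unsigned

# KKT intercept: average over the free support vectors (0 < alpha_i < C)
# rather than picking one — numerically safer.
free = np.where((alpha > 1e-8) & (alpha < 1.0 - 1e-8))[0]
b = np.mean([y[i] - (alpha * y) @ K[i] for i in free])

pred = np.sign(K @ (alpha * y) + b)
print("intercept (KKT) :", b)
print("intercept (SVC) :", clf.intercept_[0])
print("prediction agreement:", np.mean(pred == clf.predict(K)))
```

The two intercepts should agree closely, and the recovered decision rule should reproduce clf.predict on the training set.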

Why bother?

You wouldn’t replace sklearn.svm.SVC with this in production. The point of the exercise is different:

  • Pedagogy. The translation makes the QP-ness of SVM concrete.
  • Diagnostics. The spike raster shows the active-set evolution explicitly. sklearn hides this.
  • Composability. Once SVM is a QP, you can mix it with other QP-shaped things — kernel regression, structured-output methods, your own custom losses — all in the same solver.
  • Hardware. The same dynamics run on neuromorphic hardware. The classical solver doesn’t.

Next