{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreid62oiawmw23cehwddjk4pyx6mmwlwacdj546vbv4sdyc4p7igibi",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mnzsy73oqxv2"
  },
  "path": "/t/rnn-in-c-is-this-bptt-finally-right/176455#post_9",
  "publishedAt": "2026-06-11T18:01:27.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "ok.. i’m back to dumping my code online, it’s been too many weeks. reading back through the thread, my best guess was that i’m training one character behind. except after a few days of training it spat out “orgetful” yesterday, when the lnn would have been around 10.7.. so i’m obviously doing something egregiously wrong, as lnn seems to go down for 50-100k tests and then rises, not paralleling any dynamic in the material.\n\n\n    #define hidd 256\t//\thidden layer size\n    #define wind 64\t\t//\ttruncated BPTT depth\n    #define winp (wind + 1)\n    #define winm (wind - 1)\n    while (ifile.get(c)) {\n    char twin = tests & winm;\t//\ttruncated rnn windowing for indefinite serial processing\n    char tplus = twin + 1;\n    if (!(bool)twin) {\n    if ((bool)tests) memcpy(h[0], h[wind], sizeof h[0]);\t//\tif (!(bool)tests) memset(h[0], 0, sizeof h[0]);\telse memcpy(h[0], h[wind], sizeof h[0]);\n    }\n\n    \t\tchar tc = chartonet(c);\t\t//  \"next char\" (current read) \"correct prediction\"\n    \t\tmemset(netin, 0, sizeof netin);\t//for (unsigned int i = 0; i < 96; i++) netin[i] = netout[i] = prevout[twin][i] = 0.f;\n    \t\tnetin[b] = 1.f;\t\t\t//  \"prev char\" (previous read) \"current step\"\n    \t\tcin[twin] = b;\n\n    \t\tfor (unsigned int i = 0; i < hidd; i++) {   //  forward pass\n    \t\t\tfloat sum = hbias[i];\n    \t\t\tsum += nni[b][i];  //  \"hot one\"\n    \t\t\tfor (unsigned int j = 0; j < hidd; j++) sum += nnh[i][j] * h[twin][j];\t//\t\"logits\" pre normalising\n    \t\t\th[tplus][i] = tanhf(sum);\n    \t\t}\n    \t\tfloat m = -1e9f;\n    \t\tfor (int i = 0; i < 96; i++) {\n    \t\t\tfloat sum = obias[i];\n    \t\t\tfor (unsigned int j = 0; j < hidd; j++) sum += nno[j][i] * h[tplus][j];\n    \t\t\tnetout[twin][i] = sum;    m = fmax(m, netout[twin][i]);\n    \t\t}\n    \t\tfloat ssum = 0.f;\n    \t\tfor (unsigned int i = 0; i < 96; i++) {\n    \t\t\tnetout[twin][i] = exp(netout[twin][i] - m);\tssum += netout[twin][i];\n    \t\t}\n    
\t\tsoftsum[twin] = 0;\n    \t\tif (ssum > 0.f) {\n    \t\t\tsoftsum[twin] = ssum; ssum = 1.f / ssum;\n    \t\t\tfor (unsigned int i = 0; i < 96; i++) netout[twin][i] *= ssum;\n    \t\t}\n    \t\tptarget[twin] = netout[twin][tc];\t//\tsave p(target) for the loss before netout is overwritten (ptarget is an assumed new buffer, float ptarget[wind], declared next to softsum)\n    \t\tnetout[twin][tc] -= 1.f;\t//\tnetout now holds dL/dlogits = softmax - onehot(target), no longer probabilities\n\n\n    \t\tif ((bool)(tplus & wind)) {\t//\teg. 64 of 64 do BPTT back propagation through time\n    \t\t\tmemset(dnni, 0, sizeof dnni);\t//\t'gradients'\n    \t\t\tmemset(dnnh, 0, sizeof dnnh);\n    \t\t\tmemset(dnno, 0, sizeof dnno);\n    \t\t\tmemset(dhbias, 0, sizeof dhbias);\n    \t\t\tmemset(dobias, 0, sizeof dobias);\n    \t\t\tmemset(dh_next, 0, sizeof dh_next);\n\n    \t\t\tfor (int iter = winm; iter > -1; iter--) {\t\t//\tBPTT 'back propagation through time'\n    \t\t\t\tint iplus = iter + 1;\n    \t\t\t\tfor (unsigned int i = 0; i < 96; i++) {\n    \t\t\t\t\tdobias[i] += netout[iter][i];\n    \t\t\t\t\tfor (unsigned int j = 0; j < hidd; j++) dnno[j][i] += netout[iter][i] * h[iplus][j];\n    \t\t\t\t}\n    \t\t\t\tmemset(dh, 0, sizeof dh);\n    \t\t\t\tfor (int i = 0; i < hidd; i++) {\n    \t\t\t\t\tfloat sum = dh_next[i];\n    \t\t\t\t\tfor (int j = 0; j < 96; j++) sum += nno[i][j] * netout[iter][j];\n    \t\t\t\t\tdh[i] = sum;\n    \t\t\t\t}\n    \t\t\t\tfor (int i = 0; i < hidd; i++) {\n    \t\t\t\t\tdh_raw[i] = (1.f - (h[iplus][i] * h[iplus][i])) * dh[i];\n    \t\t\t\t\tdhbias[i] += dh_raw[i];\n    \t\t\t\t\tdnni[cin[iter]][i] += dh_raw[i];\t//\t\"hot one\" input instead of 96\n    \t\t\t\t\tfor (unsigned int j = 0; j < hidd; j++) dnnh[i][j] += dh_raw[i] * h[iter][j];\n    \t\t\t\t}\n    \t\t\t\tfor (int i = 0; i < hidd; i++) {\n    \t\t\t\t\tfloat sum = 0.f;\n    \t\t\t\t\tfor (int j = 0; j < hidd; j++) sum += nnh[j][i] * dh_raw[j];\n    \t\t\t\t\tdh_next[i] = sum;\n    \t\t\t\t}\n    \t\t\t\tebuf -= log(fmax(1e-18f, ptarget[iter]));\t//\tcross-entropy on the saved target probability; netout[iter][cin[iter]] was the gradient entry for the input char, not p(target)\n    \t\t\t\tedif = ebuf / (float)max(1, tests);\n    \t\t\t}\n\n    \t\t\tfloat max_grad = 5.f;\t\t//\tgradient clipping\n    \t\t\tfor (int i = 0; i < 96; i++) {\n    \t\t\t\tfor (int j = 0; 
j < hidd; j++) dnni[i][j] = fmax(-max_grad, fmin(max_grad, dnni[i][j]));\n    \t\t\t\tdobias[i] = fmax(-max_grad, fmin(max_grad, dobias[i]));\n    \t\t\t}\n    \t\t\tfor (int i = 0; i < hidd; i++) {\n    \t\t\t\tfor (int j = 0; j < hidd; j++) dnnh[i][j] = fmax(-max_grad, fmin(max_grad, dnnh[i][j]));\n    \t\t\t\tfor (int j = 0; j < 96; j++) dnno[i][j] = fmax(-max_grad, fmin(max_grad, dnno[i][j]));\n    \t\t\t\tdhbias[i] = fmax(-max_grad, fmin(max_grad, dhbias[i]));\n    \t\t\t}\n    for (int i = 0; i < 96; i++) {\t//\tapply learning ;)\n    \t\t\t\t\t\t\tfor (int j = 0; j < hidd; j++) {\n    \t\t\t\t\t\t\t\tnni[i][j] -= dnni[i][j] * learn;\n    \t\t\t\t\t\t\t\tnno[j][i] -= dnno[j][i] * learn;\n    \t\t\t\t\t\t\t}\n    \t\t\t\t\t\t\tobias[i] -= dobias[i] * learn;\n    \t\t\t\t\t\t}\n    \t\t\t\t\t\tfor (int i = 0; i < hidd; i++) {\n    \t\t\t\t\t\t\thbias[i] -= dhbias[i] * learn;\n    \t\t\t\t\t\t\tfor (int j = 0; j < hidd; j++) nnh[i][j] -= dnnh[i][j] * learn;\n    \t\t\t\t\t\t}\n\n\n\n",
  "title": "RNN in C - is this BPTT finally right?"
}