{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreieuhkesijduanzihkgrqseo66cdaudv62lxiz5xo2ll3g2mgiacoi",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mnb25wclvgq2"
},
"path": "/t/rnn-in-c-is-this-bptt-finally-right/176455#post_1",
"publishedAt": "2026-06-01T21:17:26.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "hi all,\ndid i finally get the BPTT right? i am trying to ascertain method for elementary BPTT and produce a reference friendly to my style of comprehension and impartment. in the last three years i’ve discovered that if one doesn’t have it quite right, one may still get results that make you think it’s sort of working. i’m spending a lot of time trying to figure out what i’ve understood and who is telling something they know anything about and generally wasting my life away again trying to absorb another ancient method.\n\nthis psuedo code doesn’t handle trivial things like populating the input and stuff, but if the BPTT looks in order to you, PLEASE say something to infer it, thanks!\n\n#define T 25\n#define X_S 32\n#define H_S 64\n#define Y_S 32\n\nfloat Wxh[H_S][X_S]; float Whh[H_S][H_S]; float Why[Y_S][H_S]; // weights + biases\nfloat bh[H_S]; float by[Y_S];\n\nfloat x[T][X_S]; // Input history - BPTT buffers\nfloat h[T+1][H_S]; // Hidden state history (h[-1] is h[0] initialized to 0)\nfloat y[T][Y_S];\n\nfloat softsum[T];\n\nfloat dWxh[H_S][X_S]; float dWhh[H_S][H_S]; float dWhy[Y_S][H_S]; // Gradients (Accumulators)\nfloat dbh[H_S]; float dby[Y_S];\n\nvoid rnn_forward_and_backward(int targets[T]) { // 1. 
FORWARD PASS\n\n    for (int t = 0; t < T; t++) {\n        // hidden state: h[t+1] = tanh(Wxh x[t] + Whh h[t] + bh)\n        for (int i = 0; i < H_S; i++) {\n            float sum = bh[i];\n            for (int j = 0; j < X_S; j++) sum += Wxh[i][j] * x[t][j];\n            for (int j = 0; j < H_S; j++) sum += Whh[i][j] * h[t][j];\n            h[t+1][i] = tanhf(sum);\n        }\n\n        // output logits, tracking the max for a numerically stable softmax\n        float maxv = -1e30f;\n        for (int i = 0; i < Y_S; i++) {\n            float sum = by[i];\n            for (int j = 0; j < H_S; j++) sum += Why[i][j] * h[t+1][j];\n            y[t][i] = sum;\n            if (sum > maxv) maxv = sum;\n        }\n        float ssum = 0.0f;\n        for (int i = 0; i < Y_S; i++) {\n            y[t][i] = expf(y[t][i] - maxv);\n            ssum += y[t][i];    // sum the exponentials, not the logits\n        }\n        // the sum is >= 1 because the max term contributes exp(0), so no zero guard is needed\n        softsum[t] = ssum = 1.0f / ssum;\n        for (int i = 0; i < Y_S; i++) y[t][i] *= ssum;    // y[t] now holds softmax probabilities\n    }\n\n    // 2. BACKWARD PASS\n\n    float dh_next[H_S] = {0.0f};\n\n    // zero the gradient accumulators before BPTT\n    for (int i = 0; i < H_S; i++) {\n        dbh[i] = 0.0f;\n        for (int j = 0; j < X_S; j++) dWxh[i][j] = 0.0f;\n        for (int j = 0; j < H_S; j++) dWhh[i][j] = 0.0f;\n    }\n    for (int i = 0; i < Y_S; i++) {\n        dby[i] = 0.0f;\n        for (int j = 0; j < H_S; j++) dWhy[i][j] = 0.0f;\n    }\n\n    for (int t = T - 1; t >= 0; t--) {    // BPTT: backpropagation through time\n\n        // softmax + cross-entropy: gradient w.r.t. the logits is probability minus one-hot target\n        float dy[Y_S];\n        for (int i = 0; i < Y_S; i++) {\n            float target_out = (i == targets[t]) ? 1.0f : 0.0f;\n            dy[i] = y[t][i] - target_out;    // y[t] was already normalized in the forward pass\n        }\n        for (int i = 0; i < Y_S; i++) {\n            dby[i] += dy[i];\n            for (int j = 0; j < H_S; j++) dWhy[i][j] += dy[i] * h[t+1][j];\n        }\n        // gradient reaching h[t+1]: from the output layer plus from the next time step\n        float dh[H_S];\n        for (int i = 0; i < H_S; i++) {\n            float sum = dh_next[i];\n            for (int j = 0; j < Y_S; j++) sum += Why[j][i] * dy[j];\n            dh[i] = sum;\n        }\n        // back through tanh: d/dz tanh(z) = 1 - tanh(z)^2\n        float dh_raw[H_S];\n        for (int i = 0; i < H_S; i++) dh_raw[i] = (1.0f - (h[t+1][i] * h[t+1][i])) * dh[i];\n\n        for (int i = 0; i < H_S; i++) {\n            dbh[i] += dh_raw[i];\n            for (int j = 0; j < X_S; j++) dWxh[i][j] += dh_raw[i] * x[t][j];    // possible optimisation: if x[t] is one-hot, visit only the nonzero j\n            for (int j = 0; j < H_S; j++) dWhh[i][j] += dh_raw[i] * h[t][j];\n        }\n        // gradient passed back to h[t] for the previous time step\n        for (int i = 0; i < H_S; i++) {\n            float sum = 0.0f;\n            for (int j = 0; j < H_S; j++) sum += Whh[j][i] * dh_raw[j];\n            dh_next[i] = sum;\n        }\n    }\n}\n\nto me this style of notation tells someone exactly how an operation is performed (as long as they understand += style operators). 
but for the last thirty years, people have basically said they want variable names like little novels and all the code object-oriented across seventeen documents, so it takes you three days to discern if anyone has a clue what they’re talking about.\n\nthank you!",
"title": "RNN in C - is this BPTT finally right?"
}