How can a small language model learn open conversation?
So I started with the GPT-2 base model from GitHub, which, despite the claims, turned out to be an empty shell. I want to grow it into a much larger model overall, using the base model as the source for a body upgrade:

- "source_layers": 12 (old body) → "target_layers": 24 (new body)
- "source_embd": 768 → "target_embd": 1024
- "source_heads": 12 → "target_heads": 16
- "source_ctx": 1024 → "target_ctx": 1024 (unchanged)

The total vocabulary is roughly 120k tokens. I have strongly grounded every token, old and new, in definitions and examples. Yet, as I said in my original post, I cannot seem to get the model to cross that line into open conversation.
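As a rough sanity check on the size of this upgrade, the Python sketch below counts parameters for both configurations using the standard GPT-2 block layout (fused QKV projection, attention output projection, a 4x MLP, two LayerNorms per block, learned position embeddings). It assumes the source model keeps the original GPT-2 vocabulary of 50,257 tokens, that the target vocabulary is exactly 120,000 (the post only says "~120k"), and that the output head is tied to the token embeddings as in the original GPT-2. The names `GPT2Config`, `param_count`, and `head_dim` are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPT2Config:
    n_layer: int     # number of transformer blocks
    n_embd: int      # hidden (embedding) width
    n_head: int      # attention heads per block
    n_ctx: int       # context length (learned position embeddings)
    vocab_size: int  # token vocabulary size


def head_dim(cfg: GPT2Config) -> int:
    """Width of each attention head; must divide n_embd evenly."""
    if cfg.n_embd % cfg.n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return cfg.n_embd // cfg.n_head


def param_count(cfg: GPT2Config) -> int:
    """Approximate GPT-2-style parameter count (assumes a tied LM head)."""
    d = cfg.n_embd
    qkv = d * 3 * d + 3 * d        # fused Q/K/V projection weights + biases
    attn_out = d * d + d           # attention output projection
    mlp_up = d * 4 * d + 4 * d     # MLP expansion to 4d
    mlp_down = 4 * d * d + d       # MLP projection back to d
    layer_norms = 2 * (2 * d)      # two LayerNorms, each with gain and bias
    per_block = qkv + attn_out + mlp_up + mlp_down + layer_norms
    embeddings = cfg.vocab_size * d + cfg.n_ctx * d  # token + position tables
    final_ln = 2 * d               # final LayerNorm before the (tied) LM head
    return cfg.n_layer * per_block + embeddings + final_ln


# Assumed configs: original GPT-2 small vocab for the source, 120,000 for the target.
source = GPT2Config(n_layer=12, n_embd=768, n_head=12, n_ctx=1024, vocab_size=50_257)
target = GPT2Config(n_layer=24, n_embd=1024, n_head=16, n_ctx=1024, vocab_size=120_000)

for name, cfg in (("source", source), ("target", target)):
    print(f"{name}: {param_count(cfg):,} params, head_dim={head_dim(cfg)}")
```

Under these assumptions the source comes out at 124,439,808 parameters (the commonly cited ~124M for GPT-2 small) and the target at 426,240,000, about 3.4 times larger, with roughly 123M of that in the token embedding table alone. Both configurations also keep a head width of 64 (768 / 12 and 1024 / 16), so each attention head is the same size before and after the upgrade.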
The goal is for the model to match the overall conversational skill of GPT-3 and later models such as ChatGPT, but I cannot get there if the model cannot combine everything it has learned into a coherent "communication matrix."