The discourse revolves almost exclusively around ever-larger neural models. Is scaling really the only way forward, or is it worth looking back?