Discussion about this post

Victualis:

The only reason LLM parameter counts are so high is that we are using dense layers throughout. This makes training tractable and lets big matrix multiplications deliver high degrees of parallelism at inference, but the "real" core model is probably much smaller, sparser, and deeper. It's hard to find, but it does seem to be in there. So I'm not convinced anything useful is shown by comparing parameter counts with genome size.
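
A toy sketch of this intuition (numpy only; the matrix sizes, noise scale, and 90% sparsity level are all assumptions for illustration, not claims about any real LLM): build a matrix that looks dense but secretly contains a small sparse "core", then show that magnitude pruning recovers the core with little effect on the layer's output.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, sparsity = 512, 512, 0.9

# A sparse "core" (~10% of entries) buried under small dense noise,
# so every entry of W is nonzero and W looks like an ordinary dense layer.
core = rng.normal(size=(d_in, d_out)) * (rng.random((d_in, d_out)) > sparsity)
W = core + 0.02 * rng.normal(size=(d_in, d_out))

def prune_by_magnitude(W, frac):
    """Zero the smallest-magnitude entries, keeping a (1 - frac) fraction."""
    threshold = np.quantile(np.abs(W), frac)
    return np.where(np.abs(W) >= threshold, W, 0.0)

W_pruned = prune_by_magnitude(W, sparsity)  # keep only the top 10% by magnitude

x = rng.normal(size=(32, d_in))  # a batch of inputs
rel_err = np.linalg.norm(x @ W - x @ W_pruned) / np.linalg.norm(x @ W)
overlap = np.mean((W_pruned != 0) == (core != 0))

print(f"kept weights: {np.count_nonzero(W_pruned) / W.size:.0%}")
print(f"relative output error: {rel_err:.3f}")  # small: the core dominates
print(f"core support recovered: {overlap:.1%}")  # pruning finds the core
```

The catch, of course, is that real trained networks aren't constructed this way, and whether a comparably small core exists (and how to find it) is exactly the open question.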

Noah Fatsi:

I've definitely had similar thoughts before. It's not apples-to-apples to say, e.g., that CNNs need so many examples to learn the difference between a cat and a dog while we only need a handful. Our ancestors saw enormous numbers of examples, and evolution imbues us with that knowledge, while the CNN starts as a blob of random weights.

In any case, it's a damn shame that we catastrophically forgot how to breathe underwater after Tiktaalik left the sea.
