You are viewing a single thread.
View all comments View context
20 points

Well, addition is built into the instruction set of any CPU, so it only takes one operation. On the other hand, one evaluation of a neural net involves several repeated matrix-vector multiplies followed by the application of a nonlinear “activation function”. Matrix-vector multiply for a square matrix will take 2020=400 multiply operations and about 2019 addition operations for a 20-dimensional input. So we’ll say maybe on the order of 1,000-10,000 times more operations depending on how many layers?

permalink
report
parent
reply
9 points

This is up to 200 digit numbers, so you’d actually need to use a custom implementation for representing the integers and software addition but then a naive algorithm would still be like… 200 operations. Could probably drastically reduce that as well.

permalink
report
parent
reply
2 points

200bit numbers only require like 10 registers. X86-64 has 16 general purpose registers so doing operations with 200 digit numbers should hypothetically only require 20 loads and 10 multiplies. So a well written bit of code could do it in under 100 ops (probably under 50). So assuming this LLM implementation is running on a big server, it’s probably doing the same calculation, less accurately, with some exponentially larger amount of operations.

permalink
report
parent
reply