Is it just memory bandwidth? Or is it that AMD is not well supported by pytorch well enough for most products? Or some combination of those?
Is it just memory bandwidth? Or is it that AMD is not well supported by pytorch well enough for most products? Or some combination of those?
If you’re using llama.cpp, some ROCM stuff recently got merged in. It works pretty well, at least on my 6600. I believe there were instructions for getting it working on Windows in the pull.
Thank you so much! I’ll be sure to check that out / get it updated