Congrats on being that guy
You’re aware that there’s the official OpenAI Python library, right? https://github.com/openai/openai-python
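If anyone hasn’t used it, a minimal call looks roughly like this (the model name and prompt are just placeholders; it expects OPENAI_API_KEY in your environment):

```python
# Minimal chat completion with the official openai-python library (v1+).
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model you actually want
    messages=[{"role": "user", "content": "Say hi in one sentence."}],
)
print(response.choices[0].message.content)
```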
It’s really nothing fancy, especially on Lemmy where like 99% of people are software engineers…
Are you drunk?
Yeah, I found some stats now, and indeed you’re gonna wait like an hour for prompt processing if you throw 80–100k tokens at a powerful local model. With APIs that works pretty much instantly; not surprising, but just to give a comparison. Bummer.
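Back-of-the-envelope version of that, in case anyone wants to plug in their own numbers (the tokens/s figure is my assumption for a big model on non-CUDA hardware, not a benchmark):

```python
# Rough prompt-processing time estimate; 25 tokens/s is an assumed
# speed, not a measured value.
prompt_tokens = 90_000            # middle of the 80-100k range
assumed_pp_speed = 25             # tokens/s, assumption
minutes = prompt_tokens / assumed_pp_speed / 60
print(f"~{minutes:.0f} minutes")  # ~60 minutes
```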
Thanks! Hadn’t thought of YouTube at all, but it’s super helpful. I guess that’ll help me decide if the extra RAM is worth it, considering that inference will be much slower if I don’t go NVIDIA.
Yeah, I was thinking about running something like Code Qwen 72B, which apparently requires 145GB of RAM to run the full model. But if it’s super slow, especially with large context, and I can only run small models at acceptable speed anyway, it may be worth going NVIDIA alone for CUDA.
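For anyone wondering where numbers like 145GB come from: a rough params × bytes-per-weight estimate gets you close (the quant sizes below are approximations, and KV cache / runtime overhead come on top):

```python
# Back-of-the-envelope weight memory for a 72B-parameter model.
# Bytes-per-weight values are approximate; KV cache and runtime
# overhead are ignored, so real usage is higher.
params = 72e9
for precision, bytes_per_weight in [("FP16", 2.0), ("Q8", 1.0), ("Q4", 0.5)]:
    print(f"{precision}: ~{params * bytes_per_weight / 1e9:.0f} GB")
# FP16: ~144 GB, Q8: ~72 GB, Q4: ~36 GB
```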
Proud of you. Did it a long time ago. Would do it again too.
Meh, ofc I don’t.
Thanks, that’s very helpful! Will look into that type of build
I understand what you’re saying, but I come to this community because I like getting more input, hearing about others’ experiences, and potentially learning about things I didn’t know. I wouldn’t ask specifically in this community if I didn’t want to optimize my setup as much as I can.
Interesting, is there any kind of model you could run at reasonable speed?
I guess over time it could amortize, but if the usability sucks, that may make it not worth it. OTOH I really don’t want to send my data to any company.
I’d honestly be open to that, but wouldn’t an AMD setup take up a lot of space and consume lots of power / be loud?
It seems like in terms of price and speed the Macs suck compared to other options, but if you don’t have a lot of space and don’t want to hear an airplane engine constantly, I’m wondering what the options are.
Yeah, the unified memory of the Mac M series is very attractive for running models at full context length, and the memory bandwidth is quite good for token generation compared to the price, power consumption, and heat generation of NVIDIA GPUs.
Since I’ll have to put this in my kitchen/living room, that’d be a big plus, but idk how well prompt processing would work if I send over like 80k tokens.
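A rough way to sanity-check both halves of that (every number below is an assumption for illustration, not a benchmark): token generation is roughly memory-bandwidth-bound, while prompt processing is compute-bound, which is exactly where the Macs lag behind NVIDIA:

```python
# Crude sanity check; every constant here is an assumption.
bandwidth_gb_s = 800   # M2 Ultra-class unified memory bandwidth
model_size_gb = 40     # ~70B model at 4-bit quantization
print(f"generation: ~{bandwidth_gb_s / model_size_gb:.0f} tokens/s")  # ~20

prompt_tokens = 80_000
assumed_pp_speed = 100  # assumed prompt-processing tokens/s on a Mac
print(f"80k-token prompt: ~{prompt_tokens / assumed_pp_speed / 60:.0f} min")  # ~13 min
```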
This is all we need to know lmao, thanks
Meh, it’s fine. The mods can insist that it’s racist and they can insist there’s no genocide happening in Xinjiang, and I won’t be there to argue with them. My ban may be expired, but I won’t be back. I’ll keep calling them out on other instances tho, and even though the Lemmy devs are tankies themselves, I respect them for making such a concept possible.
They can call Obama a piece of shit as far as I’m concerned.
The mod was kinda right to remove the comment before that because that user is some Hasbara shill and was mentioning the Uyghurs as a deflection.
My issue isn’t with that; my issue is with the formulation “the Uyghur genocide narrative”. That’s literally the same shit that liberal media is saying about Gaza. So that’s what I mean: I’ll always be “yapping” about the Uyghur genocide the same way I’m “yapping” about the Ukraine and Gaza genocides. I don’t care who does it, it’s wrong.
Ah, I love the modlog, convenient that I can look back at what happened.
The whole Winnie thing became forbidden in China partly because the meme went super viral on Weibo and they didn’t want people in China ridiculing the supreme leader. Ban me for trolling the mod by equating them to Xi Jinping; I’m fine with that. I’ll still think you should have a thicker skin, but whatever. Using racism as a reason to ban me in this context is just censorship.
To provide context for those too lazy to check the modlog: the comment I’m replying to is straight genocide denial. The Uyghurs are being held in concentration camps and sterilized, their language, religion, and heritage are being wiped out, and the .ml mods are denying that because it doesn’t fit their world view.
Sorry, but unfortunately it’s two sides of the same coin, at least in my experience.
I’m shitting on US and Western imperialism and their love for the Gaza genocide day in, day out. I had no problem calling the old senile fuck Genocide Joe, and I argued against everyone who shilled for Kamala or the “lesser evil” thing.
I got a lot of hatred for the latter part on .world, and I often disagree with their liberal views. I criticized liberal media for fabricating certain narratives, and some users thought I was some MAGA brain-damaged idiot. Some users really aren’t the smartest, or at least they’re very much stuck in their bipartisan world view.
Anyway, now that you know all of this about me: I criticize Chinese and Russian imperialism and genocide the same way. I got banned after like one comment on .ml for that, and they said it was for “racism”.
I honestly don’t see any difference; it’s just that .world and .ml users picked different teams. And I got banned faster on .ml lol
Thanks for the reply, still reading here. Yeah, thanks to the comments and reading some benchmarks, I’ve abandoned the idea of getting an Apple; it’s just too slow.
I was hoping to test Qwen 32B or Llama 70B with longer contexts, hence the Apple seemed appealing.