Anyone actually running AI locally?

apple101 · December 15, 2025, 4:06pm

Anyone actually running AI locally instead of using cloud APIs? Is it worth the hassle?

Jeff · December 15, 2025, 4:08pm

Yeah, I’ve got a couple models running on my home server. Started with a 7B parameter model on an old gaming GPU. It’s definitely a hassle to set up compared to just calling an API, but for messing around and keeping data private, it’s worth it for me.

What’s your main goal with it? Just tinkering, or do you have a specific use case?

apple101 · December 15, 2025, 4:11pm

I get the part about privacy but like does it work better than APIs? I’m worried about the usage tbh

barryboy · December 15, 2025, 4:18pm

I’m running a DeepSeek model from Hugging Face

Jeff · December 15, 2025, 4:40pm

yeah, running llama 3.1 locally on an old gaming pc with 32gb ram. it’s slow but zero costs after the hardware. good for tinkering, not for production.

what’s your use case?

apple101 · December 15, 2025, 4:44pm

What the hell do you mean??? What is a hugging face?

fidelia · December 15, 2025, 4:48pm

anyone actually running ai locally? not just messing around, but for real tasks.

thinking about setting up something like llama.cpp or ollama on my vps. got 4-6 cores and 16-32gb ram free. is it usable for a small chatbot or coding assistant, or will it be painfully slow?

what models are you guys running? 7b? 13b? how much ram does it really eat? any major gotchas with inference speed?

cloud api costs adding up, but don’t wanna waste time if local is still just a toy.
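
rough back-of-envelope for the "how much ram does it really eat" question. this is only a sketch: it assumes memory is dominated by the weights, so it ignores the kv cache, context length and runtime overhead, which all add more on top. `weight_memory_gb` is a made-up helper name, not part of llama.cpp or ollama.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


# Compare the sizes asked about in the post at common precisions.
for size in (7, 13):
    for bits in (16, 8, 4):
        print(f"{size}b @ {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

by this estimate a 4-bit 7b model needs about 3.5 GB for weights and a 4-bit 13b about 6.5 GB, so both should fit in 16gb of ram with room left for context. speed on cpu is a separate question that this arithmetic doesn't answer.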

uniwi.de · December 15, 2025, 4:48pm

hugging face is an open source community and hub for sharing LLMs and other models. bro, you don’t know this?

AngryMouse · December 15, 2025, 4:53pm

anyone actually running ai locally? tried ollama on my 32gb ram vps, runs llama 3 8b decently. no crazy gpu, just cpu. it’s slower than an api but zero cost after the hardware. good for messing around, not for heavy lifting.
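
for anyone wondering what "calling it like an api" looks like locally, here is a minimal sketch against ollama's local http endpoint. assumptions: ollama is running on its default port 11434, and the model tag `llama3` has already been pulled; swap in whatever tag you actually have. `build_request` and `ask` are illustrative helper names, not part of ollama.

```python
import json
import urllib.request

# Default local endpoint; adjust if your ollama instance listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for a local ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),  # a body makes this a POST
        headers={"Content-Type": "application/json"},
    )


def ask(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs the server running)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

with the server up, `ask("explain quantization in one sentence")` should return the model's reply as a string. the long timeout is deliberate, since cpu-only inference can take a while.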

AngryMouse · December 15, 2025, 5:00pm

yeah, just tinkering mostly. got a spare 3060 in an old box, running a 7b model through ollama. it’s slow but works for basic chat stuff.

main hassle was getting the drivers right and finding a model that fits in 12gb vram. totally not worth it for real work, but fun to mess with offline. you running anything bigger than 7b?

Puffle · December 20, 2025, 2:01pm

yeah, running stable diffusion locally on my old gaming rig with a 3060. it’s a bit of a pain to set up initially, but once it’s running, it’s nice not having to pay per image or deal with api limits.

for larger models, it’s tough without serious hardware. tried running llama 2 7b on cpu, but it’s sloooow. if you just wanna tinker, it’s fun. for production, cloud is way easier unless you’ve got a spare a100 lying around lol.

fidelia · December 20, 2025, 2:06pm

nice, what gpu you using? i’m on a 3060 12gb, runs 7b models okay but anything bigger chokes.

Puffle · December 20, 2025, 2:10pm

nice, which 7b model? i’m running llama 2 7b on a 3060 12gb. the 4-bit quantized version fits in vram with room to spare. setup was a pain but once you get ollama or text-generation-webui running it’s smooth.

my use case is mostly a coding assistant and messing with local rag. don’t trust sending my internal docs to chatgpt lol.
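
the basic shape of local rag, as a toy sketch: pick the most relevant docs, then paste them into the prompt you send to the local model. big caveat: real setups usually rank with embeddings and a vector store, and this uses plain keyword overlap as a stand-in so it runs with no dependencies. the docs and all function names here are made up for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k docs sharing the most words with the question."""
    q = tokens(question)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str], k: int = 2) -> str:
    """Put the retrieved docs in front of the question as context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical internal notes standing in for real documents.
docs = [
    "deploys run from the main branch every friday",
    "the staging database is reset nightly",
    "vpn access requires a hardware key",
]
print(build_prompt("when is the staging database reset?", docs, k=1))
```

the resulting prompt string is what you would hand to the local model, so the docs never leave the machine.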