Anyone actually running AI locally?

Anyone actually running AI locally instead of using cloud APIs? Is it worth the hassle?

Yeah, I’ve got a couple models running on my home server. Started with a 7B parameter model on an old gaming GPU. It’s definitely a hassle to set up compared to just calling an API, but for messing around and keeping data private, it’s worth it for me.

What’s your main goal with it? Just tinkering, or do you have a specific use case?

I get the privacy part, but does it actually work better than APIs? I’m worried about the usage tbh

I’m running a DeepSeek model from Hugging Face

yeah, running llama 3.1 locally on an old gaming pc with 32gb ram. it’s slow but zero costs after the hardware. good for tinkering, not for production.

what’s your use case?

What the hell do you mean??? What is a hugging face?

anyone actually running ai locally? not just messing around, but for real tasks.

thinking about setting up something like llama.cpp or ollama on my vps. got 4-6 cores and 16-32gb ram free. is it usable for a small chatbot or coding assistant, or will it be painfully slow?

what models are you guys running? 7b? 13b? how much ram does it really eat? any major gotchas with inference speed?

cloud api costs adding up, but don’t wanna waste time if local is still just a toy.

hugging face is an open-source community and hub for LLMs and other models. bro, you don’t know this?

yeah, tried ollama on my 32gb ram vps, runs llama 3 8b decently. no crazy gpu, just cpu. it’s slower than an api but zero cost after the hardware. good for messing around, not for heavy lifting.
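
if you want a number instead of "decently" or "painfully slow", here’s a rough sketch for measuring tokens per second against a local ollama server. assumptions: ollama is listening on its default port 11434, the non-streaming /api/generate response includes eval_count and eval_duration (in nanoseconds), and the helper names (build_request, tokens_per_second, measure) are just mine for illustration.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request (POST, since a body is attached)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert a generated-token count and a duration in nanoseconds to tokens/sec."""
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str) -> float:
    """Send one prompt to the local server and return generation speed in tokens/sec."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=600) as resp:
        body = json.loads(resp.read())
    return tokens_per_second(body["eval_count"], body["eval_duration"])
```

calling measure("llama3", "some prompt") with whatever model tag `ollama list` shows gives you one data point. run it a few times with prompts that look like your real workload, since the first call also pays for loading the model.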

yeah, just tinkering mostly. got a spare 3060 in an old box, running a 7b model through ollama. it’s slow but works for basic chat stuff.

main hassle was getting the drivers right and finding a model that fits in 12gb vram. totally not worth it for real work, but fun to mess with offline. you running anything bigger than 7b?

yeah, running stable diffusion locally on my old gaming rig with a 3060. it’s a bit of a pain to set up initially, but once it’s running, it’s nice not having to pay per image or deal with api limits.

for larger models, it’s tough without serious hardware. tried running llama 2 7b on cpu, but it’s sloooow. if you just wanna tinker, it’s fun. for production, cloud is way easier unless you’ve got a spare a100 lying around lol.

nice, what gpu you using? i’m on a 3060 12gb, runs 7b models okay but anything bigger chokes.

nice, which 7b model? i’m running llama 2 7b on a 3060 12gb. the vram is just enough for a 4-bit quantized version. setup was a pain but once you get ollama or text-generation-webui running it’s smooth.
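
for anyone wondering why quantization matters on a 12gb card, here’s the back-of-envelope math as a small sketch. it only counts the weights (parameters × bits ÷ 8). the 2 gb overhead for kv cache and runtime is my own guess, not a measured value, and real 4-bit formats usually land a bit above 4 bits per weight, so treat the output as a floor.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


def fits(params_billions: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed overhead budget fit in the given VRAM."""
    # overhead_gb is a hypothetical allowance for KV cache and runtime buffers.
    return weight_memory_gb(params_billions, bits_per_weight) + overhead_gb <= vram_gb


# Compare a 7B model at common precisions against a 12 GB card.
for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB weights, "
          f"fits in 12 GB: {fits(7, bits, 12)}")
```

under these assumptions a 7b model at 16-bit needs about 14 gb for weights alone and doesn’t fit, while the 8-bit and 4-bit versions do. longer contexts push the real usage up from there.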

my use case is mostly a coding assistant and messing with local rag. don’t trust sending my internal docs to chatgpt lol.