Hello,
There are many roads to getting a model running locally on your computer. I'll do a brief write-up, but there are plenty of videos on the topic already.
1. Pick your UI
There are a few frontend applications for running LLMs locally:
a. koboldcpp
b. text-generation-webui by oobabooga
2. Pick your model
Here are a few things you should know about models:
a. They come in several parameter sizes. Common ones are 7B (B as in billion) and 13B, all the way up to 70B and maybe even more! Generally, more parameters means more accurate output, but at the cost of greater computational requirements (see the size estimate after this list).
b. Because even 7B-parameter models are difficult to run without beastly hardware, there are groups and individuals who quantise the models. This reduces their computational requirements with minimal loss in output quality (https://huggingface.co/TheBloke).
c. When picking a model, match its VRAM requirement to what's listed on the model page. Try going up or down a quantisation level depending on output quality and performance.
d. The base models (GPT, Llama, etc.) get modified, optimised, and uploaded to Hugging Face by various users. You'll likely find these modified versions are more popular than the base model itself; a sketch of fetching one programmatically follows this list.
For example, since you're specifically interested in Llama 2, here's a list of quantised models based on it:
https://huggingface.co/TheBloke?search_models=llama2&sort_models=downloads#models
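To put numbers on (a) and (c): a rough rule of thumb for the weights is parameters x bits per weight / 8 bytes. Here's a quick sketch of that arithmetic; treat the results as lower bounds, since running the model also needs room for the context cache and activations.

```python
# Rough rule of thumb: parameters * bits_per_weight / 8 = bytes of weights.
# Actual VRAM use is higher (KV cache, activations), so pad the result.

def weight_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / (1024 ** 3)

for params in (7, 13, 70):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: ~{weight_size_gib(params, bits):.1f} GiB")
```

For instance, a 7B model needs about 13 GiB at 16-bit but only about 3.3 GiB at 4-bit, which is why quantised models fit on consumer GPUs.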
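And for (b) and (d), quantised files can be fetched programmatically with the huggingface_hub library. The repo and filename below are just illustrative picks from TheBloke's uploads; check the model page for exact filenames, since each repo ships several quantisation levels.

```python
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Repo and filename are examples -- browse the model page and pick the
# quantisation level (Q4_K_M, Q5_K_M, ...) that fits your VRAM.
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
)
print("Model saved to:", model_path)
```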
3. Run your model
Now that you've picked your UI and model, it's time to run it. Note that there are many sliders you can adjust to tweak the output; consult the documentation or a tutorial to understand what they do.
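To make those sliders concrete, here's a minimal sketch using the llama-cpp-python bindings (one backend among several; the UIs above expose the same knobs graphically). The model path is an assumption: point it at whatever file you downloaded in step 2.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Load a quantised GGUF model from disk (path is an assumption --
# use whatever file you downloaded in step 2).
llm = Llama(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU; 0 for CPU-only
)

# These keyword arguments are the "sliders" most UIs expose:
output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=128,      # cap on generated length
    temperature=0.7,     # higher = more random output
    top_p=0.9,           # nucleus sampling cutoff
    repeat_penalty=1.1,  # discourage repeating tokens
)
print(output["choices"][0]["text"])
```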
I'm abstracting a lot of steps. Here's a more in-depth video you can follow:
Sorry if I'm off topic. I just happened to be at the right skill level to answer this kind of request.