Pelican Parts Forums (http://forums.pelicanparts.com/)
-   Off Topic Discussions (http://forums.pelicanparts.com/off-topic-discussions/)
-   -   Custom ChatGPT (http://forums.pelicanparts.com/off-topic-discussions/1140431-custom-chatgpt.html)

wildthing 05-22-2023 05:32 PM

Custom ChatGPT
 
Because I have nothing better to do...

I am going to train a local instance of ChatGPT with Pelican Parts 911 Forum posts.

Because I am new to this I am simply going to follow the instructions here: https://medium.com/@sohaibshaheen/train-chatgpt-with-custom-data-and-create-your-own-chat-bot-using-macos-fb78c2f9646d

Using my best Jeremy Clarkson impersonation, "How hard can it be?"

I know the forums are searchable, but hey, the kids these days don't search and just want to ask questions. And let's face it, many of the experts have left the earth, and hopefully they have enough posts that this machine can learn from.

I do realize that the "free" version has a limit and I'm willing to spend a few bucks here and there, but I have no plans to "productize" this. This is a learning exercise. I may even reach the limits of my local machine. So if you have ideas about how to run this on a server on the cheap, I'm all ears.

Here's my question. Should I pick posts by a specific set of people? Or should I just pick specific topics like no-start issues, sputtering and stalling?

jyl 05-22-2023 06:14 PM

exclude PPOT and esp PARF!

MBAtarga 05-22-2023 06:38 PM

Quote:

Originally Posted by wildthing (Post 12005919)

Here's my question. Should I pick posts by a specific set of people? Or should I just pick specific topics like no-start issues, sputtering and stalling?

Grady Clay, WarrenS (is that the right username?) come to mind.

DonDavis 05-22-2023 06:51 PM

^^^ Early_S_Man

Arizona_928 05-22-2023 09:27 PM

Okay. That can come in handy for the stuff I do...

Joesmallwood 03-05-2025 04:34 AM

How is this coming along?

wildthing 03-05-2025 08:32 AM

Turns out you don't need to go through all that trouble; you can just feed a general LLM the data and it will learn from it. The biggest hurdle is extracting the data. I am wondering if the Pelican Parts board tech team can pull the responses of these users for us straight from the database. It's all publicly available; we just need an extract.
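If no database extract materializes, the printable thread pages are plain HTML, so a standard-library-only scrape is enough for a first pass. Here is a sketch that assumes each post body sits in a tag carrying a known CSS class; the `post_message` class name is a guess, so inspect the real vBulletin markup and adjust before relying on it.

```python
# Sketch: pull post text out of an archived thread page using only the
# standard library. The "post_message" class name is hypothetical --
# check the actual page source and change it as needed.
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "meta", "link"}  # tags with no closing tag

class PostExtractor(HTMLParser):
    """Collects the text of every element carrying a given CSS class."""

    def __init__(self, post_class="post_message"):
        super().__init__()
        self.post_class = post_class
        self.depth = 0          # > 0 while inside a matching element
        self.posts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return              # no matching end tag, so don't track depth
        if self.depth:
            self.depth += 1     # nested tag inside a post body
        elif self.post_class in (dict(attrs).get("class") or "").split():
            self.posts.append([])
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.posts[-1].append(data.strip())

def extract_posts(page_html, post_class="post_message"):
    """Return the text of each matching post body on the page."""
    parser = PostExtractor(post_class)
    parser.feed(page_html)
    return [" ".join(parts) for parts in parser.posts]
```

From there, each post's text can be written to whatever format the downstream tool wants (plain text, JSONL, etc.).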

Alan A 03-05-2025 06:30 PM

That looks like RAG (retrieval-augmented generation) rather than training.

If you're willing to install (or build) vLLM or Ollama, and your Mac is new enough (Apple silicon), you can run models locally for free. I have a few Mistral models I switch between for testing.

It's also easy, and free, to use Elasticsearch as the vector database.

I'm doing this as an R&D project, with langchain4j as the integration layer (I'm rather better with Java than Python because I use it a lot more), augmenting Mistral to act as an L1 support agent.
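The retrieval step Alan describes can be sketched without any of those dependencies. This toy version uses a word-count vector as a stand-in for a real embedding model and a linear scan as a stand-in for Elasticsearch's kNN search, but the shape of the pipeline (embed, rank by similarity, stuff the top chunks into the prompt) is the same.

```python
# Toy RAG retrieval: index forum-post chunks as vectors, find the ones
# nearest to a question, and build an augmented prompt. The bag-of-words
# "embedding" and the list scan are placeholders for a real embedding
# model and a vector database.
import math
from collections import Counter

def embed(text):
    """Hypothetical embedding: a word-count vector (real RAG uses a model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks, k=2):
    """Assemble the augmented prompt that gets sent to the local model."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real setup the prompt from `build_prompt` would go to the locally served model (Ollama and vLLM both expose HTTP APIs for this), and the chunk store would live in Elasticsearch rather than a Python list.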


Copyright 2025 Pelican Parts, LLC - Posts may be archived for display on the Pelican Parts Website

