ChatGPT & RLHF

7 January 2025 at 08:15

Today I am back with an interesting topic that I would like to share with you guys. Nowadays we all use AI as a normal part of life. But for many of us, everyday use of AI began with one AI: ChatGPT. Have you ever thought about it? How does ChatGPT keep giving more and more answers that are mostly accurate?

This blog is all about that. Come on guys, have a joyful dive.

ChatGPT uses a technique called Reinforcement Learning from Human Feedback (RLHF). It looks complex, right? It’s a simple concept.

In our childhood, when we played on the ground, we ate sand, right? Just like Lord Krishna. But we could not show the whole universe in our mouths like he did. I am just using that as an example here. When our mother saw it, she beat us and told us not to do it. Likewise in school, when we got the first mark, our mother appreciated us.

Here we learn what to do and what not to do from the feedback.

In the same way, the AI is rewarded (positive feedback) when it does something well; otherwise it gets a penalty (negative feedback). As a result, it changes its behavior according to the feedback. That is what Reinforcement Learning does. The model tries a lot of things: it gives various results and gets a lot of feedback on them. From that, it learns what to do and what not to do.
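To make the reward-and-penalty idea concrete, here is a minimal sketch in Python. It is only a toy: the three candidate answers, the `toy_feedback` function, and the `train` helper are all made up for illustration, and this is plain trial-and-error learning from feedback, not how ChatGPT is actually trained.

```python
import random

def train(responses, feedback, rounds=500, epsilon=0.1, lr=0.1, seed=0):
    """Learn a score for each response from reward/penalty feedback."""
    rng = random.Random(seed)
    scores = {r: 0.0 for r in responses}
    for _ in range(rounds):
        # Sometimes try a random response (explore),
        # otherwise pick the best-scoring one so far (exploit).
        if rng.random() < epsilon:
            choice = rng.choice(responses)
        else:
            choice = max(responses, key=scores.get)
        reward = feedback(choice)
        # Move the score a small step toward the feedback received.
        scores[choice] += lr * (reward - scores[choice])
    return scores

def toy_feedback(response):
    # Hypothetical stand-in for a human rater:
    # +1 (reward) for the helpful answer, -1 (penalty) for anything else.
    return 1.0 if response == "helpful answer" else -1.0

responses = ["rude answer", "helpful answer", "off-topic answer"]
scores = train(responses, toy_feedback)
best = max(scores, key=scores.get)
print(best)
```

After enough rounds, the answer that keeps getting rewarded ends up with the highest score, just like we learned what to do from our mother's feedback.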

Now a question comes to your mind: “How is this used by ChatGPT?”

We all know ChatGPT is used by a lot of people in various ways. We also know that it’s just an AI that replies to us based on its pre-trained data, that is, already existing data. But people also ask about current events. For example, suppose the model was trained and launched during Joe Biden’s presidency. At that time, the model was fine-tuned to provide accurate and contextually relevant information about policies, initiatives, and events related to Joe Biden. However, after the next election, Donald Trump became the president. If ChatGPT still answers as if Joe Biden is the president, that is an incorrect and outdated response, right? Reinforcement Learning from human feedback is one of the things that helps here. It does not add new facts by itself, but when people rate such outdated answers poorly, the model learns to avoid responding that way.

To give up-to-date answers, RLHF alone is not enough. ChatGPT also uses web scraping to get fresh data, along with other things. But RLHF is still an important part of how ChatGPT gives useful answers, because ChatGPT is no longer only a chatbot or a text-based AI. ChatGPT with GPT-4 is now a multimodal AI. To learn more about multimodal AI, check the link: https://cloud.google.com/use-cases/multimodal-ai

Source: https://medium.com/lansaar/understanding-multimodal-ai-6d71653994a2

For that, they use various methodologies to tune the model to give better results for users. But I find RLHF more interesting than the others, so I wanted to share it with you guys.

Note: Even though ChatGPT uses Reinforcement Learning to try to be more accurate, the results are still not 100% perfect as of 07/01/2025.

If you feel this content is valuable, follow me for more upcoming blogs.

Connect with Me:

LinkedIn: Anand Sundaramoorthy
Instagram: @anandsundaramoorthysa
Email: sanand03072005@gmail.com
