ChatGPT & RLHF

Today I am back with an interesting topic that I would like to share with you guys. Nowadays we all use AI as a normal part of our lives. But for many of us, the use of AI began with one AI: ChatGPT. Have you ever thought about it? How does ChatGPT keep giving us more and more information, and how is it mostly accurate?
This blog is all about that. Come on guys, have a joyful dive.

ChatGPT uses a technique called Reinforcement Learning from Human Feedback (RLHF). It looks complex, right? It's a simple concept.
In our childhood, when we played on the ground, we ate sand, right? Just like Lord Krishna. But unlike him, we could not show the whole universe in our mouths. I am just using that as an example here. When our mother saw us doing that, she punished us and told us not to do it. Likewise, in school, when we got the first mark, our mother appreciated us.
That is how we learn what to do: from feedback.
In the same way, the AI is rewarded (positive feedback) when it does something well; otherwise it gets a penalty (negative feedback). As a result, it changes according to the feedback. This is exactly what Reinforcement Learning does. The model tries a lot of things: it gives various results and gets a lot of feedback on them. From that, it learns what to do and what not to do.
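To make the reward and penalty idea concrete, here is a tiny Python sketch. It is only a toy under my own assumptions, not how ChatGPT is actually trained: real RLHF tunes a large language model, usually with a separate reward model built from human ratings. The names `train_with_feedback` and `human_feedback`, and the three sample responses, are made up for this illustration.

```python
import random

def train_with_feedback(responses, feedback_fn, rounds=200,
                        learning_rate=0.1, explore=0.2, seed=0):
    """Toy feedback loop: keep one score per response and update it
    from +1 (reward) or -1 (penalty) feedback."""
    rng = random.Random(seed)
    scores = {r: 0.0 for r in responses}
    for _ in range(rounds):
        # Mostly pick the best-scored response, sometimes try a random one.
        if rng.random() < explore:
            choice = rng.choice(responses)
        else:
            choice = max(responses, key=scores.get)
        reward = feedback_fn(choice)
        # Nudge the score toward the feedback it just received.
        scores[choice] += learning_rate * (reward - scores[choice])
    return scores

def human_feedback(response):
    # Hypothetical rater: rewards only the helpful answer.
    return 1 if response == "helpful answer" else -1

responses = ["rude answer", "helpful answer", "off-topic answer"]
scores = train_with_feedback(responses, human_feedback)
best = max(scores, key=scores.get)
print(best)
```

After enough rounds, the response that keeps getting rewarded ends up with the highest score, and the penalized ones sink below zero. That is the same "try, get feedback, adjust" loop from the childhood example, just in numbers.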
Now a question comes to your mind: "How is this used by ChatGPT?"
We all know ChatGPT is used by a lot of people in various ways. We also know that it's just an AI that replies based on the data it was pre-trained on. But people also ask about things that are happening right now. For example, suppose the model was trained and launched during Joe Biden's presidency. At that time, the model was fine-tuned to provide accurate and contextually relevant information about policies, initiatives, and events related to Joe Biden. However, after the next election, Donald Trump became the president. If ChatGPT still answers as if Joe Biden were the president, that is an incorrect and outdated response, right? RLHF is one of the methods used to reduce such bad responses: people rate the answers, and the model is tuned toward the ones they prefer.
RLHF alone is not enough to give real-time data. Other things are used too, such as web scraping to collect data. But RLHF still plays an important part in the answers ChatGPT gives. And ChatGPT is no longer only a chatbot or a text-based AI. With GPT-4, ChatGPT is a multimodal AI. To learn more about multimodal AI, check the link: https://cloud.google.com/use-cases/multimodal-ai

So, various methodologies are used to tune the model to give better results for users. But for me, RLHF is more interesting than the others. So, I wanted to share it with you guys.
Note: Even though ChatGPT uses Reinforcement Learning to try to be more accurate, the results are still not 100% perfect as of 07/01/2025.
If you feel this content is valuable, follow me for more upcoming blogs.
Connect with me:
- LinkedIn: Anand Sundaramoorthy
- Instagram: @anandsundaramoorthysa
- Email: sanand03072005@gmail.com
