TIL (22/12/2023)

22 December 2023 at 00:00

I was thinking about what I can do next with the ESP32 and micro-ROS, and I thought to learn RTOS first, since micro-ROS also uses it, and then delve into micro-ROS.

For learning RTOS (gonna start with FreeRTOS itself), I came across this tutorial, which looks good.

Then I learned about Policy Gradient methods for solving MDPs from the Deep RL course I’m doing from HF. The math is a little involved, so I have to work through it step by step. While going through the Policy Gradient Theorem derivation, I came across a few tricks and assumptions, e.g.

  • REINFORCE trick (log-derivative trick): $\frac{\nabla_{\theta}P(\tau;\theta)}{P(\tau;\theta)} = \nabla_{\theta}\log P(\tau;\theta)$

  • The state distribution is independent of the policy parameters ($\theta$): the environment dynamics $P(s_{t+1} \mid s_t, a_t)$ don’t depend on $\theta$, so the gradient only flows through the policy’s action probabilities.

  • Sampling $m$ trajectories from the trajectory ($\tau$) distribution to get a Monte Carlo estimate of the gradient (see the derivation sketch after this list).
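
Putting these together, a sketch of the standard derivation (with $P(\tau;\theta)$ the probability of a trajectory $\tau$ under the policy $\pi_{\theta}$ and $R(\tau)$ its return):

$$
\nabla_{\theta} J(\theta)
= \nabla_{\theta} \sum_{\tau} P(\tau;\theta) R(\tau)
= \sum_{\tau} P(\tau;\theta)\, \nabla_{\theta} \log P(\tau;\theta)\, R(\tau)
\approx \frac{1}{m} \sum_{i=1}^{m} \nabla_{\theta} \log P(\tau^{(i)};\theta)\, R(\tau^{(i)})
$$

Because the dynamics are independent of $\theta$, the transition terms drop out of the gradient: $\nabla_{\theta} \log P(\tau;\theta) = \sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)$.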

Next I have to do the hands-on and read more about it.
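
For the hands-on, here is a minimal REINFORCE sketch of the kind I have in mind (my own sketch, assuming PyTorch and Gymnasium’s CartPole-v1, not code from the course; it uses a single trajectory per update, i.e. $m = 1$):

```python
# Minimal REINFORCE sketch: sample a trajectory, compute reward-to-go
# returns, and apply the log-derivative trick as a policy-gradient loss.
import gymnasium as gym
import torch

env = gym.make("CartPole-v1")
policy = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        dist = torch.distributions.Categorical(
            logits=policy(torch.as_tensor(obs, dtype=torch.float32))
        )
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Reward-to-go: G_t = r_t + gamma * G_{t+1}
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + 0.99 * G
        returns.insert(0, G)
    returns = torch.as_tensor(returns, dtype=torch.float32)

    # REINFORCE loss: maximize sum_t log pi(a_t|s_t) * G_t
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The loss line is exactly the sampled form of the gradient above: log-probabilities of the chosen actions weighted by the returns.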

RTOS

  • We can use an RTOS when we have to run many tasks concurrently or meet timing demands, which can’t be done easily in the usual super-loop configuration (I mean the usual setup and loop parts).

  • The ESP32 uses a modified version of FreeRTOS that supports its SMP (Symmetric Multiprocessing) architecture, scheduling tasks across both cores! (But this tutorial only covers multitasking on a single core.)

Task Scheduling

  • Context switching: how tasks are switched from one to another.

  • Task preemption: how a higher-priority task can interrupt a running lower-priority one.

TIL (21/12/23)

21 December 2023 at 00:00

Reproducibility

  • From this blog, I became aware of the reproducibility issue in RL, i.e. executing the same algorithm in the same environment gives different results each time. It might be due to different initial conditions, seeds, etc.; see, for example, the issues Matthew Rahtz faced when reproducing a deep RL paper.

  • To address this, the author proposes some statistical tests and has written a paper about it.

Maybe I have to have a look at it later.

micro-ROS

Today I did a hello world in micro-ROS. micro-ROS is used for interfacing ROS with resource-constrained embedded devices. I had bought an ESP32-WROOM board, and since micro-ROS supports the ESP32, I thought of trying it and followed this post. In it, I:

  • Compiled the int32_publisher example using idf.py (provided by ESP-IDF) and flashed it to my ESP board.

  • Then ran a micro-ROS agent (a Docker container) on my laptop, which received messages from the ESP.

(Screenshot: messages published)

Basically, we have to write C code using the ESP, micro-ROS, and RTOS (FreeRTOS) libraries, which can then be compiled and flashed onto the ESP, and it works accordingly. I had this issue with specifying the port for the agent.
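
On the laptop side, besides the agent’s logs, one quick way to confirm that messages are arriving is a small rclpy subscriber. A sketch; the topic name here is an assumption, check `ros2 topic list` for the real one:

```python
# Minimal ROS 2 subscriber to sanity-check the ESP's Int32 messages.
import rclpy
from rclpy.node import Node
from std_msgs.msg import Int32

class EspListener(Node):
    def __init__(self):
        super().__init__("esp32_listener")
        # Topic name is an assumption -- verify with `ros2 topic list`.
        self.create_subscription(Int32, "freertos_int32_publisher", self.callback, 10)

    def callback(self, msg):
        self.get_logger().info(f"Heard: {msg.data}")

def main():
    rclpy.init()
    rclpy.spin(EspListener())

if __name__ == "__main__":
    main()
```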

Have to go through the rclc API.

Maybe I am gonna work on some project with FreeRTOS & micro-ROS?

TIL (20/12/2023)

20 December 2023 at 00:00

Bayesian Optimization

Bayesian optimization is a powerful strategy for finding the extrema of objective functions that are expensive to evaluate. It is particularly useful when these evaluations are costly, when one does not have access to derivatives, or when the problem at hand is non-convex.

The Bayesian Optimization algorithm can be summarized as follows:

1. Select a Sample by Optimizing the Acquisition Function.
2. Evaluate the Sample With the Objective Function.
3. Update the Data and, in turn, the Surrogate Function.
4. Go To 1.
  • It uses a
    • Surrogate function - approximates the objective from the input/output data of the samples evaluated so far. There are many ways to model it; common choices are a Random Forest or a Gaussian Process (GP, with many different kernels). In effect, we approximate the objective function with something cheap to evaluate and easy to sample from.
    • Acquisition function - proposes the next sample to be evaluated by the objective function. The sample is found by optimizing this function (by various methods), and it balances exploitation and exploration (E&E) [1]; see the sketch after this list.
  • It is widely used for hyperparameter tuning, e.g. in Optuna and HyperOpt.
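
To make the loop concrete, here is a minimal sketch of steps 1–4 on a toy 1-D problem, assuming a GP surrogate (scikit-learn) and Expected Improvement as the acquisition function; the objective and all numbers are illustrative:

```python
# Bayesian optimization sketch: GP surrogate + Expected Improvement,
# with the acquisition maximized over a dense grid.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):                       # stand-in for an expensive black box
    return -np.sin(3 * x) - x**2 + 0.7 * x

X = np.array([[-0.9], [1.1]])           # a couple of initial samples
y = objective(X).ravel()
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
grid = np.linspace(-2, 2, 400).reshape(-1, 1)

for _ in range(15):
    gp.fit(X, y)                        # 3. update the surrogate with the data
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # Expected Improvement
    x_next = grid[np.argmax(ei)]        # 1. select sample by optimizing acquisition
    y_next = objective(x_next)          # 2. evaluate it with the objective
    X = np.vstack([X, [x_next]])        # 4. go to 1
    y = np.append(y, y_next)

print("best x:", X[np.argmax(y)], "best y:", y.max())
```

The EI term shows the E&E balance directly: the first summand rewards points whose predicted mean beats the best sample so far (exploitation), the second rewards points with high uncertainty (exploration).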

Optuna

I am trying to use it for HPO on the Lunar Lander environment; initially the results weren’t that good. I think it’s because I didn’t give proper intervals, i.e. intervals so large that they don’t lead to a good choice of hyperparameters. Maybe I have to try other ways to make it work, for example narrowing the search space as in the sketch below.
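
A minimal sketch of what I mean by narrower intervals; the hyperparameter names, ranges, and the stub objective are illustrative, not my actual Lunar Lander setup:

```python
import optuna

def train_and_evaluate(lr: float, gamma: float) -> float:
    # Placeholder for the real training loop (e.g. an agent on LunarLander-v2);
    # a toy score here so the sketch runs end to end.
    return -((lr - 3e-4) ** 2) * 1e6 - (gamma - 0.99) ** 2

def objective(trial: optuna.Trial) -> float:
    # Narrow, log-scaled intervals tend to work better than very wide ones.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.95, 0.999)
    return train_and_evaluate(lr, gamma)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```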

Paper Reading

References

  1. https://machinelearningmastery.com/what-is-bayesian-optimization/