git checkout HEAD filename

In Git, HEAD normally points to the latest commit on the current branch.

For example, suppose a commit was made to a file, and you then made further changes. Later you realize those changes are not needed anymore. In that case, you can restore the file to the state recorded at HEAD with the command below.

git checkout HEAD <filename>

In a nutshell – the above command discards uncommitted changes to a file in the working directory.

git reset HEAD filename

Imagine you made changes to file1.txt and file2.txt and added them to the staging area. Then you accidentally made some changes to file3.txt and staged it as well. Later you realize the changes made to file3.txt are unnecessary.

So, to remove file3.txt from the staging area, use the command below:

git reset HEAD <filename>

Basically, this command unstages changes from the staging area.

Remember, this command only removes the file from the staging area. The changes you made to the file remain in the working directory.

git reset commit_SHA

Imagine you have made 10 commits on a single file. If you need to discard all the changes after the 6th commit, use this command with the SHA of the 6th commit, so that commits 7 through 10 are removed.

git reset 232bfa2

Here 232bfa2 is an illustrative value for the first 7 characters of the target commit's SHA (a real SHA is hexadecimal; find yours with git log).
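
The three commands above can be sketched end to end in a throwaway repository (all file names and commit messages below are made up for illustration):

```shell
# Sketch: exercising checkout HEAD, reset HEAD, and reset <SHA> in a scratch repo.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo

echo v1 > file1.txt
git add file1.txt && git commit -qm "first commit"

# git checkout HEAD <filename>: discard a working-directory change
echo scratch >> file1.txt
git checkout HEAD file1.txt       # file1.txt is back to the committed "v1"

# git reset HEAD <filename>: unstage, keeping the edit in the working tree
echo v2 >> file1.txt
git add file1.txt
git reset -q HEAD file1.txt       # unstaged, but "v2" stays in the file

# git reset <commit_SHA>: drop later commits from the branch history
git commit -aqm "second commit"
git reset -q HEAD~1               # back to the first commit; edits are kept
git log --oneline                 # now lists only "first commit"
```

The last reset uses `HEAD~1` instead of a raw SHA, but it is the same operation: the branch moves to the named commit while the working tree keeps your files.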

Connect Postman to Salesforce

Today, I want to capture the notes I learnt from Trailhead Academy on connecting Postman to a Salesforce org.

To allow Postman to make changes to a Salesforce org, we have to enable a CORS policy in Salesforce. Here is what CORS means.

CORS – Cross-Origin Resource Sharing

It is a browser feature that controls how resources are requested from one site by another. Configuring CORS grants specific external websites permission to access our Salesforce data. In this case, we are enabling CORS so that Postman can access Salesforce.

  • From Setup ==> search for CORS ==> add the https://*.postman.co and https://*.postman.com URLs
  • After that, in the Postman desktop app, do the steps below one by one.
  • Create a separate workspace for the Salesforce APIs to play around in.
  • Search for Salesforce APIs. It lists all the available collections.
  • Fork “Salesforce Platform API” and it will be available in your local Postman workspace.
  • After that, go to “Authorization”, click “Generate token”, and copy the “instance” URL.
  • Configure the “_endpoint” value in the Variables tab as the “instance” URL.
  • All set, that’s it. You can play around with whatever requests are available.
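
As a quick sanity check outside Postman, the same pieces can be exercised with curl. Everything below is a placeholder (the instance URL, token value, and org name are assumptions, not real values):

```shell
# Placeholder values: substitute the instance URL and the access token that
# Postman's "Generate token" step returns for your org.
INSTANCE_URL="https://yourOrg.my.salesforce.com"
ACCESS_TOKEN="REPLACE_WITH_TOKEN"

# The REST entry point that lists the available API versions; the forked
# collection sends requests of this shape against {{_endpoint}}.
URL="$INSTANCE_URL/services/data/"

# With real credentials in place, the round trip is:
#   curl -s "$URL" -H "Authorization: Bearer $ACCESS_TOKEN"
echo "$URL"
```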

Kanchi Linux Users Group Monthly Meeting – Dec 08, 2024

Hi everyone,
KanchiLUG’s monthly meet is scheduled as an online meeting this Sunday, Dec 08, 2024, 17:00 – 18:00 IST.

Meeting link : https://meet.jit.si/KanchiLugMonthlyMeet

You can join with any browser or the Jitsi Android app.
All the discussions are in Tamil.

Talk Details

Talk 0:
Topic : my Elisp ‘load random theme’ function
Description : I wanted to load a random theme in Emacs during startup. After searching online, I achieved this functionality using Emacs Lisp. This is my talk.
Duration : 10 minutes
Name : Krishna Subramaniyan
About : GNU/Linux and Emacs user 😉

Talk 1:
Topic : PDF generation using python
Description : To demo a Python program which will generate PDF output.
Duration : 20 minutes
Name : Sethu
About : Member of KanchiLUG & Kaniyam IRC Channel

Talk 2:
Topic : distrobox – a wrapper on podman/docker
Description : Intro about the tool, why I had to use it, and a demo.
Duration : 15 minutes
Name : Annamalai N
About : a GNU/Linux user

Talk 3:
Topic : Real Time Update Mechanisms (Polling, Long Polling, Server Sent Events)
Description : To demo real-time update mechanisms with JS and Python.
Duration : 30 minutes
Name : Syed Jafer (parottasalna)
About : Developer. Currently teaching Postgres at
https://t.me/parottasalna

After Talks : Q&A, General discussion

About KanchiLUG : Kanchi Linux Users Group [ KanchiLUG ] has been spreading awareness on Free/Open Source Software (F/OSS) in
Kanchipuram since November 2006.

Anyone can join! (Entry is free)
Everyone is welcome.
Feel free to share this with your friends.

Mailing list: kanchilug@freelists.org
Repository : https://gitlab.com/kanchilug
Twitter handle: @kanchilug
Kanchilug Blog : http://kanchilug.wordpress.com

To subscribe/unsubscribe kanchilug mailing list :
http://kanchilug.wordpress.com/join-mailing-list/

Weekly Notes 48 – 2024

Christmas Lighting at Niagara

A few weeks ago, Niagara started its Christmas lighting. We went there with friends in the evening. Niagara is one of the greatest natural beauties, one we could see a million times. We visited the casino there, got a free $10 card, and played a few slot machines. I won $3 and then lost all $13. Though it was free money, it was tough to stop playing.

Then we visited the wonderful lighting. It was a very long walk on dark roads, with glittering lights along the roadsides. The kids much enjoyed seeing them all.

A few photos are here – https://shrini-clicks.kaniyam.cloudns.nz/#/collections/albums/2024-niagara-chrismas-lighting

Winter Celebration at Heart Comonos

Last Saturday, we had a grand event to celebrate the winter, organized by the HeartComonos team. I volunteered a little for the event. We had nearly two months of preparation, and all the volunteers made the event a memorable one. We had around 300 participants. The Bollywood dance team won all the attention. I had a makeover as an elf and gave candy to all the kids there.

A few pics of the event are here – https://shrini-clicks.kaniyam.cloudns.nz/#/collections/albums/2024-welcoming-winter-2024

Daily IRC meetings for open source project mentoring

We are having daily meetings for open source project mentoring. Around 10 people are working on different projects. We discuss many things like Linux, Emacs, productivity, book reviews, etc. Read the logs here – https://ircbot.comm-central.org:8080/kaniyam

2025 planning for kaniyam

Started a thread to plan the 2025 activities for the Kaniyam Foundation. Write there what you think we can do next year.

https://forums.tamillinuxcommunity.org/t/topic/2723/2

Revamping FreeTamilEbooks.com

FreeTamilEbooks.com has had a very old theme for the past 10+ years. We need the changes below.

  • Check for a new theme
  • Improve its SEO for search results
  • Fix the send2kindle links
  • Fix the categories
  • Merge duplicate author names and categories
  • Fix the download stats
  • Get a detailed download report for all the books
  • Add an author page
  • Add a contributor page
  • Remove email addresses from the books’ pages
  • Add intro content to all books

Created a project idea issue for this here: https://github.com/KaniyamFoundation/ProjectIdeas/issues/237

Ravishankar is a founding member of FreeTamilEbooks.com and a mentor for Kaniyam. He has started working on these tasks: he gave the site a new theme and improved the SEO. We need more volunteers to work on the other items. Let me know if you can spend a few hours on FreeTamilEbooks.com.

Winter / Snow started

Today, we got the very first snowfall of the year. It is mesmerizing to see all the green land turning white. For 4 months, we will be in a hibernating state, so I have to plan many indoor events. I have tons of books to read and tasks to complete.

LLM dataset part 3 released

We are collecting a large amount of Tamil text with shareable, openly licensed content, for LLMs and other research work. So far, we have collected 1.6 GB of text from Tamil Wikipedia, FreeTamilEbooks, Project Madurai, Thamizh Mann publisher's books, etc. Get the data from here – https://kaniyam.cloudns.nz/tamil_datasets/

Read the blog posts on these here:

part 1 – https://goinggnu.wordpress.com/2024/06/11/collecting-content-for-llm-dataset-part-1-tamil-wikipedia-content/

part 2 – https://goinggnu.wordpress.com/2024/06/16/collecting-content-for-llm-dataset-part-2-freetamilebooks/

part 3 – https://goinggnu.wordpress.com/2024/11/23/collecting-content-for-llm-dataset-part-3-thamizh_mann-books-project-madurai-wikisource/

Planning for scratch / python training for kids

Around Dec 25 to Jan 5, schools get a winter break. I am thinking of teaching Python or Scratch to kids during this break, and I am learning Scratch for that. The graphics, the colors, and the drag-and-drop may be easy for kids, but it is tough for me as a terminal dweller. I have yet to think and plan more about the training. At the least, I should teach the basics of programming and give a taste of making computers obey our orders.

Weekly chess hours for kids

Started to teach chess to our condo kids. We conduct a weekly chess hour to teach and play with the other kids. It is good to see that many kids already know chess, and they all enjoy the game hours.

Moana 2

Watched Moana 2 yesterday. Viyan loved it. It is very difficult for a part 2 to be as good as part 1, and the Moana team did a great job on this. Stunning graphics, a good storyline, nice music, and heart-melting songs make the movie a wonder. Don't miss watching it in theaters.

Books

Completed – பேசத் தெரிந்த நிழல்கள் – S. Ramakrishnan. Took it from a local library. It is a book full of reviews of world movies. Happy to read a physical Tamil book from thousands of miles away.

In progress – Drupal, LLM, Digital Museums

100% savings on Thanksgiving Day

As usual, I got 100% savings in the Thanksgiving Day sales: I bought nothing. As we follow minimalism as much as possible, we feel we already have everything we need. I was thinking that if the day had a name like “Genocide Memorial Day”, we would not rush to the shops for offers and sales. They should rename the day to reflect its history.

Self-hosted social media – GoToSocial

I am running a social media platform on my desktop and interacting with the world using it. It is software called ‘GoToSocial’. It is like Twitter, but we can install it on our own servers. The Fediverse ActivityPub protocol connects it with millions of other such servers around the world.

I host it here: https://social.kaniyam.cloudns.nz/

Happy to see many Emacs, FOSS, Linux, and self-hosting lovers there to interact with. I get replies to all the questions I ask there. It gives much happiness to be with like-minded people around the globe.

Jenkins - 1

  • Open source integration tool.
  • It can act as a centralised server by integrating multiple sources like source code management, build tools, and deployment environments.
  • A complete INTEGRATION TOOL.
  • It's Java based.
  • Used to automate repeated tasks.
  • CI/CD (continuous integration and continuous delivery).
  • You can even do patching.
  • In simple terms, it's a centralised server.

Advantages

  1. CI/CD
  2. Open source
  3. Community driven
  4. Browser based
  5. Supports all operating systems.
  6. Distributed builds --> master and slave nodes. How? If you set up Jenkins on one system, it uses the resources of that system, and the jobs consume its RAM, CPU, and other resources. To avoid this, "distributed builds" were introduced: with master and slave nodes, the load is distributed.
  7. We can scale out the master and slave nodes.

CI/CD Workflow

  • Commit
  • Build
  • Test
  • Stage
  • Deploy
  1. The stages will differ in other CI/CD processes; the purpose stays the same.
  2. A dedicated branch for each environment is a best practice, to avoid any confusion.

Installation & Configuration

sudo apt-get update
java -version

Add the official key of Jenkins and then the repo.

sudo wget -O /usr/share/keyrings/jenkins-keyring.asc \
  https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key
echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc]" \
  https://pkg.jenkins.io/debian-stable binary/ | sudo tee \
  /etc/apt/sources.list.d/jenkins.list > /dev/null
sudo apt-get update
sudo apt-get install jenkins

Now Jenkins has been installed; please also check with a command whether it is installed or not.

dpkg -l | grep -i jenkins

As we encountered an error (see the list of errors below), upgrade Java from version 11 to 17 or 21.

sudo systemctl start jenkins
sudo systemctl enable jenkins
sudo systemctl status jenkins
  • Now open the web console with the system's IP and port 8080.

  • If you want to know the username, then look in the location below.
cat /var/lib/jenkins/users/users.xml
/var/lib/jenkins/users/admin_17980521444909415742

If you want to change the password, edit the file and restart Jenkins.

<hudson.security.HudsonPrivateSecurityRealm_-Details> <passwordHash>#jbcrypt:$2a$10$OTYB2osGPi/rasutjHcYOOhByiCoaEZTEQk52CABOwYdrtxeIPnBu</passwordHash>
</hudson.security.HudsonPrivateSecurityRealm_-Details>

  • Give no password for the jenkins user (as it is an internal user).

  • Check whether you are able to get output without entering a sudo password.

User Management & Security

  • Manage Jenkins > Security

  • Let's try using the OS users for the security realm.
  • Add the user to the shadow group as below and restart Jenkins.

  • Now it is integrated with the OS users (sathishpy1808); after integrating, you can even log in with the OS users.

Jenkins own database

  • Only users created in Jenkins can log in; OS users can't.

User creation

  • Change to "Jenkins own database" then only you can view the user creation option like below ,

  • Go to the ravi user's profile and explore.

  • You can even terminate all sessions from the console.
  • You can also change the password, and a delete option is available.

Authorisation

  • This defines what type of authorisation is given to the users; there are many options available.

  • By default --> "Logged-in users can do anything", which is equal to admin access --> not advisable.
  • Try to use user-based (matrix) authorisation instead.

  • The role-based plugin is also useful; let's install it (the most used plugin for role-based authorisation).

  • "Manage and Assign Roles" won't be available at first; it appears only after the plugin installation.

  • "Manage Roles" and "Assign Roles" are very important options.
  • Let's create a role and then assign the users.
  • Pattern-based roles are created only in "Item Roles". Here, if a job name starts with the word "Manage", it falls under the managing job roles.

  • Create some jobs to test these role assignments.

  • The "Item Roles" section under "Assign Roles" is useful for pattern-based assignment of users.
  • You can see all the roles that were added.
  • All the users are added in this console.

Job Management

  • Jobs need to be created to perform a task.

Build Triggers

  • "Trigger builds remotely (e.g., from scripts)" --> it will trigger once some other task gets triggered.
  • "GitHub hook trigger for GITScm polling" --> once the code gets committed then only the job needs to be triggered.

Build Environment

Build Steps

  • This is the main step; you can add any number of steps here.
  • Raw commands, shell scripts, etc.
  • They can be added like stages.

Post-build Actions

Notes

  1. https://www.jenkins.io/doc/book/installing/linux/ --> On this site, you can get the details of the key and the repo which we added at the start of the installation part. [TBD]
  2. What is the "openjdk-17-jdk-headless" package?
  3. The default port of Jenkins is 8080.
  4. The jenkins user will always be running.

Commands Used

java -version
locate jdk
whereis java

List of errors

Job for jenkins.service failed because the control process exited with error code.

  • Now you can get the details by typing:
journalctl -xeu jenkins.service

The error clearly says the Java version is not supported:

Dec 01 22:03:57 meet.sathishpy1808.org jenkins[40153]: Running with Java 11 from /usr/lib/jvm/java-11-openjdk-amd64, which is older than the minimum required version (J>
Dec 01 22:03:57 meet.sathishpy1808.org jenkins[40153]: Supported Java versions are: [17, 21]
Dec 01 22:03:57 meet.sathishpy1808.org jenkins[40153]: See https://jenkins.io/redirect/java-support/ for more information.
Dec 01 22:03:57 meet.sathishpy1808.org systemd[1]: jenkins.service: Main process exited, code=exited, status=1/FAILURE

Access Denied

  • Change the ownership to the current user:
sudo chown -R sathishpy1808:sathishpy1808 /var/lib/jenkins/
sudo chown $(whoami) /var/lib/jenkins

Dangling meta character '*' near index 0

Collecting content for LLM dataset – Part 2 – FreeTamilEbooks

At FreeTamilEbooks.com we have published 850 ebooks, all under shareable Creative Commons licenses. Many people have asked, many times, for text-only versions of all these books. As it is a big task, it took a long time. Thanks to Lenin and Anwar of the Kaniyam Foundation, all the contributors, and all the writers and readers for making this project alive and a great success.

We publish the books in EPUB format along with PDF. An EPUB is just a zip file of HTML files, so we can copy all the content from it as Unicode text. Pandoc is a wonderful open source software which can convert an EPUB to a plain-text file.

This is the list of actions we have to do:

  1. Get the URLs of all the 850+ epub files.
  2. Download them all.
  3. Using pandoc, convert them to text files.
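
The pandoc step can be sketched as a small loop; the epubs/ and txt/ directory names here are assumptions, and pandoc must be installed:

```shell
# Convert every downloaded EPUB to plain text with pandoc.
# Assumes the files were downloaded into ./epubs (hypothetical layout).
mkdir -p epubs txt
for f in epubs/*.epub; do
  [ -e "$f" ] || continue   # the glob stays literal when the folder is empty
  pandoc "$f" -t plain -o "txt/$(basename "${f%.epub}").txt"
done
```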

So far, we don't have a metadata file for all the published books, and getting the links of all the epub files needs some programming. As Python is a Swiss Army knife for automating anything, I started exploring the WordPress REST API with Python to get the content of all the book pages.

https://github.com/KaniyamFoundation/create_ebooks/blob/master/get_metadata/get_Data.py

I wrote the code linked above to get all the books' info.

This gave a JSON file with the book name, author, genre, and the epub, mobi, A4 PDF, and 6-inch PDF links.

I converted this to a CSV file with the code below: https://github.com/KaniyamFoundation/create_ebooks/blob/master/get_metadata/parse.py

I had to fix a few things manually in the CSV file.

This is the final CSV file. https://github.com/KaniyamFoundation/create_ebooks/blob/master/get_metadata/fte_metadata.csv

The code below downloads all the epub files from their links in the fte_metadata.csv file, then uses pandoc to convert them to text.

https://github.com/KaniyamFoundation/create_ebooks/blob/master/get_metadata/get_fte_books.py

Got 845 txt files. The total size is 374 MB.

Compressed with 7z to get a 47 MB file.

Published the data here. https://kaniyam.cloudns.nz/tamil_datasets/

Download and share the text data for free. Don't sell it, as most of the books are released under the CC-BY-NC (non-commercial) license.

Use this data to build awesome open source applications and research: spellcheckers, grammar checkers, LLMs, RAG, what not?

Data is always the oil. Let us grow the open data oil.

Please share all your text, audio, and video content under a shareable license like Creative Commons. It will be used to build a better future.

Postgres - Session 02 ( Architecture )

  • Who uses the DB frequently? The application.

Let's deep dive into the architecture.

Postmaster

  • It collects incoming requests on one specific port and just redirects them.

  • The backend process pool uses RAM and CPU, so we can't give it an unlimited number. It is assigned a specific size; if all processes are in use, the pool makes new requests wait until another request completes. Each backend process has its own separate virtual memory, holding not all the data but only information specific to its request. The backend process pool itself won't execute the request.

  • Now the backend process pool (BP) redirects to the backend worker pool (BW).
  • E.g., BP - 100 & BW - 100.
  • The status of these changes, e.g. ACTIVE, IDLE, etc.
  • Even the backend worker pool has separate memory (keep this in mind).

Query example

  • Take the example below:

select * from employee where first_name = 'john';

  • It goes to the backend worker pool; the select * from employee result is first searched for in the SHARED BUFFER. If it is there, the response is returned from the buffer.
  • If the result is not available in the SHARED BUFFER, then it goes to DISK.
  • The catch here is that even the SHARED BUFFER size is limited. This can be configured.
  • For e.g., suppose a 10 MB result is already held in the SHARED BUFFER. The next request comes in and is not in the SHARED BUFFER, so it goes to DISK and gets a 20 MB output. Now the existing 10 MB is erased and the new 20 MB is saved in the SHARED BUFFER.
  • It is very fast when the output comes from the SHARED BUFFER.
  • WORKER memory handles the ORDER BY, sort, etc., apart from the main SELECT.

Auxiliary Process

WAL WRITER - 1

  • Write-Ahead Log.
  • It takes a backup of every request and query.
  • It has a separate buffer space.
  • It also goes to DISK.
  • Commands come to the WAL from two places: 1. BACKEND WORKER 2. BACKGROUND WRITER.

DIRTY PAGES

  • When the output gets stored in the SHARED BUFFER , DIRTY PAGES get created at the same time. Why ? Eg., if there is an update to data stored in the SHARED BUFFER , the buffer is updated but the change is not yet saved to DISK.
  • If an update or any operation hasn't gone to the main DISK yet , it creates DIRTY PAGES.
  • At the same time the BACKGROUND WRITER takes note of the DIRTY PAGES and asks the WAL WRITER to record them.
  • The uncommitted work still goes from the DIRTY PAGES to the OS FILE SYSTEM. Note : the commit still hasn't reached the MAIN DISK.
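The hand-off above (backend workers, background writer, and checkpointer all flushing dirty pages) shows up in the pg_stat_bgwriter statistics view; a small sketch, using column names that exist up to PostgreSQL 16:

```sql
-- Dirty-page write-back statistics:
--   buffers_checkpoint : dirty pages flushed by the CHECK-POINTER
--   buffers_clean      : dirty pages flushed by the BACKGROUND WRITER
--   buffers_backend    : dirty pages flushed directly by backend workers
SELECT buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;
```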

WAL WRITER - 2

  • If there is any crash in the SHARED BUFFER , then we can recover from the WAL WRITER.
  • The data itself won't be recovered , but the queries can be recovered from the WAL.

WAL ARCHIVER

  • The WAL WRITER will hand all details to the WAL ARCHIVER.

CHECK-POINTER

  • Whatever comes to OS FILE SYSTEM , CHECK-POINTER will take care to make sure that it reaches the DISK properly.

AUTOVACUUM

  • PostgreSQL works in an APPEND method : it takes a clone of the row and then updates it.
  • PostgreSQL uses multi-version concurrency control ( MVCC ).
  • It will clean all the STALE rows.
  • Eg.,

Arun , 24 , 900  <-- becomes STALE after the update
Raj , 23 , 901
There is an update to Arun's mark :
Arun , 24 , 910

  • Here only Arun's row has an update , but when the update happens a separate new row is created ( like a duplicate ) and then the update takes place there.
  • AUTOVACUUM will clean all such STALE rows.
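A sketch for inspecting this, assuming an employee table: pg_stat_user_tables tracks how many stale (dead) row versions a table has and when autovacuum last cleaned it.

```sql
-- n_dead_tup: stale row versions waiting to be cleaned;
-- last_autovacuum: when autovacuum last processed this table.
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'employee';

-- Cleanup can also be triggered manually:
VACUUM employee;
```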

STARTUP PROCESS

  • It starts first and restores all data from the WAL Archive folder ; only then does Postgres allow requests.
  • Why do we need this startup process ? So that everything gets closed and recovered properly.
  • This happens before the "Post Master" starts accepting connections.

Replica

  • Standby unit.
  • It's like a clone , or the Master & Slave concept.
  • It receives all details from the WAL sender.

Image description

Notes

Postgres - Session 01

Installation over Oracle Linux

  • Version of the Linux Distro which I am using ,

cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="7.9"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME="Oracle Linux Server 7.9"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:7:9:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"
ORACLE_BUGZILLA_PRODUCT="Oracle Linux 7"
ORACLE_BUGZILLA_PRODUCT_VERSION=7.9
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=7.9

  • Install the repository RPM,

sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

Image description

  • Install PostgreSQL,

sudo yum install -y postgresql13-server

Image description

Optionally initialize the database and enable automatic start

sudo /usr/pgsql-13/bin/postgresql-13-setup initdb
sudo systemctl enable postgresql-13
sudo systemctl start postgresql-13

Image description

initdb

Image description

  • This command creates all DB-related files , directories & configuration files.

Image description

  • It also creates the system catalogs --> what are these ? --> They store all indexes , roles , etc. --> you can see the list of catalogs using the statement below.

postgres=# select relname from pg_catalog.pg_class where relkind='r';

Image description

sudo -i -u postgres psql

Image description

  • By default a user , a role and a database will be created.
  • All 3 of the above are created with the same name , "postgres".
  • Then how to check the roles ?

postgres=# \du

Image description

  • Now see the list of databases ( these 3 are created by postgres by default , with the help of template1 ).

postgres=# \l

Image description

  • If you try to create a new DB , it will follow "template1" ; it's like the blueprint.
  • Then why template0 ? --> You can edit template1 but not template0 ( you can't change it ; it's not editable ).
  • So in short : template0 --> NOT EDITABLE & template1 --> EDITABLE.
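For example (the database names mydb and cleandb below are just illustrations):

```sql
-- By default, a new database is cloned from template1:
CREATE DATABASE mydb;

-- To clone from the pristine template0 instead, name it explicitly:
CREATE DATABASE cleandb TEMPLATE template0;
```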

WAL

  • WRITE AHEAD LOG.
  • If the database crashes , we can retrieve the data with its help.

Roles

Image description

  • Librarian , Library member and visitor all these are roles.

Image description

  • Let's create a dummy role and test ,

Image description

  • You can see "Cannot login" ; that means this role can't log in , as the LOGIN attribute wasn't assigned.

Image description

  • creating one more ROLE,

Image description

  • Assigning the permissions or roles,

Image description

  • Here you don't see 'CANNOT LOGIN'.
  • If a ROLE can log in , then it is a user.
  • A user can also inherit from its parent roles.
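A small sketch of these ideas (the role names readers and alice are hypothetical):

```sql
-- A role without LOGIN shows "Cannot login" in \du and cannot connect:
CREATE ROLE readers;

-- A role with LOGIN is effectively a user:
CREATE ROLE alice LOGIN PASSWORD 'secret';

-- alice becomes a member of readers and can inherit its privileges:
GRANT readers TO alice;
```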

How it works ?

Postmaster

  • All connection requests go here first. ( whatever the client asks , the server will respond )
  • Follows client-server architecture.

WAL Writer

  • It writes everything down ( like a notebook ) to keep track of it all.

Checkpoint

  • It periodically checks and responds.

Auto Vacuum

  • It periodically checks.

Stats Collector

  • Gathers all stats. Eg., if a table is getting more requests , it collects that information and tries to keep it in cache to respond very fast , and also checks whether the table has an index.

Archiver

  • To backup the data.

Reference

https://www.postgresql.org/download/linux/redhat/ --> This site provides steps to install on different distros.

Notes

  1. "template0" is called pristine.

Questions

  1. What command initializes a PostgreSQL database cluster ? initdb
  2. Which role is automatically created during PostgreSQL installation ? postgres
  3. What is the purpose of template0 in PostgreSQL ? Pristine , unmodified template.
  4. Which PostgreSQL template database is used as the default for creating new databases ? template1
  5. How to connect to the PostgreSQL interactive terminal ? psql
  6. List all databases ? \l
  7. Architecture of PostgreSQL ? Client-server

Collecting content for LLM dataset – Part 3 – Thamizh_Mann books, project madurai, WikiSource

We are collecting an open licensed dataset in the Tamil language to build LLMs and other interesting applications in the coming days.

The ML models we build may have a very short lifespan, but the open data will be there forever, or at least for longer than our lifetime.

Check the efforts part 1 and part 2 here.

part 1 – https://goinggnu.wordpress.com/2024/06/11/collecting-content-for-llm-dataset-part-1-tamil-wikipedia-content/

part 2 – https://goinggnu.wordpress.com/2024/06/16/collecting-content-for-llm-dataset-part-2-freetamilebooks/

here goes part 3.

Thamizh_mann publishers have been publishing public domain and nationalized Tamil books for many years. A few years ago, in a collaboration between the Library at the University of Toronto Scarborough, Canada, and Thamizh_mann publishers, the Kaniyam Foundation team helped to release all the 1000+ Tamil books as PDF and Docx formats for free online.

You can download them all here https://tamil.digital.utsc.utoronto.ca/61220/utsc35335

Thanks to UTSC, Thamizh_mann team for the great gift for the tamil Diaspora.

Now, we have 1000+ books in Unicode Docx format. The next step is to convert them all to plain text and use them. Natkeeran and Parathan helped with this.

Along with this, they helped to scrape Project Madurai books and Tamil WikiSource books. They published everything in a git repo here – https://github.com/KaniyamFoundation/open_tamil_texts along with the scripts and metadata.

I am adding those texts to our open licensed Tamil data collection.

Download them all here https://kaniyam.cloudns.nz/tamil_datasets/

here is the current size in text format and compressed format.

shrini@dell-optiplex-9100 v/w/h/tamil_datasets> du -h compressed
258M compressed/

shrini@dell-optiplex-9100 v/w/h/tamil_datasets> du -h text-files
355M text-files/project_madurai/data/text
355M text-files/project_madurai/data
355M text-files/project_madurai
110M text-files/tamil_wikisource/data
110M text-files/tamil_wikisource
374M text-files/FreeTamilEbooks-txt
714M text-files/thamizh_mann/data
716M text-files/thamizh_mann
1.6G text-files/

We have 1.6 G of text data to work on LLM or other works.

Go ahead, use it and build more models and tools using this data.

This alone may not be enough to get good output. But if we can bring something out of it, even if the results are not great, we can then ask people to release their recent content, blogs, and social media posts under a Creative Commons license.

A few bloggers and magazines have already released their content under a CC license. Now, we need your help to scrape them. If you know any programming language and can help with this project, please do web scraping for the websites mentioned here, and share the data and code.

https://github.com/KaniyamFoundation/ProjectIdeas/issues/198

Thanks for all the content providers and the contributors.


SQL Loader

  • It's nothing but the " Bulk Loader Utility ".
  • With this , we can load data into a table in bulk.
  • The main word is LOAD.
  • Then the question comes to mind : what is the difference between load and insert ? Insert happens one by one. Load happens in one go.
  • What data ? Which table ? Loading script ? Execute --> these are the four things YOU NEED TO KEEP IN MIND.

Image description

  • Flat files --> csv ( comma separated values ) , txt , dat , excel , etc.
  • Always prepare the data in a plain text editor ( Notepad ).

select employee_id || ',' || first_name || ',' || salary from employees_table where rownum <= 10 ; --> this will fetch only 10 rows.

  • Save this file in a folder as .csv.

Image description

select employee_id || ',' || first_name || ',' || salary from employees_table where employee_id between 150 and 170 ; --> this will fetch the rows between those values.

  • Save this file as .txt.

Now coming to table creation

create table sample(id number , name varchar2(25) , salary number);

Now coming to creation of script

  • The loading script and the control file are the same thing.

load data infile 'path_of_the_file.csv'
infile 'path_of_the_file.txt'
insert into table sample
fields terminated by ','
(id,name,salary)

  • Create the script and save it ( Notepad , "All Files" ) with a .ctl extension.

Now coming to Execute

sqlldr hr_schema_name/password control='file_location_of_control_file_or_execution_file' direct = true

  • Why direct=true here --> it loads very fast and bypasses all constraints and triggers.
  • If direct=false --> it checks constraints and triggers and then executes.

Image description

  • In short ,

Image description

Excluding one column

  • If some column should not be loaded , then use FILLER.

load data infile 'path_of_the_file.csv'
infile 'path_of_the_file.txt'
insert into table sample
fields terminated by ','
(id,name,salary filler)

load data infile 'path_of_the_file.csv'
infile 'path_of_the_file.txt'
insert into table sample
fields terminated by ','
(id,name filler,salary)

  • In the above examples , salary ( first ) and name ( second ) will be empty ; the data for a FILLER column is not loaded.

Condition

  • WHEN --> the data being loaded should obey the condition you give. If the condition fails , the failed rows are stored in the DISCARD FILE.
  • If there is an Oracle error , it gets captured in the BAD FILE.

Image description

  • A WHEN condition would be used here ,

load data infile 'path_of_the_file.csv'
infile 'path_of_the_file.txt'
insert into table sample when ?
fields terminated by ','
(id,name filler,salary)
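For instance, a hypothetical control file that loads only the rows whose id field equals '100' could look like the sketch below. (SQL*Loader's WHEN clause accepts = and <> comparisons against fields or character positions; the value '100' is just an illustration.)

```sql
load data infile 'path_of_the_file.csv'
insert into table sample
when (id = '100')
fields terminated by ','
(id, name, salary)
```

Rows that fail the WHEN test land in the discard file rather than the table.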

How to get the process summary ?

  • It will be stored in log file.
  • you can set all the files in the command itself , like below.

sqlldr hr_schema_name/password control='file_location_of_control_file_or_execution_file' log = summary.log bad = sample.bad discard = sample.dsc direct = true

  • Whatever file names you give here , those files are generated automatically.
  • So the important takeaway here is ,

Image description

skip

  • If you want to skip rows while loading , you can specify it in the command itself.

sqlldr hr_schema_name/password control='file_location_of_control_file_or_execution_file' skip = 2 direct = true

  • The first 2 rows will be skipped.

Notes

  • The SQL*Loader shortcut keyword is sqlldr.
  • insert into table sample --> this works only when the table is EMPTY. If you try to execute it again , it throws the error below.

Image description

so you can use ,

load data infile 'path_of_the_file.csv'
infile 'path_of_the_file.txt'
append into table sample
fields terminated by ','
(id,name,salary)

  • You can also use truncate ( it deletes the old data and inserts the new data again ).

load data infile 'path_of_the_file.csv'
infile 'path_of_the_file.txt'
truncate into table sample
fields terminated by ','
(id,name,salary)

Task

  1. For a particular column , instead of a comma (,) the separator used is (#) - how to load it ?
  2. how to load the excel file ?

Introduction to PostgreSQL database – free online course in Tamil

Introduction to PostgreSQL database – free online course in Tamil

Monday, Wednesday, Friday, IST evenings.

First class – 18-Nov-2024 7-8 PM IST

Syllabus: https://parottasalna.com/postgres-database-syllabus/

Trainer – Syed Jafer – contact.syedjafer@gmail.com

Get the meeting link here

Telegram Group – https://t.me/parottasalna
Whatsapp channel- https://whatsapp.com/channel/0029Vavu8mF2v1IpaPd9np0s Kaniyam Tech events Calendar – https://kaniyam.com/events/

Azure VNET

  • Network --> communications between devices.
  • IP Address --> unique identifier to each device which is internet protocol address.

IPv4

  1. 4th version of the Internet Protocol.
  2. 32-bit.
  3. Totally 4 blocks of 8-bit segments each : A , B , C & D.
  4. 2 types of IP address :
  5. Public ( mainly used for internet routing ) & Private ( eg., office networks ).
  6. Range --> 0 to 255.
  7. 0 & 255 are reserved by the system.
  8. 127 --> loop-back address , so 253 values remain usable.
  9. Then how to find whether it's public or private ? By classes.
  10. A , B , C --> commonly used.
  11. D & E --> multicasting & research purposes.
  12. Class A --> 0 to 127 public ; for private , only 10.x.x.x is used , eg., 10.0.0.1 ( 16 million hosts can be declared ).
  13. Class B --> 128 to 191 public ; for private , only 172.16 to 172.31 is used , eg., 172.16.0.1 to 172.31.255.254 ( 65,536 hosts can be declared ) - medium size networks.
  14. Class C --> 192 to 223 public ; private 192.168.x.x , a small network for 254 hosts.
  15. Class D --> 224 to 239 , for multicast groups.
  16. Class E --> 240 to 255 , for research purposes.

As a whole , the private ranges are :

A --> 10.0.0.0/8
B --> 172.16.0.0/12
C --> 192.168.0.0/16
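These private ranges can be checked with Python's standard ipaddress module; a small sketch, using the example addresses from the list above:

```python
import ipaddress

# Verify that the RFC 1918 private ranges behave as described.
print(ipaddress.ip_address("10.0.0.1").is_private)     # Class A private
print(ipaddress.ip_address("172.16.0.1").is_private)   # Class B private
print(ipaddress.ip_address("192.168.1.1").is_private)  # Class C private
print(ipaddress.ip_address("8.8.8.8").is_global)       # a public address
```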

Image description

Subnetting

  • Slicing ( "slashing" ) a network into smaller sub-networks.
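Subnetting can be sketched with Python's ipaddress module, for example splitting a /16 network into /24 subnets:

```python
import ipaddress

# "Slash" a /16 network into /24 subnets.
vnet = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vnet.subnets(new_prefix=24))

print(len(subnets))              # number of /24 subnets in a /16
print(subnets[0])                # the first subnet
print(subnets[0].num_addresses)  # addresses per /24 subnet
```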

Image description

Virtual Network in Azure

  • Software based network connects virtual machines.

Subnet in Azure

  • Subdivision of a VNET.
  • We can organise the resources within a network.
  • Features as follows ,

Image description

Azure Portal

All service >> Networking >> Virtual Networks >> Create

Image description

  • If you give a wrong IP , you can see a prompt appear.

Image description

  • VNET & Subnet creation

Image description

Notes

  1. In each subnet , IPs are reserved for the network address , gateway , DNS and the 255 broadcast address ( Azure reserves 5 IPs per subnet ).
