Emacs & Org Mode in Windows

Note: I am not an expert in Emacs and Org Mode. But I have always loved exploring tech, coding, and finance. This blog is about my journey of exploring Emacs and Org Mode for Static Site Generation using Windows.

When I first learned about “Static Site Generation” as a developer, I was amazed. Normally, we develop websites using HTML, and even when we use frameworks, HTML serves as the foundation. In Static Site Generation, the content is instead written in markup languages like Markdown or Org Mode, and then converted into HTML by tools like Emacs or Hugo. This concept was totally new to me!

To be honest, I was astonished when I first heard about it from my brother, Thanga Ayyanar. He uses Org Mode exclusively to maintain his website and blogs. Inspired by his approach, I wanted to learn and explore this tool myself, so I chose to work on Thanga Ayyanar’s website (golayan.in). I also took the opportunity to work on the UI. However, he had one condition: I could use only HTML and CSS. It was a slight headache, but a challenge I embraced with love.

I aim to host this project soon, and once I do, I will add the link to this blog below. The challenge, though, is that while Emacs works wonderfully on Linux, I primarily use Windows due to storage constraints. I can no longer run both Windows and Linux on my laptop as I did before. Despite this, my curiosity about Emacs and Org Mode drove me to explore them further.

I initially referred to ChatGPT, Perplexity, and Gemini AI for guidance, but their suggestions weren’t helpful enough. Eventually, I discovered my way forward:

  1. Since I am comfortable using VS Code, I downloaded the Org Mode extension for it.
  2. I then downloaded and installed the latest version of Emacs from:

  3. To build the site, that is, to convert the Org files to HTML, I ran the following command:

emacs --script build-site.el

  4. Finally, I used Python to serve the site locally:

cd public
python -m http.server
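
The serving step above can also be written as a small Python script. This is only a minimal sketch using the standard library; the function names make_server and main, the default port 8000, and binding to 127.0.0.1 are my own assumptions for illustration, not part of the original setup.

```python
import functools
import http.server


def make_server(directory, port=8000):
    """Return an HTTP server that serves files from `directory` on localhost."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword argument,
    # so there is no need to `cd` into the output folder first.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def main():
    # "public" is the output folder used in the steps above.
    server = make_server("public")
    print(f"Serving on http://127.0.0.1:{server.server_address[1]}")
    server.serve_forever()
```

Calling main() serves the public folder until the process is stopped with Ctrl+C, which behaves much like the two commands above.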

Then, I went through the code and began to understand how Emacs and Org work, and how the Org files are converted to HTML. I understand the overall process only partly so far; I do not yet have a full grasp of Emacs and Org.

Then, I started working on the CSS for Thanga Ayyanar’s website, striving to make it visually appealing. The UI reflects his name in gold, paired with a dark theme. Since Ayyanar bro prefers dark red and gold, I used those colors for ::selection. You can check out the code I wrote for the website here: https://github.com/anandsundaramoorthysa/goldayan.github.io

This blog is a brief account of my journey into exploring Emacs, Org Mode, and figuring out how to use them on Windows. Initially, it was somewhat challenging, much like when I began learning Python back in 2022. However, as with any skill, it gradually became easier with time and persistence.

If you find this content valuable, follow me for more upcoming blogs.


Collecting content for LLM dataset – Part 3 – Thamizh_Mann books, project madurai, WikiSource

23 November 2024 at 00:34

We are collecting open-licensed datasets in the Tamil language, to build LLMs and other interesting applications in the coming days.

The ML models we build may have a very short lifespan, but the open data will be there forever, or at least for longer than our lifetimes.

See parts 1 and 2 of this effort here.

part 1 – https://goinggnu.wordpress.com/2024/06/11/collecting-content-for-llm-dataset-part-1-tamil-wikipedia-content/

part 2 – https://goinggnu.wordpress.com/2024/06/16/collecting-content-for-llm-dataset-part-2-freetamilebooks/

Here goes part 3.

Thamizh_mann publishers have been publishing public domain and nationalized Tamil books for many years. A few years ago, in a collaboration between the Library at the University of Toronto Scarborough, Canada, and Thamizh_mann publishers, the Kaniyam Foundation team helped to release all the 1000+ Tamil books in PDF and Docx formats for free online.

You can download them all here https://tamil.digital.utsc.utoronto.ca/61220/utsc35335

Thanks to UTSC and the Thamizh_mann team for this great gift to the Tamil diaspora.

Now, we have 1000+ books in Unicode Docx format. The next step is to convert them all to plain text and use them. Natkeeran and Parathan helped with this.
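
To give an idea of what the Docx to plain text step involves, here is a minimal sketch using only the Python standard library. It is not the script Natkeeran and Parathan used; the function name docx_to_text is hypothetical. It relies on the fact that a .docx file is a zip archive whose main text lives in word/document.xml, and it ignores tabs, line breaks inside paragraphs, headers, and footnotes.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(path):
    """Return the plain text of a .docx file, one line per paragraph.

    `path` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Each paragraph is split into runs; join the text of all runs.
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Looping this over a folder of books and writing each result out with encoding="utf-8" would give one text file per book.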

Along with this, they helped to scrape the Project Madurai books and the Tamil WikiSource books. They published everything in a git repo here – https://github.com/KaniyamFoundation/open_tamil_texts along with the scripts and metadata.

I am adding these texts to our open-licensed Tamil data collection.

Download them all here https://kaniyam.cloudns.nz/tamil_datasets/

Here is the current size in text format and in compressed format.

shrini@dell-optiplex-9100 v/w/h/tamil_datasets> du -h compressed
258M compressed/

shrini@dell-optiplex-9100 v/w/h/tamil_datasets> du -h text-files
355M text-files/project_madurai/data/text
355M text-files/project_madurai/data
355M text-files/project_madurai
110M text-files/tamil_wikisource/data
110M text-files/tamil_wikisource
374M text-files/FreeTamilEbooks-txt
714M text-files/thamizh_mann/data
716M text-files/thamizh_mann
1.6G text-files/

We have 1.6 GB of text data to work with, for LLMs or for other purposes.
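
If you download the collection and want to check the sizes without du, the following is a rough Python sketch. The function names tree_size_bytes and human are hypothetical. Note that it sums the file sizes themselves, while du reports disk usage in blocks, so the numbers can differ slightly from the listing above.

```python
import os


def tree_size_bytes(root):
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human(num_bytes):
    """Format a byte count with du-style suffixes (powers of 1024)."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        # Stop at the first unit that fits, or at G for anything larger.
        if size < 1024 or unit == "G":
            return f"{size:.1f}{unit}"
        size /= 1024
```

For example, print(human(tree_size_bytes("text-files"))) reports the total for the text-files folder.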

Go ahead, use it and build more models and tools using this data.

This may not be enough data to get good output. But if we can bring something out of it, even if the results are not great, then we can ask people to release their recent content, blogs, and social media posts under a Creative Commons license.

A few bloggers and magazines have already released their content under a CC license. Now, we need your help to scrape them. If you know any programming language and can help with this project, please do the web scraping for the websites mentioned here, and share the data and code.

https://github.com/KaniyamFoundation/ProjectIdeas/issues/198
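
For anyone who wants to help but has not done web scraping before, here is a minimal sketch of the core step: pulling readable text out of an HTML page. It uses only Python's standard html.parser module. The names TextExtractor and html_to_text are hypothetical, and real sites will likely need extra rules to drop menus, sidebars, and comments. Downloading the pages themselves is left out here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, ignoring <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style blocks.
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The returned string can be written to a .txt file with encoding="utf-8" so that the Tamil text is preserved.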

Thanks to all the content providers and contributors.

Open Source projects mentoring via IRC

21 October 2024 at 02:46

In the programming world, if you say ‘I prefer watching videos to reading docs’, it means either you are a programmer already or you won’t become one.

Do you feel that you are struggling to be a good programmer, even after watching 100s of hours of videos?

Let me share one secret. It is the fear of reading and writing PlainText. The further you move away from reading and writing, the further programming moves away from you.

Programming is all about dealing with code, error messages, log files, and documentation. All in PlainText. Emails, tickets, docs, and reports are also part of the stack of IT life.

If you love the terminal and PlainText tools, you are already into reading and writing. The more you read and write, the more clarity you get in your thinking, which is the essential part of programming.

To embrace the simplicity and power of PlainText, a few friends started to discuss on IRC. Yes, the same Internet Relay Chat, now more than 35 years old, a chat system which built the internet itself via chat.

Thanks to the friends from Indian Linux Users Group, Chennai, KanchiLUG, and Kaniyam Foundation for joining the chat.

Read my post on why I like IRC here https://goinggnu.wordpress.com/2020/04/14/why-i-like-irc-internet-relay-chat-even-in-2020/

Here is a short video in Tamil by my friend Muthuramalingam of Payilagam – https://www.youtube.com/watch?v=CGurYNb0BM8

From today, in the evenings from 7 to 8 pm IST, we can discuss in the #kaniyam channel on irc.libera.chat

I suggest the terminal-based chat client “weechat”.

But, for a quick connection, use this link to join and discuss. https://web.libera.chat/gamja/#kaniyam

Start Date – 21 Oct 2024 ( Monday to Friday )
Time – 7-8 pm IST
server – irc.libera.chat
channel – #kaniyam
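
If you are curious what a chat client does behind the scenes, IRC is itself a PlainText protocol. Here is a minimal Python sketch of a client that registers a nickname, joins the channel after the server's welcome message (numeric 001), and answers PING so the connection stays alive. The function names are hypothetical, and TLS on port 6697 is my assumption about the connection settings. For real chatting, a proper client like weechat is the better choice.

```python
import socket
import ssl


def registration_lines(nick):
    """Lines a client sends first to register with the server."""
    return [f"NICK {nick}\r\n", f"USER {nick} 0 * :{nick}\r\n"]


def handle_line(line, channel):
    """Return the reply a client should send for one server line, or None."""
    line = line.rstrip("\r\n")
    if line.startswith("PING"):
        # Echo the PING payload back so the server keeps the connection open.
        return "PONG" + line[len("PING"):] + "\r\n"
    parts = line.split()
    if len(parts) > 1 and parts[1] == "001":
        # 001 is the welcome numeric: registration is done, so join now.
        return f"JOIN {channel}\r\n"
    return None


def run(nick, channel="#kaniyam", host="irc.libera.chat", port=6697):
    """Connect over TLS (port 6697 is an assumption) and print server lines."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port)) as raw:
        with context.wrap_socket(raw, server_hostname=host) as conn:
            for msg in registration_lines(nick):
                conn.sendall(msg.encode("utf-8"))
            buffer = b""
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                buffer += data
                # Keep any incomplete trailing line in the buffer.
                *lines, buffer = buffer.split(b"\r\n")
                for raw_line in lines:
                    text = raw_line.decode("utf-8", errors="replace")
                    print(text)
                    reply = handle_line(text, channel)
                    if reply:
                        conn.sendall(reply.encode("utf-8"))
```

Calling run("yournick") with a nickname of your choice prints everything the server sends, which is a nice way to practice reading PlainText.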

Read the chat logs here – https://ircbot.comm-central.org:8080/kaniyam

Join and say something about yourself.

  • feel like a hacker by chatting with people in your Linux terminal
  • get mentored on Hacktoberfest
  • ask any questions on Linux/Python/programming/DevOps
  • share your daily progress on learning and programming
  • practice reading and writing PlainText
  • learn slowly and strongly

See you at IRC.

If you are interested in mentoring students for open source projects, please join and start the discussions.

Other interesting channels where people chat are #ilugc, #dgplug, #emacs, #kde, and #ubuntu. You can join them and participate in the discussions anytime.
