Emacs & Org Mode in Windows

Source: OpenSource.com
Note: I am not an expert in Emacs and Org Mode. But I have always loved exploring tech, coding, and finance. This blog is about my journey of exploring Emacs and Org Mode for Static Site Generation using Windows.

When I first learned about “Static Site Generation” as a developer, I was amazed. Normally, we build websites in HTML, and even when we use frameworks, HTML is still the foundation. In Static Site Generation, the content is written in a markup language such as Markdown or Org Mode, and then converted into HTML by scripts or tools like Emacs, Hugo, and more. This concept was totally new to me!

To be honest, I was astonished when I first heard about it from my brother, Thanga Ayyanar. He exclusively uses Org Mode to maintain his website and blogs. Inspired by his approach, I wanted to learn, use, and explore this tool for myself, so I chose to work on Thanga Ayyanar’s website (golayan.in). I also took the opportunity to work on the UI. However, there was one condition from Thanga Ayyanar: I had to use only HTML and CSS. It was a slight headache, but a challenge I embraced with love.

I aim to host this project soon, and once I do, I will attach the link in this blog below. The challenge, though, is that while Emacs works wonderfully on Linux, I primarily use Windows due to storage constraints. I can no longer run both Windows and Linux on my laptop as I did before. Despite this, my curiosity about Emacs and Org Mode drove me to explore them further.

I initially referred to ChatGPT, Perplexity, and Gemini AI for guidance, but their suggestions weren’t helpful enough. Eventually, I discovered my way forward:

Source: freeCodeCamp.org
  1. Since I am comfortable using VS Code, I downloaded the Org Mode extension for it.
  2. I then downloaded and installed the latest version of Emacs from:

3. To build the site, that is, to convert the Org files to HTML, I ran the following command:

emacs --script build-site.el

4. Finally, I used Python to serve the site locally:

cd public
python -m http.server
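
For readers who prefer to start the local server from a script, here is a minimal sketch using the same standard-library module that the command above runs. The function name `make_static_server` and the defaults (`public`, port 8000, loopback address) are my own assumptions for illustration, not part of the original setup. Note that this sketch binds only to 127.0.0.1, while the command-line server listens on all interfaces by default.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_static_server(directory="public", port=8000):
    """Build an HTTP server that serves files from `directory`.

    `directory` and `port` are illustrative defaults (assumptions);
    pass port=0 to let the operating system pick a free port.
    """
    # The `directory` keyword of SimpleHTTPRequestHandler needs Python 3.7+.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # Bind to the loopback address only, so the site stays local.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To use it, call `server = make_static_server()` from the project root, then `server.serve_forever()`, and open http://127.0.0.1:8000/ in a browser. Press Ctrl+C to stop.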

Then I went through the code and began to understand how Emacs and Org Mode work, and how the build converts Org files to HTML. I understand the overall process a little, but I do not yet have a full understanding of Emacs and Org Mode.

Then, I started working on the CSS for Thanga Ayyanar’s website, striving to make it visually appealing. The UI reflects their name in gold, paired with a dark theme. Since Ayyanar bro prefers dark red and gold, I used those colors for ::selection. You can check out the code I wrote for the website here: https://github.com/anandsundaramoorthysa/goldayan.github.io

This blog is a brief account of my journey into exploring Emacs, Org Mode, and figuring out how to use them on Windows. Initially, it was somewhat challenging, much like when I began learning Python back in 2022. However, as with any skill, it gradually became easier with time and persistence.

If you find this content valuable, follow me for more upcoming blogs.


Collecting content for LLM dataset – Part 3 – Thamizh_Mann books, project madurai, WikiSource

23 November 2024 at 00:34

We are collecting open-licensed datasets in the Tamil language, to build LLMs and other interesting applications in the coming days.

The ML models we build may have a very short lifespan, but the open data will be there forever, or at least for longer than our lifetimes.

See parts 1 and 2 of this effort here.

part 1 – https://goinggnu.wordpress.com/2024/06/11/collecting-content-for-llm-dataset-part-1-tamil-wikipedia-content/

part 2 – https://goinggnu.wordpress.com/2024/06/16/collecting-content-for-llm-dataset-part-2-freetamilebooks/

Here goes part 3.

Thamizh_mann publishers have been publishing public-domain and nationalized Tamil books for many years. A few years ago, in a collaboration between the Library at the University of Toronto Scarborough, Canada, and Thamizh_mann publishers, the Kaniyam Foundation team helped release all 1000+ Tamil books in PDF and Docx formats, free online.

You can download them all here https://tamil.digital.utsc.utoronto.ca/61220/utsc35335

Thanks to UTSC and the Thamizh_mann team for this great gift to the Tamil diaspora.

Now we have 1000+ books in Unicode Docx format. The next step is to convert them all to plain text and use them. Natkeeran and Parathan helped with this.
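
To give an idea of what the Docx-to-text step involves, here is a minimal sketch that uses only the Python standard library. It is not the script Natkeeran and Parathan used; the function names `docx_to_text` and `convert_folder` are hypothetical. It relies on the fact that a .docx file is a zip archive whose main body is stored in `word/document.xml`, and it reads only paragraph text (tables are flattened; headers, footnotes, and tab characters are ignored).

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body paragraphs of a .docx file, one per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join the text of every <w:t> node.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def convert_folder(src_dir, dst_dir):
    """Write a UTF-8 .txt file for each .docx in src_dir; return the count."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for docx in sorted(Path(src_dir).glob("*.docx")):
        text = docx_to_text(docx)
        (dst / (docx.stem + ".txt")).write_text(text, encoding="utf-8")
        count += 1
    return count
```

Writing the output explicitly as UTF-8 matters for Tamil text, especially on systems where the default encoding is something else.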

Along with this, they helped scrape the Project Madurai books and the Tamil WikiSource books. They published everything in a git repo here – https://github.com/KaniyamFoundation/open_tamil_texts along with the scripts and metadata.

I am adding those texts to our open-licensed Tamil data collection.

Download them all here https://kaniyam.cloudns.nz/tamil_datasets/

Here is the current size in text format and in compressed format.

shrini@dell-optiplex-9100 v/w/h/tamil_datasets> du -h compressed
258M compressed/

shrini@dell-optiplex-9100 v/w/h/tamil_datasets> du -h text-files
355M text-files/project_madurai/data/text
355M text-files/project_madurai/data
355M text-files/project_madurai
110M text-files/tamil_wikisource/data
110M text-files/tamil_wikisource
374M text-files/FreeTamilEbooks-txt
714M text-files/thamizh_mann/data
716M text-files/thamizh_mann
1.6G text-files/

We have 1.6 GB of text data to work with, for LLMs or other projects.
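
If you download the collection on a system without `du` (Windows, for example), a rough equivalent can be sketched in Python. The function names below are hypothetical. The numbers may differ slightly from `du`, which reports the disk space used, while this sketch sums the file sizes.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(num_bytes):
    """Format a byte count with 1024-based units, similar to `du -h`."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"
```

For example, `human_readable(directory_size_bytes("text-files"))` gives a single total for the folder.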

Go ahead, use it and build more models and tools using this data.

This may not be enough to get good output. But if we can produce something from it, even if the results are not good, we can then ask people to release their recent content, blogs, and social media posts under a Creative Commons license.

A few bloggers and magazines have already released their content under a CC license. Now we need your help to scrape them. If you know any programming language and can help with this project, please do the web scraping for the websites mentioned here, and share the data and code.

https://github.com/KaniyamFoundation/ProjectIdeas/issues/198
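
As a possible starting point for contributors, here is a minimal scraping sketch that uses only the Python standard library. It is an illustration under assumptions, not a finished scraper: the names `TextExtractor`, `html_to_text`, and `fetch_text` and the User-Agent string are hypothetical, and a real site will usually need its own rules to pick out only the article body. Please confirm each site's license and its robots.txt before collecting anything.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_text(url):
    """Download one page and return its visible text."""
    # The User-Agent value is a made-up example; identify your own tool here.
    request = Request(url, headers={"User-Agent": "tamil-dataset-sketch/0.1"})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))
```

Keeping the source URL and license next to every saved text file makes the dataset much easier to audit later.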

Thanks to all the content providers and contributors.
