
Task: Moving MP3 Files Based on Metadata Date

By: Sakthivel
6 August 2024 at 15:32
import os
import shutil
from datetime import datetime

def list_files_in_folder(folder_path):
    # Returns the names of every entry (files and sub-folders) in the folder
    return os.listdir(folder_path)

def get_file_format():
    # Normalise the extension so ".MP3" and ".mp3" are treated the same
    return input("Enter the file format (e.g., .mp3, .jpg): ").strip().lower()

def get_creation_date(file_path):
    # Note: os.path.getctime() is the creation time on Windows, but on
    # Linux/macOS it is the time of the last metadata (inode) change.
    return datetime.fromtimestamp(os.path.getctime(file_path))

def get_user_date():
    date_str = input("Enter the date (YYYY-MM-DD): ")
    return datetime.strptime(date_str, '%Y-%m-%d')

def move_files_based_on_date(folder_path, file_format, user_date, destination_folder):
    # Create the destination folder if it does not already exist
    os.makedirs(destination_folder, exist_ok=True)

    for file_name in list_files_in_folder(folder_path):
        file_path = os.path.join(folder_path, file_name)
        # Skip sub-folders and files with a different extension (case-insensitive)
        if not os.path.isfile(file_path) or not file_name.lower().endswith(file_format):
            continue
        creation_date = get_creation_date(file_path)
        if creation_date.date() == user_date.date():
            shutil.move(file_path, os.path.join(destination_folder, file_name))
            print(f"Moved: {file_name}")

def main():
    folder_path = "/home/sakthivel/Documents/Practice/task"
    destination_folder = "/home/sakthivel/Documents/Practice/mp3"

    if not os.path.exists(folder_path):
        print("Folder does not exist.")
        return

    file_format = get_file_format()
    user_date = get_user_date()

    move_files_based_on_date(folder_path, file_format, user_date, destination_folder)

if __name__ == "__main__":
    main()

Detailed Explanation:

This Python script automates the task of moving files from one directory to another based on their creation date. The script follows these main steps:

  1. List Files in a Folder:
    • Function: list_files_in_folder(folder_path)
    • Description: This function takes a folder path as an argument and returns a list of the names of all entries (files and sub-folders) in that folder.
  2. Get File Format from User:
    • Function: get_file_format()
    • Description: This function prompts the user to enter a file format (e.g., .mp3, .jpg). The entered format is returned as a string.
  3. Get Creation Date of a File:
    • Function: get_creation_date(file_path)
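    • Description: This function takes the file path as an argument and returns the file's ctime as a datetime object. On Windows this is the creation time; on Linux and macOS it is the time of the last metadata change, so it is not guaranteed to be the creation date.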
  4. Get Date from User:
    • Function: get_user_date()
    • Description: This function prompts the user to enter a date in the format YYYY-MM-DD. The entered date is converted to a datetime object and returned.
  5. Move Files Based on Date:
    • Function: move_files_based_on_date(folder_path, file_format, user_date, destination_folder)
    • Description: This function moves files from the source folder to the destination folder based on the specified file format and user-provided date.
      • It first checks if the destination folder exists; if not, it creates it.
      • It then iterates over the files in the source folder, checking if each file matches the specified format and creation date.
      • If a match is found, the file is moved to the destination folder, and a message is printed indicating the file has been moved.
  6. Main Function:
    • Function: main()
    • Description: This is the entry point of the script. It sets the paths for the source and destination folders and performs the following steps:
      • Verifies the existence of the source folder.
      • Retrieves the file format and date from the user.
      • Calls the function to move files based on the provided criteria.
  7. Script Execution:
    • The script is executed by calling the main() function when the script is run directly.
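On Linux, os.path.getctime() gives the last metadata-change time rather than the creation time. Below is a minimal sketch of a more portable date lookup, assuming that falling back to the modification time is acceptable where the operating system does not expose a birth time. The function name get_birth_or_modified_date is hypothetical and not part of the original script.

```python
import os
from datetime import datetime


def get_birth_or_modified_date(file_path):
    """Return the file's birth (creation) time if the OS exposes it,
    otherwise fall back to the last modification time."""
    st = os.stat(file_path)
    # st_birthtime exists on macOS/BSD (and on Windows from Python 3.12);
    # it is usually missing on Linux, so fall back to st_mtime there.
    timestamp = getattr(st, "st_birthtime", None)
    if timestamp is None:
        timestamp = st.st_mtime
    return datetime.fromtimestamp(timestamp)
```

It could stand in for get_creation_date() inside move_files_based_on_date() without changing anything else in the script.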

Enhancements for Future Consideration:

  • User Input Validation: Ensure the file format and date inputs are valid.
  • Error Handling: Implement error handling for file operations and user inputs.
  • Logging: Add logging to keep track of the operations performed and any errors encountered.
  • Flexible Date Comparison: Allow for more flexible date comparisons, such as moving files created on or after a specified date.
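The input-validation, error-handling and flexible-date ideas above could look roughly like this. It is a sketch under stated assumptions, not a finished tool: the helper names are hypothetical, and it compares the modification time (os.path.getmtime) because that value means the same thing on every platform.

```python
import os
import shutil
from datetime import datetime


def parse_user_date(date_str):
    """Validate a YYYY-MM-DD string; return a date, or None if invalid."""
    try:
        return datetime.strptime(date_str.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


def normalise_format(file_format):
    """Accept 'mp3', '.mp3' or ' .MP3 ' and return '.mp3'."""
    file_format = file_format.strip().lower()
    return file_format if file_format.startswith(".") else "." + file_format


def move_files_on_or_after(folder_path, file_format, start_date, destination_folder):
    """Move matching files modified on or after start_date; return their names."""
    os.makedirs(destination_folder, exist_ok=True)
    moved = []
    for file_name in os.listdir(folder_path):
        file_path = os.path.join(folder_path, file_name)
        # Only regular files with the requested extension (case-insensitive)
        if not os.path.isfile(file_path) or not file_name.lower().endswith(file_format):
            continue
        file_date = datetime.fromtimestamp(os.path.getmtime(file_path)).date()
        if file_date >= start_date:
            try:
                shutil.move(file_path, os.path.join(destination_folder, file_name))
                moved.append(file_name)
            except OSError as err:
                # Report the failure and keep processing the remaining files
                print(f"Could not move {file_name}: {err}")
    return moved
```

In main(), the script could keep asking for a date until parse_user_date() returns something other than None.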

By following these steps, the script organizes files by date, making it a useful tool for managing large collections of files.


Using Google Sheets as a makeshift Database [Deprecated]

By: ashish
9 March 2020 at 19:51

Do you need a quick solution without the hassle of setting up a database? If so, you've come to the right place. This post will show you how you can use Google Sheets as your database.

For the purposes of this blog post, I will be using this Google Sheet.

As you can see, we will be collecting the following data from the user – Name, Email and Age.

Create the API

  • Go to the Google Sheet you want to use.
  • Create column headers in the first row.
  • Click Tools > Script editor.
  • Copy the following code into the editor.
  • Click Run > Run function > setup.
  • Now publish your script with the following settings to get the request URL.

Now let us test this URL in a webpage.

See the Pen "Simple register form" by Thomas Ashish Cherian (@pandawhocodes) on CodePen.

Enter your details in the form to see them added to the Google Sheet above (refresh the sheet to see the changes).
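The same published URL can also be called from a script instead of a web form. A minimal sketch, assuming the Apps Script reads form-encoded POST fields named after the column headers (Name, Email, Age); WEB_APP_URL is a hypothetical placeholder for the request URL you get when publishing, and what the script sends back depends on the script itself.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical placeholder: use the request URL from publishing your script.
WEB_APP_URL = "https://script.google.com/macros/s/YOUR_SCRIPT_ID/exec"


def build_submission(url, name, email, age):
    """Build a form-encoded POST request with one field per column header."""
    # Assumption: the script maps field names to the sheet's column headers.
    data = urlencode({"Name": name, "Email": email, "Age": age}).encode("utf-8")
    return Request(url, data=data, method="POST")


def submit(url, name, email, age):
    """Send the request and return the response body as text."""
    with urlopen(build_submission(url, name, email, age)) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect exactly what would be posted before touching the live sheet.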

Collecting content for LLM dataset – Part 2 – FreeTamilEbooks

16 June 2024 at 02:35

At FreeTamilEbooks.com, we have published 850 ebooks, all under shareable Creative Commons licenses. Many people have repeatedly asked for the text-only content of all these books. As it is a big task, it took a long time. Thanks to Lenin and Anwar of Kaniyam Foundation, all the contributors, and all the writers and readers for keeping this project alive and making it a great success.

We publish the books in epub format, along with PDF. An epub is just a zip file of HTML files, so we can extract all of its content as Unicode text. Pandoc is a wonderful open source tool that can convert an epub to a plain-text file.

Here is the list of steps we have to do:

  1. Get the URLs of all 850+ epub files.
  2. Download them all.
  3. Convert them to text files using pandoc.

So far, we don't have a metadata file for all the published books, so getting the links to all the epub files needs some programming. As Python is a Swiss Army knife for automating anything, I started exploring the WordPress REST API with Python to get the content of all the book pages.

https://github.com/KaniyamFoundation/create_ebooks/blob/master/get_metadata/get_Data.py

I wrote this script to get the information for all the books.
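For readers who do not want to open the script, here is a minimal sketch of the general idea: walking the WordPress REST API page by page. It is not the code in get_Data.py. It assumes the books are ordinary posts at /wp-json/wp/v2/posts, and it relies on the X-WP-TotalPages response header and the per_page limit of 100 that the WordPress REST API uses. The fetch parameter is a hypothetical hook so the paging logic can be tried without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def page_url(base_url, page, per_page=100):
    """URL for one page of the WordPress REST API posts endpoint."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{base_url.rstrip('/')}/wp-json/wp/v2/posts?{query}"


def fetch_json(url):
    """Download one page; return (parsed JSON list, total page count)."""
    with urlopen(url) as response:
        total_pages = int(response.headers.get("X-WP-TotalPages", 1))
        return json.loads(response.read().decode("utf-8")), total_pages


def fetch_all_posts(base_url, fetch=fetch_json):
    """Collect every post by walking pages 1..X-WP-TotalPages."""
    posts, page, total_pages = [], 1, 1
    while page <= total_pages:
        batch, total_pages = fetch(page_url(base_url, page))
        posts.extend(batch)
        page += 1
    return posts
```

The resulting list can be saved with json.dump() to get a JSON file similar in spirit to the one described next.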

get_Data.py produced a JSON file with the book name, author, genre, and links to the epub, mobi, A4 PDF and 6-inch PDF versions.

I converted the JSON to a CSV file with this code: https://github.com/KaniyamFoundation/create_ebooks/blob/master/get_metadata/parse.py

I had to fix a few things manually in the CSV file.
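A JSON-to-CSV conversion of this kind can be sketched with the standard csv module. This is not parse.py; the column names in FIELDS are hypothetical, chosen only to mirror the fields listed above.

```python
import csv
import io
import json

# Hypothetical column names mirroring the fields described in the post.
FIELDS = ["book_name", "author", "genre", "epub", "mobi", "a4_pdf", "6inch_pdf"]


def books_json_to_csv(json_text):
    """Turn a JSON list of book records into CSV text with a header row."""
    books = json.loads(json_text)
    out = io.StringIO()
    # Missing keys become empty cells; keys not in FIELDS are ignored.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(books)
    return out.getvalue()
```

Writing the result with open(..., "w", newline="", encoding="utf-8") keeps the Tamil titles intact and avoids blank lines between rows on Windows.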

This is the final CSV file. https://github.com/KaniyamFoundation/create_ebooks/blob/master/get_metadata/fte_metadata.csv

The code below downloads all the epub files from their links in fte_metadata.csv and uses pandoc to convert them to text.

https://github.com/KaniyamFoundation/create_ebooks/blob/master/get_metadata/get_fte_books.py
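The download-and-convert loop can be sketched as follows. It is not get_fte_books.py: the "epub" column name and the epubs folder are assumptions, and it expects the pandoc command to be installed. pandoc infers the input format from the .epub extension, and "-t plain" selects plain-text output.

```python
import csv
import os
import subprocess
from urllib.parse import urlparse
from urllib.request import urlretrieve


def epub_links(csv_path, column="epub"):
    """Read the non-empty epub URLs from the metadata CSV."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f) if row.get(column)]


def local_name(url, folder):
    """Derive a local file path from the last part of the URL path."""
    return os.path.join(folder, os.path.basename(urlparse(url).path))


def pandoc_command(epub_path):
    """pandoc command that writes a .txt file next to the epub."""
    txt_path = os.path.splitext(epub_path)[0] + ".txt"
    return ["pandoc", epub_path, "-t", "plain", "-o", txt_path]


def download_and_convert(csv_path, folder="epubs"):
    os.makedirs(folder, exist_ok=True)
    for url in epub_links(csv_path):
        epub_path = local_name(url, folder)
        if not os.path.exists(epub_path):  # skip files already downloaded
            urlretrieve(url, epub_path)
        subprocess.run(pandoc_command(epub_path), check=True)
```

Skipping files that already exist lets the loop be re-run after a network failure without downloading everything again.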

I got 845 text files, 374 MB in total.

Compressing them with 7z gave a 47 MB file.

Published the data here. https://kaniyam.cloudns.nz/tamil_datasets/fte-books/

Download and share the text data for free. Don't sell it, as most of the books are released under the CC-BY-NC (NonCommercial) license.

Use this data to build awesome open source applications and research projects such as spell checkers, grammar checkers, LLMs, RAG systems, and more.

Data is always the oil. Let us grow the open data oil.

Please share all your text, audio and video content under shareable licenses like Creative Commons. It will be used to build a better future.

Collecting content for LLM dataset – Part 1 – Tamil wikipedia content

11 June 2024 at 00:00

At Kaniyam Foundation, we dream of collecting and publishing terabytes of Tamil text data for Tamil LLMs and other research work. We are documenting the websites that provide openly licensed Tamil content (Public Domain, Creative Commons, etc.) here: https://github.com/KaniyamFoundation/ProjectIdeas/issues/198

From there, we can find the websites, scrape them, and use and share the data.

Today, as a first step, I started exploring the Tamil Wikipedia data.

All Wikipedia content is published as XML and SQL dump files.

Wikipedia dumps for all languages can be downloaded from http://dumps.wikimedia.org/backup-index.html.

For Tamil Wikipedia content, I downloaded this file from https://dumps.wikimedia.org/tawiki/:

tawiki-20240501-pages-articles-multistream.xml.bz2

It is 223.3 MB.

That page has multiple files, but look for "pages-articles" to get the main Wikipedia content.

Then I extracted it:

bunzip2 tawiki-20240501-pages-articles-multistream.xml.bz2

This gave a 1.7 GB file, tawiki-20240501-pages-articles-multistream.xml.

The result is an XML file, so we have to extract the text content from it.

For this, I explored and found a good tool: https://github.com/apertium/WikiExtractor

I downloaded it and ran it:
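The same decompression can be done from Python, streaming the data so the 1.7 GB output never has to fit in memory. A minimal sketch; decompress_bz2 is a hypothetical helper name. Python's bz2 module has read multi-stream files such as this "multistream" dump since Python 3.3, and unlike plain bunzip2 it leaves the original .bz2 file in place.

```python
import bz2
import shutil


def decompress_bz2(src_path, dest_path, chunk_size=1024 * 1024):
    """Stream-decompress a .bz2 file in chunks instead of all at once."""
    with bz2.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest, chunk_size)
```

For example, decompress_bz2("tawiki-20240501-pages-articles-multistream.xml.bz2", "tawiki-20240501-pages-articles-multistream.xml").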

python3 WikiExtractor.py --infn tawiki-20240501-pages-articles-multistream.xml

It ran for 2 minutes and produced a 627 MB file, wiki.txt, containing the content of all the articles as a single plain-text file.

I compressed it with 7z, as it gives better compression.
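To get a feel for what WikiExtractor works on, here is a rough sketch that streams page titles and raw wikitext out of the dump with xml.etree.ElementTree.iterparse. It is only an illustration, not a replacement: the text it yields still contains wiki markup such as [[links]] and templates, which WikiExtractor strips. iter_pages is a hypothetical name, and the namespace handling assumes the usual MediaWiki export format, where every tag carries a {namespace} prefix.

```python
import xml.etree.ElementTree as ET


def local_tag(tag):
    """Drop the '{namespace}' prefix MediaWiki puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_source):
    """Yield (title, wikitext) for each page without building the whole tree."""
    title = None
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        tag = local_tag(elem.tag)
        if tag == "title":
            title = elem.text
        elif tag == "text":
            yield title, elem.text or ""
        elif tag == "page":
            elem.clear()  # free the finished page's children
```

Passing the path of the extracted .xml file as xml_source processes the dump one page at a time.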

mv wiki.txt tawiki-20240501-pages-article-wiki.txt
7z a tawiki-20240501-pages-article-text.7z tawiki-20240501-pages-article-wiki.txt

The compressed file is 70 MB.

In the same way, I will continue collecting plain-text Tamil data from various sources. We have to find a place where we can publish a few hundred GBs to TBs of data for free. Until then, I will share these files from the self-hosted desktop PC at my home.
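If 7z is not installed, Python's standard library can apply the same family of compression (LZMA), although it writes an .xz file rather than a .7z archive, so the resulting size may differ somewhat from the 7z result. A minimal sketch; compress_xz is a hypothetical helper name.

```python
import lzma
import shutil


def compress_xz(src_path, dest_path, preset=9):
    """Compress a file with LZMA into an .xz container, streaming in chunks."""
    with open(src_path, "rb") as src, lzma.open(dest_path, "wb", preset=preset) as dest:
        shutil.copyfileobj(src, dest)
```

Anyone downloading the result can unpack it with the xz command-line tool or with lzma.open() in Python.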

Published the file here – https://kaniyam.cloudns.nz/tamil_datasets/

Let me know if you are interested in joining this project.
