Parse French dates on an en_US machine

Imagine you work in France, but you are really fond of your good old en_US locale. Sooner or later you will have to use Python to process some French text. I just found out that this couldn't be easier: you only need to generate and set the correct locale for your Python script, and voilà!

In this case I needed to parse a French date to build an iCal file. First, if you haven't already done so for other reasons, you should rebuild your locales and select a French locale, for example fr_FR.UTF-8.

On Debian, this is just one command away: sudo dpkg-reconfigure locales
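If you are not sure whether the locale was actually generated, you can check from Python before relying on it. This is only a minimal sketch: locale_is_available is a hypothetical helper name, not something from the standard library, and it simply tries to activate the locale and then restores the previous setting.

```python
import locale


def locale_is_available(name, category=locale.LC_TIME):
    """Return True if `name` can be activated for `category` on this machine.

    Hypothetical helper for illustration: it tries the locale, then puts
    the previous setting back so the rest of the program is unaffected.
    """
    previous = locale.setlocale(category)  # query the current setting only
    try:
        locale.setlocale(category, name)
    except locale.Error:
        # setlocale raises locale.Error when the locale is not installed.
        return False
    finally:
        locale.setlocale(category, previous)
    return True


if __name__ == "__main__":
    print(locale_is_available("fr_FR.UTF-8"))
```

If this prints False, the locale has most likely not been generated yet.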

Now you are ready to play:

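import datetime
import locale

# Use French day and month names when parsing and formatting dates.
# The locale must be installed, otherwise setlocale raises locale.Error.
# locale.setlocale(locale.LC_TIME, 'fr_FR.ISO-8859-1')
locale.setlocale(locale.LC_TIME, 'fr_FR.UTF-8')

date_from = "Dimanche 3 Juin 2012"
DATETIME_FORMAT = "%A %d %B %Y"
d = datetime.datetime.strptime(date_from, DATETIME_FORMAT)
print(d)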
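Keep in mind that locale.setlocale changes the setting for the whole process, not just for one call. If that is a concern, one option is to switch the locale only while parsing and restore it afterwards. The sketch below assumes Python 3; temporary_locale and parse_localized_date are hypothetical helper names introduced here for illustration.

```python
import contextlib
import datetime
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Switch `category` to locale `name`, then restore the previous setting."""
    previous = locale.setlocale(category)  # query the current setting only
    locale.setlocale(category, name)       # raises locale.Error if not installed
    try:
        yield
    finally:
        locale.setlocale(category, previous)


def parse_localized_date(text, fmt="%A %d %B %Y", name="fr_FR.UTF-8"):
    """Parse `text` using the day and month names of locale `name`."""
    with temporary_locale(locale.LC_TIME, name):
        return datetime.datetime.strptime(text, fmt)
```

With the French locale installed, parse_localized_date("Dimanche 3 Juin 2012") should give the same datetime as the snippet above. The locale is still process-wide while the block is running, so this does not make the code thread-safe.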

Update

If you want to attach a particular time zone to the date, this is equally easy once you know how to do it. The gettz function comes from the dateutil library, which is a third-party package and not part of the standard library. At the end of the previous snippet, add:

from dateutil.tz import gettz

# Attach the Paris time zone; the wall-clock time is left unchanged.
d = d.replace(tzinfo=gettz('Europe/Paris'))
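On Python 3.9 and later, the standard library's zoneinfo module can do the same job without dateutil. This is a minimal sketch, assuming the system time zone database (or the tzdata package) is available; the date is hard-coded here so that the example does not depend on the French locale.

```python
import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# A naive datetime, like the one strptime returns.
d = datetime.datetime(2012, 6, 3)

# Attach the Paris time zone; the wall-clock time is left unchanged.
d = d.replace(tzinfo=ZoneInfo("Europe/Paris"))
print(d.isoformat())
```

In June, Paris is on summer time, so the resulting offset is UTC+2.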

This is the script I was working on. It uses the vobject library to generate ical files and itertools.groupby to parse the input file.

import datetime
import locale
import re
from itertools import groupby

import vobject
from dateutil.tz import gettz

# Parse day and month names in French.
locale.setlocale(locale.LC_TIME, 'fr_FR.UTF-8')

def test(line):
    """Return True for the lines that start a new event (the date lines)."""
    return re.match(r"^Dimanche.*\n$", line) is not None

# Group the file into events: one date line followed by its description lines.
l = []
a = None
with open("example", encoding="utf-8") as f:
    for key, group in groupby(f, test):
        if key:
            a = list(group)
        elif a is not None:  # skip anything before the first date line
            l.append(a + list(group))

# The trailing space matches the newline at the end of each date line.
DATETIME_FORMAT = "%A %d %B %Y "

cal = vobject.iCalendar()

for ev in l:
    date_from = ev[0]
    d = datetime.datetime.strptime(date_from, DATETIME_FORMAT)
    d = d.replace(tzinfo=gettz('Europe/Paris'))

    vevent = cal.add('vevent')
    vevent.add('categories').value = ["test category"]
    vevent.add('dtstart').value = d.replace(hour=15)
    vevent.add('dtend').value = d.replace(hour=18)
    vevent.add('summary').value = "Test event"
    # Join the description lines with single spaces.
    vevent.add('description').value = " ".join(ev[1:])

icalstream = cal.serialize()
print(icalstream)

Input:

Dimanche 6 Mai 2012
       
- text text
- more text
 
Dimanche 13 Mai 2012
       
- text text
- more text
 
Dimanche 3 Juin 2012
       
- text text
- more text
 
Dimanche 10 Juin 2012
       
- text text
- more text

Output

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//PYVOBJECT//NONSGML Version 1//EN
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:20111123T133829Z-53948@zed
DTSTART;TZID=CET:20120506T150000
DTEND;TZID=CET:20120506T180000
CATEGORIES:test category
DESCRIPTION:        \n - text text\n - more text\n  \n
SUMMARY:Test event
END:VEVENT
BEGIN:VEVENT
UID:20111123T133829Z-19906@zed
DTSTART;TZID=CET:20120513T150000
DTEND;TZID=CET:20120513T180000
CATEGORIES:test category
DESCRIPTION:        \n - text text\n - more text\n  \n
SUMMARY:Test event
END:VEVENT
BEGIN:VEVENT
UID:20111123T133829Z-70980@zed
DTSTART;TZID=CET:20120603T150000
DTEND;TZID=CET:20120603T180000
CATEGORIES:test category
DESCRIPTION:        \n - text text\n - more text\n  \n
SUMMARY:Test event
END:VEVENT
BEGIN:VEVENT
UID:20111123T133829Z-44400@zed
DTSTART;TZID=CET:20120610T150000
DTEND;TZID=CET:20120610T180000
CATEGORIES:test category
DESCRIPTION:        \n - text text\n - more text\n \n
SUMMARY:Test event
END:VEVENT
END:VCALENDAR
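The groupby step is probably the least obvious part of the script, so here is a small sketch of it in isolation. It runs on an in-memory list instead of the "example" file, and is_date_line is a hypothetical stand-in for the test function of the script.

```python
from itertools import groupby

lines = [
    "Dimanche 6 Mai 2012\n",
    "- text text\n",
    "- more text\n",
    "Dimanche 13 Mai 2012\n",
    "- text text\n",
]


def is_date_line(line):
    """Date lines mark the start of a new event."""
    return line.startswith("Dimanche")


events = []
header = None
# groupby yields runs of consecutive lines that share the same key:
# a run of date lines (key True), then a run of description lines (key False).
for is_header, group in groupby(lines, is_date_line):
    if is_header:
        header = list(group)
    elif header is not None:  # ignore text before the first date line
        events.append(header + list(group))

for event in events:
    print(event)
```

Each entry of events starts with the date line, followed by the lines that belong to it. Two date lines in a row would end up in the same header list, so this only works when every date is followed by at least one description line.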

Comments

You should be careful with this, as locale changes are global.

You mean global to the script? Or global to the parent process?

The locale is process state, I believe (I don't really want to think about how it interacts with threads; programs are expected to call setlocale(LC_ALL, '') at startup while they are still single-threaded, and then forget about it). Babel is available to do translations without being tied to any global state, which makes it useful in web frameworks, for example.