What started with a Python program…

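With some free time this afternoon, I downloaded a small Douban MP3 download program from Google Code, planning to try it on my development machine. I had never written a Python program before and had only touched a little Python web development, but the program reads clearly and simply. The code is pasted below. Copyright belongs to the original author.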
config.py

#coding=utf8

PATH = 'c:/codes/projects/downloads/'

# To get past Douban's anti-robot check, set your own Cookie, and set/modify the other headers to match your browser
HEADERS = {
    'Host': '',
    'Referer':'',
    'Cookie': '',
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; rv:9.0.1) Gecko/20100101 Firefox/9.0.1',
    'Connection': 'keep-alive',
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language':'zh-cn,zh;q=0.5',
    'Accept-Charset':'GB2312,utf-8;q=0.7,*;q=0.7',
}

download_mp3.py

#coding=utf8

import json
import urllib2
import os
import re
import search_musician
from config import PATH, HEADERS

URL = 'http://douban.fm/j/mine/playlist?type=n&channel=0&context=channel:0|musician_id:%s&from=mainsite'
headers = HEADERS
PAT_FN = re.compile(r'<decode><!\[CDATA\[(.+?)\]\]></decode>', re.S)


def run(mid, name):
    url =  URL % (mid,) #&r=dbdcadec36'
    headers['Host'] = 'douban.fm'
    headers['Referer'] = url
    req = urllib2.Request( url, headers=headers )
    try:
        text = urllib2.urlopen( req ).read()
        data = json.loads(text)
        songs = data['song']
        print '# of songs:', len(songs)
    except Exception, e:
        print 'Error get songs:', str(e)
        return None

    path = PATH + name + '/'    # path is unicode
    if not os.path.isdir( path ):
        os.makedirs( path )
       
    #start downloading
    for i in xrange(0, len(songs)):
        song = songs[i]
        print i, 'downloading... ', song['title']
        if download( song, path ) == -1:
            break


def download(song, path):
    try:
        r = urllib2.urlopen( song['url'] )
    except Exception, e:
        print 'Error download:', str(e)
        print song['title'], song['url']
        return -1
    fn = path + PAT_FN.sub('', song['title']) + '.mp3'
    fw = open(fn, 'wb')
    fw.write(r.read())
    fw.close()
    return 1


if __name__ == '__main__':
    key = raw_input('The Singer:\n')
    mid = search_musician.get_musician_id( key )
    if mid:
        run( mid, key.decode('gbk') )  # choose the decoding that suits your environment
    else:
        print 'no such singer'
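
One detail in this module and in search_musician.py: `headers = HEADERS` binds a second name to the same dict, so assigning `headers['Host']` changes the dict imported from config.py, which both modules share. A minimal sketch of building a per-request copy instead. `request_headers` is a hypothetical helper that is not part of the original program, and the `HEADERS` below is a shortened stand-in for the one in config.py:

```python
# Shortened stand-in for config.HEADERS (assumption, for illustration only).
HEADERS = {'Host': '', 'Referer': '', 'User-Agent': 'example-agent'}


def request_headers(host, referer, base=HEADERS):
    """Return a copy of base with Host and Referer filled in.

    The shared base dict is left untouched.
    """
    headers = dict(base)          # shallow copy; values are plain strings
    headers['Host'] = host
    headers['Referer'] = referer
    return headers


h = request_headers('douban.fm', 'http://douban.fm/')
print(h['Host'])
print(repr(HEADERS['Host']))      # still empty in the shared dict
```

The original code works because every request overwrites Host and Referer before use; the copy only removes the hidden coupling between the two modules.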

search_musician.py

#coding=utf8

import urllib2
import re
import sys
from config import PATH, HEADERS

URL = 'http://music.douban.com/subject_search?search_text=%s&cat=1001'
PAT = re.compile(r'href="http://music.douban.com/musician/(\d+)/"')
headers = HEADERS

def get_musician_id(name):
    url = URL % (name)
    headers['Host'] = 'music.douban.com'
    headers['Referer'] = url
    req = urllib2.Request(url, headers = headers)
    try:
        text = urllib2.urlopen( req ).read()
    except Exception, e:
        print 'Error get_musician_id:', str(e)
        print url
        return None
    res = PAT.search( text )
    if res:
        return res.group(1)
    return None


if __name__ == '__main__':
    mid = get_musician_id( '曲婉婷' )
    print mid
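
Note that `get_musician_id` interpolates `name` into the query string as-is, so a non-ASCII name ends up unescaped in the URL. A minimal sketch of percent-encoding the name first, assuming the name is available as UTF-8 bytes. `build_search_url` is a hypothetical helper rather than the original author's code, and the import fallback is only there so the sketch runs on both Python 2 and Python 3:

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote   # Python 3
except ImportError:
    from urllib import quote         # Python 2

SEARCH_URL = 'http://music.douban.com/subject_search?search_text=%s&cat=1001'


def build_search_url(name_utf8):
    """Return the search URL with the UTF-8 encoded name percent-encoded."""
    return SEARCH_URL % quote(name_utf8)


url = build_search_url(u'周杰伦'.encode('utf-8'))
print(url)
```

This only makes the request URL well-formed; it does not by itself address Douban's anti-robot check.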

===================@@======================
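The path in this config file makes it obvious that the original author wrote it on Windows. To keep my local machine clean, though, I did not want to install a Python runtime there, so I put the program on the SWS machine instead.
First, a look at the Python setup on that machine: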

[root@mingming-dev src]# python
Python 2.6.6 (r266:84292, Dec  7 2011, 20:48:22)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Not bad: at least it is not the ancient 2.4.
Next, compile the .py file to a .pyc:

[root@mingming-dev src]# ll
total 12
-rw-r--r-- 1 root root  538 Aug 21 10:52 config.py
-rw-r--r-- 1 root root 1673 Aug 21 10:50 download_mp3.py
-rw-r--r-- 1 root root  776 Oct  8 16:17 search_musician.py
[root@mingming-dev src]# python -m py_compile download_mp3.py
[root@mingming-dev src]# ls
config.py  download_mp3.py  download_mp3.pyc  search_musician.py
[root@mingming-dev src]# python download_mp3.pyc
The Singer:
周杰伦
Error get_musician_id: HTTP Error 403: Forbidden
http://music.douban.com/subject_search?search_text=周杰伦&cat=1001
no such singer

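At this point the compilation is done. Here are the files now (config.pyc and search_musician.pyc were written automatically when the run imported those modules):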

[root@mingming-dev src]# ll
total 24
-rw-r--r-- 1 root root  538 Aug 21 10:52 config.py
-rw-r--r-- 1 root root  578 Oct  8 16:28 config.pyc
-rw-r--r-- 1 root root 1673 Aug 21 10:50 download_mp3.py
-rw-r--r-- 1 root root 2021 Oct  8 16:28 download_mp3.pyc
-rw-r--r-- 1 root root  776 Oct  8 16:17 search_musician.py
-rw-r--r-- 1 root root 1104 Oct  8 16:28 search_musician.pyc
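
The `python -m py_compile` step can also be driven from Python code. A minimal sketch using the standard `py_compile` module; the file name `hello.py` and the temporary directory are placeholders chosen for illustration:

```python
import os
import py_compile
import tempfile

# Create a throwaway source file to compile (placeholder content).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'hello.py')
with open(source, 'w') as f:
    f.write("print('hello')\n")

# Ask for the .pyc next to the source, as in the listing above.
# doraise=True turns compile errors into exceptions instead of stderr output.
compiled = source + 'c'
py_compile.compile(source, cfile=compiled, doraise=True)

print(os.path.exists(compiled))
```

Passing `cfile` explicitly keeps the output next to the source on Python 3 as well, where the default location is a `__pycache__` directory.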

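The run output, however, shows that it did not actually succeed. The problem is at the very first step: the request that looks up the musician's mid was rejected with a 403.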
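
The comment in config.py asks the user to set their own Cookie because of Douban's anti-robot check, and the config as pasted leaves `Cookie` empty. The log does not confirm that this is what caused the 403 here, but it is an easy thing to rule out before sending anything. A minimal sketch; `empty_header_names` is a hypothetical helper, and the `HEADERS` below is a shortened stand-in for the one in config.py:

```python
# Shortened stand-in for config.HEADERS (assumption, for illustration only).
HEADERS = {
    'Host': '',
    'Referer': '',
    'Cookie': '',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:9.0.1) Gecko/20100101 Firefox/9.0.1',
    'Connection': 'keep-alive',
}


def empty_header_names(headers, filled_later=('Host', 'Referer')):
    """Return the sorted names of headers that are still empty.

    Host and Referer are skipped because the program sets them per request.
    """
    return sorted(k for k, v in headers.items()
                  if not v and k not in filled_later)


missing = empty_header_names(HEADERS)
print(missing)
```

With the values above, the only name reported is `Cookie`.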