My Python Recipes – Paul's Memo Books

Steps for installing pyodbc in Docker to connect to MSSQL

By paul | 2018-05-19 | Comments 0 Comment

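On Windows, connecting Python to MSSQL only takes pyodbc; with almost no configuration you can connect to MSSQL easily.

On Linux, though, plenty of people before me have already fallen into the pits and shed the tears!

Here are the steps and the tricky details:

First, assume a Docker container is already running with a basic Python 3.6 environment (or another version; this post targets 3.x, so find an image on Docker Hub yourself...).

Once inside the container, there are 3 major jobs to do...

1. Install FreeTDS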

wget  http://ibiblio.org/pub/Linux/ALPHA/freetds/stable/freetds-stable.tgz

tar zxvf freetds-stable.tgz

cd freetds-0.91/

./configure --with-tdsver=7.1 --prefix=/usr/local/freetds0.91 --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --without-ldap --without-tcl --enable-pkinit --enable-thread-support --without-hesiod --enable-shared --with-system-et --with-system-ss --enable-dns-for-realm --enable-kdc-lookaside-cache --with-system-verto --disable-rpath --with-pkinit-crypto-impl=openssl --with-openssl

make

make install

 cat >> /usr/local/freetds0.91/etc/freetds.conf
Add:
[TestDB]
host = mesotest.database.windows.net
port = 1433
tds version = 7.0

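Note: in freetds.conf, uncomment dump file = /tmp/freetds.log, and change the TDS version in the global section to the same 7.0. With the dump log enabled, you can see the actual cause when a later connection fails, which saves a lot of effort.

Example: severity:9, dberr:20002[Adaptive Server connection failed], oserr:0[Success] -> this is a TDS version problem that needs adjusting; if 8.0 does not work, step back through 7.2 -> 7.1 -> 7.0.

2. Test the FreeTDS connection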

/usr/local/freetds0.91/bin/tsql -S TestDB -U [email protected] -P {password} -D test1

If FreeTDS can connect and run queries, it should look like this:

You can issue SQL statements, and result sets come back.

2. Configure ODBCInit

apt-get install unixodbc-dev
apt-get install python-pip

pip install pyodbc
#yum install gcc-c++

# The most critical part: find where the FreeTDS ODBC libraries ended up
find /usr -name "*\.so" |egrep "libtdsodbc|libtdsS"
 #/usr/lib/libtdsS.so 
 #/usr/local/freetds0.91/lib/libtdsodbc.so

# cp /etc/odbcinst.ini /etc/odbcinst.ini.20160102

# cat >> /etc/odbcinst.ini

[SQL Server]
Description = FreeTDS ODBC driver for MSSQL
Driver = /usr/local/freetds0.91/lib/libtdsodbc.so
Setup = /usr/lib/libtdsS.so
FileUsage = 1

# Check the registered driver
# odbcinst -d -q
[SQL Server]

cat >> /etc/odbc.ini
[TESTDSN]
Driver          = SQL Server
Server          = xxx.xxx.xxx.xxx
User            = xxxx
TDS_Version     = 7.0
Port            = 1433

3. Run a simple Python program that connects to MSSQL

import pyodbc

conn =  pyodbc.connect("driver={SQL Server};server=mesotest.database.windows.net;PORT=1433;database=test1;[email protected];PWD=%s;TDS_Version=7.0;" % "{yourpassword}" )
cursor = conn.cursor()

query = "select getdate()"

print(query)
cursor.execute(query)
row = cursor.fetchone()
while row:
    print(str(row[0]))
    row = cursor.fetchone()

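It worked, I could cry... it is 3 a.m.!!

As others online have shared, there is one more big pit here: the connection string must include TDS_Version, and the version must match the one configured in FreeTDS...

Otherwise you fall into an endless cycle of 08001 errors without knowing why...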

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pyodbc.Error: ('08001', '[08001] [unixODBC][FreeTDS][SQL Server]Unable to connect to data source (0) (SQLDriverConnect)')

Two key references
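Because the version mismatch is so easy to miss, one option is to assemble the connection string in a single place so that TDS_Version is always present. The following is only a minimal sketch: `build_conn_str` is a hypothetical helper (not part of pyodbc), and the server, database, and credentials are placeholders.

```python
def build_conn_str(server, database, user, password,
                   tds_version="7.0", port=1433, driver="SQL Server"):
    """Hypothetical helper: assemble an ODBC connection string for FreeTDS.

    tds_version should match the value configured in freetds.conf.
    """
    parts = [
        ("DRIVER", "{%s}" % driver),   # driver name registered in /etc/odbcinst.ini
        ("SERVER", server),
        ("PORT", str(port)),
        ("DATABASE", database),
        ("UID", user),
        ("PWD", password),
        ("TDS_Version", tds_version),
    ]
    # Every key=value pair is terminated by a semicolon
    return ";".join("%s=%s" % (key, value) for key, value in parts) + ";"


conn_str = build_conn_str("example.database.windows.net", "test1",
                          "someuser", "secret")
print(conn_str)
```

Inside the container the result would then be passed to pyodbc.connect(conn_str). Joining the parts programmatically also avoids forgetting a semicolon between two keys.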

https://blog.csdn.net/samed/article/details/50449808

http://www.voidcn.com/article/p-vaxmczdi-dc.html

[Effective Python] Scenario: prefer helper classes over bookkeeping with dictionaries and tuples

By paul | 2017-09-22 | Comments 0 Comment

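Python's built-in dictionary type is very useful for maintaining dynamic internal state over the lifetime of an object.

For example, suppose we have a bank book that needs to record customers' deposits and transaction history, but we do not know their names in advance.

We might design the following class, which uses a dictionary to keep track of those names: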

class BankBook( object ):
    def __init__(self):
        self._history = {}

    def add_customer(self, name):
        self._history[name] = []

    def save_money(self, name, amount):
        self._history[name].append(amount)

    def draw_money(self, name, amount):
        self._history[name].append(-1 * amount)

    def show_balance(self, name):
        moneys = self._history[name]
        return sum(moneys)

Using this class is simple:


book = BankBook()
book.add_customer('paul')
book.save_money('paul', 1200)

book.save_money('paul', -200)

print(book.show_balance('paul'))

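Dictionaries are so easy to use that there is a risk of over-extending them and writing brittle code.

For example, suppose a new requirement comes in and we need to extend the class: we want to track accounts in different currencies. We can try changing each value in _history into a dictionary as well, keyed by currency.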


class CurrencyBankBook( object ):
    def __init__(self):
        self._history = {}

    def add_customer(self, name):
        self._history[name] = {}  # the list becomes a dictionary

    def save_money(self, name, currency_name , amount):
        by_currency = self._history[name]
        currency_book_list = by_currency.setdefault(currency_name, [])
        currency_book_list.append(amount)

    def draw_money(self, name, currency_name, amount):
        self.save_money(name, currency_name, -1 * amount)

    def show_balance(self, name, currency_name):
        if not currency_name:
            raise ValueError("currency_name can't be empty")
        currency_book_list = self._history[name]
        if isinstance(currency_book_list, dict):
            if currency_name in currency_book_list.keys():
                return sum(currency_book_list[currency_name])

        raise ValueError( "currency_name not exist!!" )

This does not look like a big change. Handling and looking up the history is only slightly more complex and still manageable, and usage remains simple.

book = CurrencyBankBook()
book.add_customer('paul')
book.save_money('paul', 'ntd', 1200)
book.draw_money('paul', 'ntd', 1000)
book.save_money('paul', 'us', 300)
book.draw_money('paul', 'us', 200)

print("ntd-balance:%s" % book.show_balance('paul', 'ntd'))
print("us-balance:%s" % book.show_balance('paul', 'us'))

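Now imagine the requirements change again: we want to record the time and a memo for every deposit, the way online banking shows the balance at each transaction time when you look up account history.

One way to implement this is to change the innermost dictionary so that each currency maps to a list of tuples rather than plain amounts.

We rewrite the class as follows: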

from datetime import datetime


class AdvCurrencyBankBook( object ):
    def __init__(self):
        self._history = {}

    def add_customer(self, name):
        self._history[name] = {}  # the list becomes a dictionary

    def save_money(self, name, currency_name , amount, time=None, memo=""):
        by_currency = self._history[name]
        currency_book_list = by_currency.setdefault(currency_name, [])
        # each record is an (amount, time, memo) tuple
        currency_book_list.append((amount, datetime.now() if time is None else time, memo))

    def draw_money(self, name, currency_name, amount, time=None, memo=""):
        self.save_money(name, currency_name,-1 * amount, time, memo)

    def show_balance(self, name, currency_name, balance_time=datetime(2017, 9, 22, 10, 55, 00)):
        if not currency_name:
            raise ValueError("currency_name can't be empty")
        currency_book_list = self._history[name]
        if isinstance(currency_book_list, dict):
            if currency_name in currency_book_list.keys():
                return sum([amount for amount, datetime, memo in currency_book_list[currency_name] if datetime <= balance_time])
        raise ValueError( "currency_name not exist!!" )

book = AdvCurrencyBankBook()
book.add_customer('paul')
book.save_money('paul', 'ntd', 1200, time=datetime(2017, 9, 21, 10, 55, 00))
book.draw_money('paul', 'ntd', 1000, time=datetime(2017, 9, 22, 10, 55, 00))
book.save_money('paul', 'us', 300, time=datetime(2017, 9, 21, 10, 55, 00))
book.draw_money('paul', 'us', 200, time=datetime(2017, 9, 22, 10, 55, 00))

print("ntd-balance:%s" % book.show_balance('paul', 'ntd', balance_time=datetime(2017, 9, 21, 23, 59, 59)))
print("us-balance:%s" % book.show_balance('paul', 'us', balance_time=datetime(2017, 9, 21, 23, 59, 59)))

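Although save_money and draw_money only gain a couple of parameters and now store a tuple, the balance calculation has to loop over the records and filter them by a condition. The class has also become harder to use.

The meaning of each positional argument is not easy to work out, for example: book.draw_money('paul', 'us', 200, datetime(2017, 9, 22, 10, 55, 00))

When this kind of complexity appears, it is time to give up dictionaries and tuples and move to a hierarchy of classes.

When the bookkeeping gets complicated, break it up into several classes. That lets us provide better-defined interfaces and encapsulate the data properly.

Time to refactor!

Effective Python suggests starting the move to classes at the bottom of the dependency tree, which here is a single transaction. For such simple information a class seems like overkill, and a tuple looks more appropriate: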

if isinstance(currency_book_list, dict):
    if currency_name in currency_book_list.keys():
        return sum([amount for amount, datetime, _ in currency_book_list[currency_name] if datetime <= balance_time])

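The problem is that plain tuples are positional. When we want to attach more information to a currency account, such as the memo we already defined, we have to rewrite every place that unpacks the three-tuple so the code knows there are now n items.
Here we can use _ (the underscore variable name) to unpack a field and skip it directly.

This pattern of making tuples longer and longer is similar to nesting dictionaries deeper and deeper. As soon as a tuple grows beyond a two-tuple, it is time to consider namedtuple.
The namedtuple in the collections module can be constructed with positional or keyword arguments, and its fields can be accessed as named attributes.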

import collections
from datetime import datetime

Transaction = collections.namedtuple('Transaction', ('amount', 'time', 'memo'))

# Methods of the bank book class, rewritten to store Transaction records
def save_money(self, name, currency_name , amount, time=None, memo=""):
    by_currency = self._history[name]
    currency_book_list = by_currency.setdefault(currency_name, [])
    currency_book_list.append(Transaction(amount, datetime.now() if time is None else time, memo))

def show_balance(self, name, currency_name, balance_time=datetime(2017, 9, 22, 10, 55, 00)):
    if not currency_name:
        raise ValueError("currency_name can't be empty")
    currency_book_list = self._history[name]
    if isinstance(currency_book_list, dict) and currency_name in currency_book_list:
        total = 0
        for data in currency_book_list[currency_name]:
            # fields are read by name instead of by position
            if data.time <= balance_time:
                total += data.amount
        return total
    raise ValueError( "currency_name not exist!!" )
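To make the two construction styles concrete, here is a small self-contained sketch using the same Transaction fields. The sample amount, time, and memo values are made up for illustration.

```python
import collections
from datetime import datetime

Transaction = collections.namedtuple('Transaction', ('amount', 'time', 'memo'))

when = datetime(2017, 9, 22, 10, 55)

# Positional and keyword construction produce equal records
by_position = Transaction(200, when, 'atm')
by_keyword = Transaction(amount=200, time=when, memo='atm')

print(by_position == by_keyword)   # True
print(by_keyword.amount)           # 200
print(by_keyword.memo)             # atm
```

The keyword form is the one that addresses the earlier complaint about positional arguments being hard to read at the call site.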

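A namedtuple is more specific than a tuple and leads to code that is easier to maintain, but it still has limitations:

You cannot specify default argument values for namedtuple classes (at least before Python 3.7, which added a defaults parameter). If the data has many optional properties, this makes them unwieldy.
Attributes of a namedtuple instance can still be accessed by numeric index and by iteration. Especially in an externally exposed API, this can lead to unintended usage that makes it harder to move to a real class later.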
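Both limitations are easy to see in a few lines. This is a minimal sketch with made-up sample values; it uses a plain string for the time field only to keep the example short.

```python
import collections

Transaction = collections.namedtuple('Transaction', ('amount', 'time', 'memo'))

record = Transaction(amount=300, time='2017-09-21', memo='salary')

# Positional access and unpacking still work, so callers can come to depend on field order
print(record[0])                 # 300
amount, time, memo = record
print(memo)                      # salary

# A namedtuple also compares equal to a plain tuple holding the same values
print(record == (300, '2017-09-21', 'salary'))   # True

# Without declared defaults, every field is required
try:
    Transaction(300, '2017-09-21')
    missing_field_rejected = False
except TypeError as exc:
    missing_field_rejected = True
    print('missing field:', exc)
```

Once external code indexes into the record like this, adding or reordering fields becomes a breaking change, which is the risk described above.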

from datetime import datetime

import collections

Transaction = collections.namedtuple( 'Transaction', ('amount', 'time', 'memo') )

class Account( object ):
    def __init__(self):
        self._history = []

    def save_money(self, amount, time=None, memo=""):
        self._history.append( Transaction( amount, datetime.now() if time is None else time, memo ) )

    def draw_money(self, amount, time=None, memo=""):
        self.save_money( -1 * amount, time, memo )

    def get_balance(self, balance_time=None):
        # Resolve "now" at call time; a datetime.now() default would be
        # evaluated only once, when the class is defined
        if balance_time is None:
            balance_time = datetime.now()
        total = 0
        for transaction in self._history:
            if transaction.time <= balance_time:
                total += transaction.amount
        return total


class Customer( object ):
    def __init__(self):
        self.__account = {}

    def currency_account(self, currency_name):
        if currency_name not in self.__account:
            self.__account[currency_name] = Account()
        return self.__account[currency_name]

    def show_balance(self):
        for name, account in self.__account.items():
            print( "%s-balance:%s" % (name, account.get_balance(datetime.now())) )


class ObjectBankBook( object ):
    def __init__(self):
        self.__customer = {}

    def customer(self, name):
        if name not in self.__customer:
            self.__customer[name] = Customer()
        if isinstance( self.__customer[name], Customer ):
            return self.__customer[name]

These classes take roughly twice as many lines as the previous implementation, but the code is easier to read and maintain.

book = ObjectBankBook()
paul = book.customer( 'paul' )
usAccount = paul.currency_account('us')
usAccount.save_money(1200)
usAccount.draw_money(300)

ntdAccount = paul.currency_account('ntd')
ntdAccount.save_money(400)
ntdAccount.draw_money(200)
paul.show_balance()

us-balance:900
ntd-balance:200

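Note: if necessary, you can write backward-compatible methods to help migrate usage of the old API to the new object hierarchy.

[Effective Python] Scenario: consider generators instead of returning lists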
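One way such a backward-compatible layer could look is a thin adapter that keeps the old (name, currency_name, amount) call shape and forwards to the new objects. This is only a sketch under simplifying assumptions: LegacyBankBook is a hypothetical name, the Account here stores bare amounts without time or memo, and unlike the old API it creates customers and currencies on first use instead of raising.

```python
class Account(object):
    """Simplified account: amounts only, no time or memo."""

    def __init__(self):
        self._history = []

    def save_money(self, amount):
        self._history.append(amount)

    def get_balance(self):
        return sum(self._history)


class Customer(object):
    def __init__(self):
        self._accounts = {}

    def currency_account(self, currency_name):
        if currency_name not in self._accounts:
            self._accounts[currency_name] = Account()
        return self._accounts[currency_name]


class ObjectBankBook(object):
    def __init__(self):
        self._customers = {}

    def customer(self, name):
        if name not in self._customers:
            self._customers[name] = Customer()
        return self._customers[name]


class LegacyBankBook(object):
    """Hypothetical adapter: the old flat API on top of the new objects."""

    def __init__(self, book=None):
        self._book = book if book is not None else ObjectBankBook()

    def save_money(self, name, currency_name, amount):
        account = self._book.customer(name).currency_account(currency_name)
        account.save_money(amount)

    def draw_money(self, name, currency_name, amount):
        self.save_money(name, currency_name, -1 * amount)

    def show_balance(self, name, currency_name):
        account = self._book.customer(name).currency_account(currency_name)
        return account.get_balance()


legacy = LegacyBankBook()
legacy.save_money('paul', 'ntd', 1200)
legacy.draw_money('paul', 'ntd', 1000)
print("ntd-balance:%s" % legacy.show_balance('paul', 'ntd'))  # ntd-balance:200
```

Old call sites can keep working against the adapter while new code talks to the customer and account objects directly.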

By paul | 2017-09-21 | Comments 1 comment

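Consider a common situation: in Python we often write functions that look for matching items in a list.

In the program below, for example, we want to find the positions in a list of ints that hold a specific value. Written the traditional way, the simplest approach is to walk through all the items, compare them one by one, and append each match to a result list (what you store is up to you: the index or the corresponding object).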

def find_matched_number_location(number, numbers):
    result = []
    if numbers:
        for index, number_in_list in enumerate(numbers):
            if number_in_list == number:
                result.append(index+1)

    return result



result = find_matched_number_location(3, [1, 2, 4, 5, 2, 3, 12, 3, 5, 7, 1])
print(result)

For some sample inputs this works as expected:

[6, 8]

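However, this kind of function has two problems (as discussed in Effective Python, p. 41).

The first problem is that the code is a bit dense and noisy. Every time a match is found, append is called. The bulk of that call draws attention away from the value being added, and on top of that one line creates the result list and another returns it.

The second problem is that all the results have to be stored in the list before the function returns. For very large inputs this can make the program run out of memory and crash. In contrast, a generator version of this function can easily handle input of any length.

A better way to write this function is to use a generator. A generator is a function that uses yield expressions. When called, a generator function does not actually run; it immediately returns an iterator. Each call to next() advances the generator to its next yield expression. So let's rewrite find_matched_number_location.

The result is as follows: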

def find_matched_number_location_new(number, numbers):
    if numbers:
        for index, number_in_list in enumerate(numbers):
            if number_in_list == number:
                yield index + 1

result_new = list(find_matched_number_location_new(3, [1, 2, 4, 5, 2, 3, 12, 3, 5, 7, 1]))
print(result_new)

The benefit is that every interaction with the result variable is gone. The iterator returned by calling the generator can easily be converted to a list.
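The lazy behavior is easier to believe once you watch it happen. The sketch below adds a print call inside the generator purely for demonstration; the short input list is made up.

```python
def find_matched_number_location_new(number, numbers):
    print('generator body started')      # demonstration only
    for index, number_in_list in enumerate(numbers):
        if number_in_list == number:
            yield index + 1


it = find_matched_number_location_new(3, [1, 3, 5, 3])
print('iterator created')      # printed first: the body has not run yet

first = next(it)               # runs the body up to the first yield
second = next(it)              # resumes right after that yield
print(first, second)           # 2 4

# A further bare next(it) would raise StopIteration; a default avoids the exception
print(next(it, None))          # None
```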

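To highlight the second problem, I created a 3 MB file full of numbers separated by commas.

Here I define a generator that takes streaming input from the file line by line (although this example has only one line) and yields one matching position at a time. The working memory of this function is bounded by the maximum length of a single line of input.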

def find_matched_number_location_from_file(number, handle):
    offset = 0
    for line in handle:
        if line:
            for number_in_file in line.split(','):
                offset += 1
                if int(number_in_file) == number:
                    yield offset

Run the two tests:

import time

start = time.time()
with open('新增資料夾/test_data.txt', 'r') as f:
    it = find_matched_number_location_from_file(3, f)
    result = list(it)
    print(len(result))
end = time.time()
print('Took %.3f seconds' % (end - start))

start = time.time()
with open('新增資料夾/test_data.txt', 'r') as f:
    result = []
    for line in f:
        for index , number_in_file in enumerate(line.split( ',' )):
            if int(number_in_file) == 3:
                result.append(index + 1)
    print(len(result))
end = time.time()
print('Took %.3f seconds' % (end - start))

The results:

307824
Took 0.551 seconds
307824
Took 0.730 seconds

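With a test file of only 3 MB the difference in seconds is small, but there is still a measurable gap: in this run the generator version was about 25% faster than the old matching approach. As for memory, the old approach (collecting into result) holds at least 307824 extra numbers in memory. The figures here may not be large, but imagine a much bigger data set. Let the use case decide.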
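A rough way to see the memory side is to compare the size of a fully built result list with the size of a generator object. This is a sketch on a made-up in-memory input rather than the 3 MB file, and `positions` is a shortened stand-in for the functions above. Note that sys.getsizeof only counts the list's own pointer array, not the int objects it refers to, so it understates the list's real footprint.

```python
import sys

numbers = [3] * 100000   # hypothetical input where every item matches


def positions(number, numbers):
    for index, value in enumerate(numbers):
        if value == number:
            yield index + 1


as_list = list(positions(3, numbers))     # all results held at once
as_generator = positions(3, numbers)      # nothing computed yet

list_bytes = sys.getsizeof(as_list)
generator_bytes = sys.getsizeof(as_generator)
print(list_bytes, generator_bytes)

# Consuming the generator item by item never builds the full result list
count = sum(1 for _ in as_generator)
print(count)   # 100000
```

The saving only appears when the results are consumed one at a time; wrapping the generator in list() materializes everything again.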

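Finally, one more point: the only thing to watch when defining a generator like this is that callers must know the returned iterator is stateful and cannot be reused.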
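A short demonstration of that statefulness, using a made-up input list: the second pass over the same iterator yields nothing, and no error is raised to warn you.

```python
def find_matched_number_location_new(number, numbers):
    for index, number_in_list in enumerate(numbers):
        if number_in_list == number:
            yield index + 1


it = find_matched_number_location_new(3, [1, 2, 3, 3])

first_pass = list(it)
second_pass = list(it)    # the iterator is already exhausted

print(first_pass)    # [3, 4]
print(second_pass)   # []
```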

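The next post covers a defensive approach.

Reference: Effective Python (Chinese edition), Item 16, adapted to my own understanding and implementation

[CODE WAR record] Split a linked list into front and back halves (difficulty 5)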

By paul | 2017-08-23 | Comments 0 Comment

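I have been a bit lazy lately and have not studied anything new, so the blog has run dry. I still feel my grasp of Python syntax is far from enough, so I went back to the "experts' village" to practice, and I am simply turning my Codewars results and notes into a post XD

The problem, by example: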

var source = 1 -> 3 -> 7 -> 8 -> 11 -> 12 -> 14 -> null
var front = new Node()
var back = new Node()
frontBackSplit(source, front, back)
front === 1 -> 3 -> 7 -> 8 -> null
back === 11 -> 12 -> 14 -> null

Write the code for frontBackSplit.
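The post is truncated here, so the author's own solution is not shown. As an illustration only, here is one possible Python sketch using the slow/fast pointer technique. The Node class, the snake_case name front_back_split, the helper functions, and the choice to raise ValueError for lists shorter than two nodes are all assumptions, not taken from the original.

```python
class Node(object):
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next


def front_back_split(source, front, back):
    """Split source into two halves and load them into front and back.

    Assumption: lists shorter than two nodes are rejected with ValueError.
    The source list is cut in place.
    """
    if source is None or source.next is None:
        raise ValueError("source needs at least two nodes")

    # slow moves one step while fast moves two; when fast runs off the end,
    # slow is the last node of the front half (front gets the extra node)
    slow, fast = source, source.next
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next

    back_head = slow.next
    slow.next = None          # cut the list after the front half

    # front and back are caller-supplied placeholder nodes, so copy the
    # first node of each half into them
    front.data, front.next = source.data, source.next
    back.data, back.next = back_head.data, back_head.next


def build_list(values):
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def to_list(node):
    values = []
    while node is not None:
        values.append(node.data)
        node = node.next
    return values


source = build_list([1, 3, 7, 8, 11, 12, 14])
front, back = Node(), Node()
front_back_split(source, front, back)
print(to_list(front))   # [1, 3, 7, 8]
print(to_list(back))    # [11, 12, 14]
```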

…


[Code War record] Given 2 parameters, the sum of the digits and the number of digits, find all numbers in range whose digits run in non-decreasing order

By paul | 2017-08-21 | Comments 0 Comment

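To build a feel for Python I went to Codewars for practice problems and found this one quite interesting.

1. Find all numbers whose digits add up to the given value.
2. The digits of each number must run in order from smallest to largest (for example: 118, 127, 136, 145, 226, 235, 244, 334).

Write a function that takes 2 parameters, x as the sum of the digits and y as the expected number of digits, and returns a list of 3 values a, b, c: a is the count of matching numbers, b is the smallest matching number, and c is the largest.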

find_all(x, y)

For example:

find_all(10, 3) == [8, 118, 334]
find_all(27, 3) == [1, 999, 999]
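The post is truncated at this point, so the following is not the author's solution, only one possible sketch. It enumerates non-decreasing digit sequences with itertools.combinations_with_replacement over the digits 1 to 9 (a zero cannot appear, since it would have to lead the number). Returning an empty list when nothing matches is an assumption.

```python
from itertools import combinations_with_replacement


def find_all(digit_sum, length):
    """Return [count, smallest, largest], or [] when nothing matches (assumed)."""
    matches = [
        int(''.join(digits))
        # each tuple comes out with its digits already in non-decreasing order
        for digits in combinations_with_replacement('123456789', length)
        if sum(map(int, digits)) == digit_sum
    ]
    if not matches:
        return []
    return [len(matches), min(matches), max(matches)]


print(find_all(10, 3))   # [8, 118, 334]
print(find_all(27, 3))   # [1, 999, 999]
```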

…


Building a python-app image with a Dockerfile

By paul | 2017-08-01 | Comments 0 Comment

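Following the previous post, "First Attempt", we want to deploy our application with fewer steps. Of course we could pull a Python environment step by step, connect into it to prepare the environment, commit the result, and then export it as a tar file or push it to a registry for sharing and portability. But at this point we can also use the Dockerfile mechanism to "package" our application so that all of those steps are automated ...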
