
Python module introduction - collections (1): Counter

2013-03-22 

2013-03-20 磁针石

# Available for software test automation implementation and training. gtalk: ouyangchongwu#gmail.com, QQ: 37391319, blog: http://blog.csdn.net/oychw

# All rights reserved. Please contact the author before reprinting or republishing.


# Reference: The Python Standard Library by Example (2011)

1.3 collections - Container Data Types

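The main types are:

namedtuple(): a factory function that creates tuple subclasses with named fields. New in Python 2.6.

deque: a double-ended queue. It is similar to a list, but appends and pops at both ends are fast. New in Python 2.4.

Counter: a dict subclass for counting hashable objects. New in Python 2.7.

OrderedDict: a dict subclass that remembers the order in which entries were added. New in Python 2.7.

defaultdict: a dict subclass that calls a factory function to supply missing values. New in Python 2.5.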
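The following sketch uses each of the five types once to make the list above concrete. It is a minimal Python 3 sketch, and the names Point, q, od and dd are only illustrative:

```python
from collections import namedtuple, deque, Counter, OrderedDict, defaultdict

# namedtuple: a tuple subclass whose fields can also be read by name
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
assert p.x == 1 and p[1] == 2

# deque: fast appends and pops at both ends
q = deque([1, 2, 3])
q.appendleft(0)
q.append(4)
assert q.popleft() == 0 and q.pop() == 4

# Counter: counts hashable objects
assert Counter('abca')['a'] == 2

# OrderedDict: remembers insertion order
od = OrderedDict()
od['b'] = 1
od['a'] = 2
assert list(od) == ['b', 'a']

# defaultdict: calls a factory (here list) to supply missing values
dd = defaultdict(list)
dd['k'].append('v')
assert dd['k'] == ['v']
```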

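The module also provides abstract base classes that test whether a class provides a particular interface, for example whether it is hashable or a mapping. In Python 3.3 and later these ABCs live in collections.abc.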
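The following sketch applies these interface checks to a Counter. It assumes Python 3, where the ABCs are imported from collections.abc. In Python 2.7 they were importable directly from collections.

```python
from collections import Counter
from collections.abc import Hashable, Mapping, MutableMapping

c = Counter('abc')

# Counter is a dict subclass, so it counts as a (mutable) mapping
print(isinstance(c, Mapping))         # True
print(isinstance(c, MutableMapping))  # True

# dict sets __hash__ to None, so a Counter is not hashable; a tuple is
print(isinstance(c, Hashable))        # False
print(isinstance((1, 2), Hashable))   # True
```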

1.3.1 Counter

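A Counter is a container that keeps track of how many times equivalent values are added. It is similar to the bag or multiset data structures in other languages.

Counter supports three forms of initialization. Its constructor can be called with a sequence of items, a dictionary of keys and counts, or keyword arguments that map names to counts. Like the rest of this article, the examples use Python 2 syntax.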

import collections

print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
print collections.Counter({'a': 2, 'b': 3, 'c': 1})
print collections.Counter(a=2, b=3, c=1)

Output:

# ./collections_counter_init.py
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})

Note that the printed form lists the keys from the highest count to the lowest.
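Both points can be checked directly. Equality compares only the counts, so the three initialization forms produce equal Counters. The printed form follows the order returned by most_common(). This is a small Python 3 sketch, and the variable names are illustrative:

```python
from collections import Counter

from_seq = Counter(['a', 'b', 'c', 'a', 'b', 'b'])
from_map = Counter({'a': 2, 'b': 3, 'c': 1})
from_kw = Counter(a=2, b=3, c=1)

# All three forms build the same counts
assert from_seq == from_map == from_kw

# The repr lists keys from most to least common, like most_common()
print(repr(from_kw))            # Counter({'b': 3, 'a': 2, 'c': 1})
print(from_kw.most_common())    # [('b', 3), ('a', 2), ('c', 1)]
```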

You can also create an empty Counter and populate it later with update():

import collections

c = collections.Counter()
print 'Initial :', c

c.update('abcdaab')
print 'Sequence:', c

c.update({'a': 1, 'd': 5})
print 'Dict    :', c

Output:

# ./collections_counter_update.py
Initial : Counter()
Sequence: Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
Dict    : Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})
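The last line of output shows that update() adds to the existing counts: 'a' goes from 3 to 4 and 'd' from 1 to 6. The following Python 3 sketch contrasts this with a plain dict, whose update() replaces values. The variable plain is illustrative.

```python
from collections import Counter

c = Counter('abcdaab')          # a: 3, b: 2, c: 1, d: 1
plain = dict(c)                 # an ordinary dict with the same contents

c.update({'a': 1, 'd': 5})      # Counter.update adds to existing counts
plain.update({'a': 1, 'd': 5})  # dict.update replaces existing values

print(c['a'], c['d'])           # 4 6
print(plain['a'], plain['d'])   # 1 5
```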

 

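Accessing counts: counts are read with ordinary dictionary indexing. A key that has never been seen reports a count of 0 instead of raising KeyError.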

import collections

c = collections.Counter('abcdaab')

for letter in 'abcde':
    print '%s : %d' % (letter, c[letter])

Output:

# ./collections_counter_get_values.py
a : 3
b : 2
c : 1
d : 1
e : 0
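The zero for 'e' comes from a lookup that does not modify the Counter. A quick Python 3 check shows that the key is not inserted and that a plain dict raises KeyError instead. The flag name dict_raised is illustrative.

```python
from collections import Counter

c = Counter('abcdaab')

print(c['e'])      # 0: missing keys report a count of zero
print('e' in c)    # False: the lookup did not insert the key
print(len(c))      # 4

# A plain dict raises KeyError for the same lookup
try:
    dict(c)['e']
    dict_raised = False
except KeyError:
    dict_raised = True
```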

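elements() returns an iterator over all the elements, each repeated as many times as its count. Elements with a count of zero or less are skipped, which is why 'z' is missing from the list in the output below.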

import collections

c = collections.Counter('extremely')
c['z'] = 0
print c
print list(c.elements())

Output:

# ./collections_counter_elements.py
Counter({'e': 3, 'm': 1, 'l': 1, 'r': 1, 't': 1, 'y': 1, 'x': 1, 'z': 0})
['e', 'e', 'e', 'm', 'l', 'r', 't', 'y', 'x']
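Negative counts are skipped in the same way as zero. In the following Python 3 sketch the result is sorted so the check does not depend on iteration order.

```python
from collections import Counter

c = Counter(a=2, b=0, c=-1, d=1)

# elements() repeats each key 'count' times and skips counts below 1
print(sorted(c.elements()))   # ['a', 'a', 'd']

# Feeding elements() back into Counter keeps only the positive counts
print(Counter(c.elements()) == Counter(a=2, d=1))  # True
```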

 

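most_common() returns the most frequent elements. The following example counts the letters of every word in the system dictionary file and prints the three most common: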

import collections

c = collections.Counter()
with open('/usr/share/dict/words', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())

print 'Most common:'
for letter, count in c.most_common(3):
    print '%s: %7d' % (letter, count)

Output:

# ./collections_counter_most_common.py
Most common:
e:  484673
i:  382454
a:  378030
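/usr/share/dict/words is not present on every system. The following Python 3 sketch runs the same loop over an in-memory file instead. The helper letter_frequencies and the three sample words are hypothetical, and io.StringIO stands in for the opened file.

```python
import io
from collections import Counter


def letter_frequencies(lines):
    """Count letters across lines, mirroring the loop in the example above."""
    counts = Counter()
    for line in lines:
        # update() with a string counts each character of the string
        counts.update(line.rstrip().lower())
    return counts


# io.StringIO stands in for open('/usr/share/dict/words', 'rt')
fake_words = io.StringIO('Banana\napple\ncherry\n')
counts = letter_frequencies(fake_words)
print(counts.most_common(1))   # [('a', 4)]
```

Because the function accepts any iterable of lines, passing it a real file object gives the same result as the original script.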

 

Counter also supports arithmetic and set operations. Their results keep only the keys whose counts are positive.

import collections

c1 = collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
c2 = collections.Counter('alphabet')

print 'C1:', c1
print 'C2:', c2

print '\nCombined counts:'
print c1 + c2

print '\nSubtraction:'
print c1 - c2

print '\nIntersection (taking positive minimums):'
print c1 & c2

print '\nUnion (taking maximums):'
print c1 | c2

Output:

# ./collections_counter_arithmetic.py
C1: Counter({'b': 3, 'a': 2, 'c': 1})
C2: Counter({'a': 2, 'b': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

Combined counts:
Counter({'a': 4, 'b': 4, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

Subtraction:
Counter({'b': 2, 'c': 1})

Intersection (taking positive minimums):
Counter({'a': 2, 'b': 1})

Union (taking maximums):
Counter({'b': 3, 'a': 2, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})
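The "positive counts only" rule explains why some keys disappear from the results. 'a' occurs twice in both c1 and c2, so the subtraction gives 0 and 'a' is dropped. 'c' is absent from c2, so the intersection takes min(1, 0) = 0 and drops 'c' as well. A Python 3 check of those two cases, with results sorted to avoid depending on order:

```python
from collections import Counter

c1 = Counter(['a', 'b', 'c', 'a', 'b', 'b'])
c2 = Counter('alphabet')

diff = c1 - c2
print(sorted(diff.items()))   # [('b', 2), ('c', 1)]
print('a' in diff)            # False: 2 - 2 == 0 is not kept

both = c1 & c2
print(sorted(both.items()))   # [('a', 2), ('b', 1)]
print('c' in both)            # False: min(1, 0) == 0 is not kept
```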

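The examples above might suggest that Counter only works with single characters. That is not the case, as the following examples from the standard library documentation show.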

>>> from collections import Counter
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
...
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})

>>> cnt = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})

>>> import re
>>> words = re.findall(r'\w+', open('/etc/ssh/sshd_config').read().lower())
>>> Counter(words).most_common(10)
[('yes', 27), ('no', 23), ('to', 12), ('the', 9), ('for', 8), ('and', 8), ('protocol', 6), ('ssh', 6), ('default', 6), ('this', 6)]

 

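The first and second snippets are equivalent. The third uses Counter to implement a simple word count. That answers a typical interview question: use Python to print the 10 most frequent words in /etc/ssh/sshd_config together with their counts.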
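The interview answer can be wrapped in a function so that it runs without /etc/ssh/sshd_config. The following Python 3 sketch does that. The name top_words and the sample text are hypothetical.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most common words in text, with their counts."""
    # \w+ matches runs of letters, digits and underscores
    words = re.findall(r'\w+', text.lower())
    return Counter(words).most_common(n)


sample = 'Red blue red. GREEN blue blue!'
print(top_words(sample, 2))   # [('blue', 3), ('red', 2)]
```

For the real file, call top_words(open('/etc/ssh/sshd_config').read()). The counts will depend on the local configuration.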

Now let us look at how Counter is defined:

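class collections.Counter([iterable-or-mapping]). Note that Counter is an unordered dictionary (from Python 3.7 on it remembers insertion order, like dict). Looking up a missing key returns 0 instead of raising KeyError. Setting a count to zero, as in c['sausage'] = 0, does not remove the element; use del c['sausage'] to remove it.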
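The difference between a zero count and a deleted entry shows up in membership tests. A short Python 3 sketch, reusing the 'sausage' key from the text (the 'eggs' entry is illustrative):

```python
from collections import Counter

c = Counter(sausage=2, eggs=1)

c['sausage'] = 0           # the key stays, with a count of zero
print('sausage' in c)      # True

del c['sausage']           # del actually removes the entry
print('sausage' in c)      # False
print(c['sausage'])        # 0, since missing keys still report zero
```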

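In addition to the standard dictionary methods, Counter adds:

elements(): returns an iterator over the elements, each repeated as many times as its count. Elements with a count less than one are ignored.

most_common([n]): returns a list of the n most common elements and their counts, from most to least common. If n is omitted, all elements are returned.

subtract([iterable-or-mapping]): subtracts the counts of an iterable or another mapping from the Counter, in place.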

>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> d = Counter(a=1, b=2, c=3, d=4)
>>> c - d
Counter({'a': 3})
>>> c
Counter({'a': 4, 'b': 2, 'c': 0, 'd': -2})
>>> d
Counter({'d': 4, 'c': 3, 'b': 2, 'a': 1})
>>> c.subtract(d)
>>> c
Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
>>> d
Counter({'d': 4, 'c': 3, 'b': 2, 'a': 1})

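This shows that subtract() modifies the Counter it is called on and keeps zero and negative counts. The - operator instead returns a new Counter that contains only positive counts.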
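In the session above, c.subtract(d) prints nothing because it returns None. The following Python 3 sketch makes both behaviors explicit. The names result and returned are illustrative.

```python
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)

result = c - d               # new Counter, only positive counts kept
print(result)                # Counter({'a': 3})

returned = c.subtract(d)     # modifies c in place and returns None
print(returned)              # None
print(c['c'], c['d'])        # -3 -6
```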

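Of the standard dictionary methods, fromkeys() is not implemented for Counter, and update() is overridden so that it adds counts instead of replacing them.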
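Calling fromkeys() on Counter raises NotImplementedError. The supported way to count keys is to pass the iterable to the constructor. A Python 3 sketch, where the flag fromkeys_supported is illustrative:

```python
from collections import Counter

try:
    Counter.fromkeys('abc')
    fromkeys_supported = True
except NotImplementedError as exc:
    fromkeys_supported = False
    print('fromkeys is not supported:', exc)

# Counter(iterable) is the supported way to build counts from keys
print(Counter('abc'))    # Counter({'a': 1, 'b': 1, 'c': 1})
```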

Common patterns:

sum(c.values())                 # total of all counts
c.clear()                       # reset all counts
list(c)                         # list unique elements
set(c)                          # convert to a set
dict(c)                         # convert to a regular dictionary
c.items()                       # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
c.most_common()[:-n-1:-1]       # n least common elements
c += Counter()                  # remove zero and negative counts
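The following sketch exercises a few of these patterns on a concrete Counter (Python 3; the values are illustrative). The slice for the n least common elements is [:-n-1:-1]. A slice of [:-n:-1] stops one element early and returns only n - 1 items.

```python
from collections import Counter

c = Counter(a=5, b=3, c=1, d=0, e=-2)

print(sum(c.values()))                # 7 (zero and negative counts included)

n = 2
least = c.most_common()[:-n-1:-1]     # the n least common, lowest first
print(least)                          # [('e', -2), ('d', 0)]

c += Counter()                        # drops the zero and negative entries
print(sorted(c))                      # ['a', 'b', 'c']

pairs = [('x', 2), ('y', 1)]
print(Counter(dict(pairs)))           # Counter({'x': 2, 'y': 1})
```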

Arithmetic, intersection and union:

>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d                       # subtract (keeping only positive counts)
Counter({'a': 2})
>>> c & d                       # intersection:  min(c[x], d[x])
Counter({'a': 1, 'b': 1})
>>> c | d                       # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})
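The formulas in the comments above (c[x] + d[x], min, max) can be checked key by key. In the following Python 3 sketch, the max(..., 0) wrapper models the rule that only positive results are kept, because a dropped key reads back as 0.

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

for x in set(c) | set(d):
    # each operator works key by key, then drops non-positive results
    assert (c + d)[x] == max(c[x] + d[x], 0)
    assert (c - d)[x] == max(c[x] - d[x], 0)
    assert (c & d)[x] == max(min(c[x], d[x]), 0)
    assert (c | d)[x] == max(c[x], d[x], 0)
print('all element-wise identities hold')
```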

Notes on the operations:

The Counter class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you could store anything in the value field.

The most_common() method requires only that the values be orderable.

For in-place operations such as c[key] += 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for update() and subtract() which allow negative and zero values for both inputs and outputs.

The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.

The elements() method requires integer counts. It ignores zero and negative counts.
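Each of these rules can be seen in a few lines. The following is a Python 3 sketch with illustrative values. The last step assumes that itertools.repeat, which elements() relies on in CPython, rejects a float count with TypeError.

```python
from collections import Counter
from fractions import Fraction

# In-place operations only need + and -, so fractions and floats work
c = Counter(x=Fraction(1, 2))
c['x'] += Fraction(1, 3)
print(c['x'])                        # 5/6

# most_common() only needs values that can be ordered
w = Counter(low=0.5, high=2.5)
print(w.most_common(1))              # [('high', 2.5)]

# Multiset operators accept zero/negative inputs but keep positive outputs only
print(Counter(a=-1, b=0.5) + Counter())   # Counter({'b': 0.5})

# elements() needs integer counts; a float count raises TypeError
try:
    list(Counter(y=1.5).elements())
    elements_failed = False
except TypeError:
    elements_failed = True
```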

See also:

Counter class adapted for Python 2.5 and an early Bag recipe for Python 2.4.

Bag class in Smalltalk.

Wikipedia entry for Multisets.

C++ multisets tutorial with examples.

For mathematical operations on multisets and their use cases, see Knuth, Donald. The Art of Computer Programming Volume II, Section 4.6.3, Exercise 19.

To enumerate all distinct multisets of a given size over a given set of elements, see itertools.combinations_with_replacement().

map(Counter, combinations_with_replacement('ABC', 2)) --> AA AB AC BB BC CC
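The one-liner above can be expanded into a runnable Python 3 sketch. Each combination becomes a Counter, which is one multiset, and joining its sorted elements reproduces the labels AA through CC. The names bags and labels are illustrative.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Every multiset of size 2 drawn from {A, B, C}
bags = list(map(Counter, combinations_with_replacement('ABC', 2)))
labels = [''.join(sorted(b.elements())) for b in bags]
print(labels)        # ['AA', 'AB', 'AC', 'BB', 'BC', 'CC']
print(bags[0])       # Counter({'A': 2})
```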
