Date handling¶
-
mooncake_utils.date.
datediff
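(dt, base=None, unit='day')[source]¶ Computes the difference between two datetime values in the chosen unit, for example in days or minutes.
If base is not specified explicitly, the difference from the current time is computed.
Parameters: - dt – input date; must be a datetime
- base – reference date
- unit – granularity of the calculation; one of day, hour, minute, second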
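The behaviour described for datediff can be sketched with the standard library alone. This is a minimal stand-in, not the library's source: the name datediff_sketch, the sign convention (base minus dt), and rounding down to whole units are all assumptions.

```python
from datetime import datetime

# Seconds contained in each supported unit.
_SECONDS_PER_UNIT = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def datediff_sketch(dt, base=None, unit="day"):
    """Hypothetical stand-in for datediff: whole units elapsed from dt to base."""
    if base is None:
        # No explicit base: compare against the current time.
        base = datetime.now()
    seconds = (base - dt).total_seconds()  # assumed sign: base minus dt
    return int(seconds // _SECONDS_PER_UNIT[unit])


print(datediff_sketch(datetime(2017, 6, 20), datetime(2017, 6, 24)))          # 4
print(datediff_sketch(datetime(2017, 6, 20), datetime(2017, 6, 24), "hour"))  # 96
```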
-
mooncake_utils.date.
gen_date_list
(begin, end, join=False, exclude=[], exclude_today=True)[source]¶ Generates the list of dates between the given begin and end. Note how this differs from gen_date_list_by_days.
-
mooncake_utils.date.
gen_date_list_by_days
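(begin=None, days=7, join=False, exclude=[], include_begin_day=False, format='%Y%m%d')[source]¶ Generates the specified list of dates.
Parameters: - begin – start date; a datetime is expected by default, and a str is automatically parsed into a datetime where possible
- days – how many of the most recent days before begin to generate
- join – whether to join the list with commas; commonly used for Hadoop input directories
- exclude – dates to drop. Commonly used when computing statistics, e.g. over the last 7 days when the data for some of those days is corrupted and must be left out.
- include_begin_day – whether to include the begin day itself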
>>> gen_date_list_by_days(begin='20170620', days=3)  # today is 2017-6-24
['20170619', '20170618', '20170617']
>>> gen_date_list_by_days(begin='20170620', days=3, exclude=['20170618'])  # today is 2017-6-24
['20170619', '20170617']
>>> gen_date_list_by_days(days=3)  # today is 2017-6-24
['20170623', '20170622', '20170621']
>>> gen_date_list_by_days(days=3, join=True)  # today is 2017-6-24
'20170623,20170622,20170621'
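The examples above can be reproduced with a short standard-library sketch. It is an illustration of the documented behaviour only: the name date_list_by_days_sketch is hypothetical, and the handling of include_begin_day (shifting the window so that it starts at begin) is an assumption, since no example covers it.

```python
from datetime import datetime, timedelta


def date_list_by_days_sketch(begin=None, days=7, join=False, exclude=(),
                             include_begin_day=False, fmt="%Y%m%d"):
    """Hypothetical stand-in: the `days` dates preceding `begin`, newest first."""
    if begin is None:
        begin = datetime.now()
    elif isinstance(begin, str):
        # Strings are parsed with the same format used for output.
        begin = datetime.strptime(begin, fmt)
    first = 0 if include_begin_day else 1  # assumption: window starts at begin itself
    dates = [(begin - timedelta(days=offset)).strftime(fmt)
             for offset in range(first, first + days)]
    # Excluded dates are dropped without being replaced by older ones.
    dates = [d for d in dates if d not in exclude]
    return ",".join(dates) if join else dates


print(date_list_by_days_sketch(begin="20170620", days=3))
print(date_list_by_days_sketch(begin="20170620", days=3, exclude=["20170618"]))
print(date_list_by_days_sketch(begin="20170620", days=3, join=True))
```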
-
mooncake_utils.date.
gen_latest_date_list
(begin, end, join=False, exclude=[])[source]¶ Deprecated; use gen_date_list instead.
-
mooncake_utils.date.
gen_today
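(delta=1, raw=False, short=True, with_time=False, only_time=False)[source]¶ Generates the current day's date.
Parameters: - delta – time offset; in the examples below, delta=1 yields the previous day
- raw – if set to ``True``, returns a ``datetime``; otherwise returns a ``str``
- short – if True, returns a compact form such as 20170101; otherwise returns 2017-01-01
- with_time – whether to append the time; otherwise only the date is returned
- only_time – whether to return only the time, without the date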
>>> gen_today(delta=0, with_time=True)
20170624145222
>>> gen_today(delta=0, with_time=True, short=False)
2017-06-24 15:06:12
>>> gen_today(delta=0, only_time=True, short=False)
15:06:12
>>> gen_today(delta=0, only_time=True)
150612
>>> gen_today(delta=0, with_time=False)
20170624
>>> gen_today(delta=1, with_time=False)
20170623
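How the flags combine can be sketched as follows. This is a guess at the formatting logic inferred from the examples, not the library's code: today_sketch and its extra now parameter (added so the result is reproducible) are hypothetical, and delta is assumed to be a number of days subtracted from the current time.

```python
from datetime import datetime, timedelta


def today_sketch(delta=1, raw=False, short=True, with_time=False,
                 only_time=False, now=None):
    """Hypothetical stand-in for gen_today; `now` is an added test hook."""
    moment = (now or datetime.now()) - timedelta(days=delta)  # assumed: delta in days
    if raw:
        return moment
    date_fmt = "%Y%m%d" if short else "%Y-%m-%d"
    time_fmt = "%H%M%S" if short else "%H:%M:%S"
    if only_time:
        return moment.strftime(time_fmt)
    if with_time:
        separator = "" if short else " "
        return moment.strftime(date_fmt + separator + time_fmt)
    return moment.strftime(date_fmt)


fixed = datetime(2017, 6, 24, 15, 6, 12)
print(today_sketch(delta=0, with_time=True, now=fixed))               # 20170624150612
print(today_sketch(delta=0, with_time=True, short=False, now=fixed))  # 2017-06-24 15:06:12
print(today_sketch(delta=1, now=fixed))                               # 20170623
```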
Cython build helpers¶
Data utilities¶
-
class
mooncake_utils.data.
Counter
[source]¶ Bases:
mooncake_utils.data.Storage
Keeps count of how many times something is added.
>>> c = counter()
>>> c.add('x')
>>> c.add('x')
>>> c.add('x')
>>> c.add('x')
>>> c.add('x')
>>> c.add('y')
>>> c['y']
1
>>> c['x']
5
>>> c.most()
['x']
-
percent
(key)[source]¶ Returns what percentage a certain key is of all entries.
>>> c = counter()
>>> c.add('x')
>>> c.add('x')
>>> c.add('x')
>>> c.add('y')
>>> c.percent('x')
0.75
>>> c.percent('y')
0.25
-
sorted_items
()[source]¶ Returns items sorted by value.
>>> c = counter()
>>> c.add('x')
>>> c.add('x')
>>> c.add('y')
>>> c.sorted_items()
[('x', 2), ('y', 1)]
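The Counter semantics shown in these doctests can be approximated with a small dict subclass. The class below is a sketch under assumptions: CounterSketch is a hypothetical name, it derives from plain dict rather than Storage, and most() is assumed to return every key tied for the highest count.

```python
class CounterSketch(dict):
    """Hypothetical stand-in for Counter: counts how often each key is added."""

    def add(self, key):
        self[key] = self.get(key, 0) + 1

    def most(self):
        # Assumed behaviour: all keys that share the highest count.
        if not self:
            return []
        top = max(self.values())
        return [key for key, count in self.items() if count == top]

    def percent(self, key):
        # Share of `key` among all added entries, as a fraction between 0 and 1.
        return self[key] / sum(self.values())

    def sorted_items(self):
        # Highest count first.
        return sorted(self.items(), key=lambda item: item[1], reverse=True)


c = CounterSketch()
for item in ["x", "x", "x", "y"]:
    c.add(item)
print(c.percent("x"), c.most(), c.sorted_items())
```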
-
-
class
mooncake_utils.data.
Storage
[source]¶ Bases:
dict
Like a dict, but items can also be accessed directly with dot notation: s.a instead of s['a']
>>> o = storage(a=1)
>>> o.a
1
>>> o['a']
1
>>> o.a = 2
>>> o['a']
2
>>> del o.a
>>> o.a
Traceback (most recent call last):
...
AttributeError: 'a'
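Attribute-style access of this kind is usually built by forwarding the attribute hooks to the dict methods. The following is a minimal sketch of that pattern, not the library's source; StorageSketch is a hypothetical name.

```python
class StorageSketch(dict):
    """Hypothetical stand-in for Storage: a dict whose keys are also attributes."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails.
        try:
            return self[key]
        except KeyError as exc:
            raise AttributeError(key) from exc

    def __setattr__(self, key, value):
        self[key] = value

    def __delattr__(self, key):
        try:
            del self[key]
        except KeyError as exc:
            raise AttributeError(key) from exc


o = StorageSketch(a=1)
o.a = 2
print(o["a"])  # 2
```

Translating KeyError into AttributeError keeps built-ins such as hasattr and getattr with a default working as expected.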
-
mooncake_utils.data.
counter
¶ alias of
mooncake_utils.data.Counter
-
mooncake_utils.data.
dictfind
(dictionary, element)[source]¶ Returns a key whose value in dictionary is element or, if none exists, None.
>>> d = {1:2, 3:4}
>>> dictfind(d, 4)
3
>>> dictfind(d, 5)
-
mooncake_utils.data.
dictfindall
(dictionary, element)[source]¶ Returns the keys whose values in dictionary are element or, if none exists, [].
>>> d = {1:4, 3:4}
>>> dictfindall(d, 4)
[1, 3]
>>> dictfindall(d, 5)
[]
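Both lookups amount to a reverse scan over the dictionary's items. The sketch below reproduces the documented results; the function names are hypothetical, and comparing with == (rather than identity) is an assumption about how matches are decided.

```python
def dictfind_sketch(dictionary, element):
    """Hypothetical stand-in: first key whose value equals element, else None."""
    for key, value in dictionary.items():
        if value == element:  # assumption: equality, not identity
            return key
    return None


def dictfindall_sketch(dictionary, element):
    """Hypothetical stand-in: every key whose value equals element."""
    return [key for key, value in dictionary.items() if value == element]


print(dictfind_sketch({1: 2, 3: 4}, 4))     # 3
print(dictfindall_sketch({1: 4, 3: 4}, 4))  # [1, 3]
```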
-
mooncake_utils.data.
lstrips
(text, remove)[source]¶ removes the string remove from the left of text
>>> lstrips("foobar", "foo")
'bar'
>>> lstrips('http://foo.org/', ['http://', 'https://'])
'foo.org/'
>>> lstrips('FOOBARBAZ', ['FOO', 'BAR'])
'BAZ'
>>> lstrips('FOOBARBAZ', ['BAR', 'FOO'])
'BARBAZ'
-
mooncake_utils.data.
rstrips
(text, remove)[source]¶ removes the string remove from the right of text
>>> rstrips("foobar", "bar")
'foo'
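The lstrips examples imply that a list of candidates is tried once each, in order, which is why ['BAR', 'FOO'] leaves 'BARBAZ'. A sketch consistent with those results is given below; the function names are hypothetical, and accepting a list in the rstrips counterpart is an assumption by symmetry.

```python
def lstrips_sketch(text, remove):
    """Hypothetical stand-in: strip each candidate prefix once, in the given order."""
    prefixes = [remove] if isinstance(remove, str) else remove
    for prefix in prefixes:
        if prefix and text.startswith(prefix):
            text = text[len(prefix):]
    return text


def rstrips_sketch(text, remove):
    """Hypothetical stand-in: strip each candidate suffix once, in the given order."""
    suffixes = [remove] if isinstance(remove, str) else remove
    for suffix in suffixes:
        if suffix and text.endswith(suffix):
            text = text[:-len(suffix)]
    return text


print(lstrips_sketch("FOOBARBAZ", ["FOO", "BAR"]))  # BAZ
print(lstrips_sketch("FOOBARBAZ", ["BAR", "FOO"]))  # BARBAZ
print(rstrips_sketch("foobar", "bar"))              # foo
```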
-
mooncake_utils.data.
storage
¶ alias of
mooncake_utils.data.Storage
-
mooncake_utils.data.
storify
(mapping, *requireds, **defaults)[source]¶ Creates a storage object from dictionary mapping, raising KeyError if mapping doesn't have all of the keys in requireds and using the default values for keys found in defaults. For example, storify({'a':1, 'c':3}, b=2, c=0) will return the equivalent of storage({'a':1, 'b':2, 'c':3}).
If a storify value is a list (e.g. multiple values in a form submission), storify returns the last element of the list, unless the key appears in defaults as a list. Thus:
>>> storify({'a':[1, 2]}).a
2
>>> storify({'a':[1, 2]}, a=[]).a
[1, 2]
>>> storify({'a':1}, a=[]).a
[1]
>>> storify({}, a=[]).a
[]
Similarly, if the value has a value attribute, storify will return _its_ value, unless the key appears in defaults as a dictionary.
>>> storify({'a':storage(value=1)}).a
1
>>> storify({'a':storage(value=1)}, a={}).a
<Storage {'value': 1}>
>>> storify({}, a={}).a
{}
Feature extraction¶
File operations¶
-
mooncake_utils.file.
mkdirp
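(directory)[source]¶ Uses a Python library to do what mkdir -p does in the shell. The advantage is that os.system() is not needed, which avoids the resources wasted on forking a process.
Parameters: directory – the path to create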
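In Python 3 the same effect is available through os.makedirs with exist_ok=True. The sketch below shows that approach as one plausible way to get mkdir -p behaviour without spawning a process; mkdirp_sketch is a hypothetical name and this is not necessarily how the library implements it.

```python
import os


def mkdirp_sketch(directory):
    """Hypothetical stand-in: create directory and any missing parents, in-process."""
    # exist_ok=True makes the call a no-op when the directory is already there,
    # matching `mkdir -p`; it still raises if the path exists as a regular file.
    os.makedirs(directory, exist_ok=True)
```

Calling it twice with the same path is safe, so it can be used unconditionally before writing output files.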
Hadoop utilities¶
Logging helpers¶
-
mooncake_utils.log.
get_logger
(debug=False, name='mu.log', with_file=False, level=None, wrapper=False, formatter_str='%(threadName)s | %(asctime)s - %(levelname)s - <%(filename)s-%(funcName)s:%(lineno)d> : %(message)s', log_save_path=None, with_console=True, with_kafka=False, kafka_topic=None, kafka_hosts=None, kafka_api_version=(0, 10))[source]¶ Parameters: - debug –
- name –
- with_file –
Colored print helpers¶
-
mooncake_utils.termcolor.
colored
(text, color=None, on_color=None, attrs=None)[source]¶ Colorize text.
- Available text colors: red, green, yellow, blue, magenta, cyan, white.
- Available text highlights: on_red, on_green, on_yellow, on_blue, on_magenta, on_cyan, on_white.
- Available attributes: bold, dark, underline, blink, reverse, concealed.
- Example:
colored('Hello, World!', 'red', 'on_grey', ['blue', 'blink'])
colored('Hello, World!', 'green')
Alerting module¶
Command-line utilities¶
-
class
mooncake_utils.cmd.
cmd_builder
(bin_base, conf, pretty=False)[source]¶ -
bin_base
= None¶
-
conf
= None¶
-
pool
= {}¶
-
pretty
= False¶
-
-
mooncake_utils.cmd.
md5
(path)[source]¶ Generates the md5sum for a file.
Parameters: path – path of the file to hash, e.g. ./output/final.dat
On success, ./output/final.dat.md5 is produced.
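A checksum file like this can be produced with hashlib alone. The sketch below is illustrative: md5_sketch is a hypothetical name, and the layout of the .md5 file (hash, two spaces, path, as written by the md5sum tool) is an assumption, since the format is not documented here.

```python
import hashlib


def md5_sketch(path, chunk_size=1 << 20):
    """Hypothetical stand-in: hash the file at path and write path + '.md5'."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    checksum = digest.hexdigest()
    with open(path + ".md5", "w") as out:
        out.write(f"{checksum}  {path}\n")  # assumed md5sum-style line
    return checksum
```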
-
mooncake_utils.cmd.
run_cmd
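(cmd, debug=False)[source]¶ Runs a shell command and prints the result. Note that the call blocks until the command finishes.
Parameters: - cmd – the command to run, e.g. ls -alh
- debug – if set to True, cmd is not executed and only the related log messages are printed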
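A blocking runner with a dry-run switch can be sketched with subprocess.run. This is an illustration only: run_cmd_sketch is a hypothetical name, and both the printed log line and the returned exit code are assumptions, as the return value of run_cmd is not documented here.

```python
import subprocess


def run_cmd_sketch(cmd, debug=False):
    """Hypothetical stand-in: run cmd through the shell, print output, return exit code."""
    print(f"running: {cmd}")
    if debug:
        # Dry run: log the command but do not execute it.
        return None
    # subprocess.run blocks until the command has finished.
    completed = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    print(completed.stdout, end="")
    if completed.stderr:
        print(completed.stderr, end="")
    return completed.returncode  # assumed return value
```

Because the command string goes through the shell, it should only ever be built from trusted input.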