Date handling

mooncake_utils.date.datediff(dt, base=None, unit='day')[source]

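Computes the difference between two datetime values in the chosen unit, for example days or minutes.

If base is not given explicitly, the difference is computed against the current time.

Parameters:
  • dt – input date; must be a datetime
  • base – the reference date
  • unit – granularity of the result; one of day, hour, minute, second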
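
The behavior described above can be sketched with the standard library alone. This is not the mooncake_utils implementation: the name datediff_sketch, the sign convention (positive when dt is earlier than base), and truncation toward zero are all assumptions made for illustration.

```python
from datetime import datetime

# Seconds per unit; the unit names mirror the documented options.
_SECONDS_PER_UNIT = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def datediff_sketch(dt, base=None, unit="day"):
    """Hypothetical stand-in for datediff: whole units between dt and base."""
    if base is None:
        # Documented default: compare against the current time.
        base = datetime.now()
    seconds = (base - dt).total_seconds()
    # Assumption: positive when dt is earlier than base, truncated toward zero.
    return int(seconds / _SECONDS_PER_UNIT[unit])
```

For example, datediff_sketch(datetime(2017, 6, 20), datetime(2017, 6, 24, 12)) gives 4, because the half day is truncated.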
mooncake_utils.date.datetime2timestamp(date)[source]

Converts a datetime to a timestamp (int).

mooncake_utils.date.gen_date_list(begin, end, join=False, exclude=[], exclude_today=True)[source]

Generates the list of dates between the given begin and end. Compare with gen_date_list_by_days.

mooncake_utils.date.gen_date_list_by_days(begin=None, days=7, join=False, exclude=[], include_begin_day=False, format='%Y%m%d')[source]

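Generates a list of the days leading up to begin.

Parameters:
  • begin – start date; expected to be a datetime, but a str is automatically parsed into a datetime
  • days – how many of the most recent days before begin to generate
  • join – whether to join the list with commas; commonly used for Hadoop input directories
  • exclude – dates to drop. Useful for statistics, for example when aggregating the last 7 days while the data for some of those days is corrupted and must be skipped.
  • include_begin_day – whether to include the begin day itself
>>> gen_date_list_by_days(begin='20170620', days=3) # today is 2017-06-24
['20170619', '20170618', '20170617']
>>> gen_date_list_by_days(begin='20170620', days=3, exclude=['20170618']) # today is 2017-06-24
['20170619', '20170617']
>>> gen_date_list_by_days(days=3) # today is 2017-06-24
['20170623', '20170622', '20170621']
>>> gen_date_list_by_days(days=3, join=True) # today is 2017-06-24
'20170623,20170622,20170621'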
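
The examples above can be reproduced with a short standard-library sketch. The function name date_list_by_days_sketch and the fmt parameter are hypothetical, begin is required here so the result is deterministic, and the handling of include_begin_day (the begin day takes one of the days slots) is an assumption the source does not confirm.

```python
from datetime import datetime, timedelta


def date_list_by_days_sketch(begin, days=7, join=False, exclude=(),
                             include_begin_day=False, fmt="%Y%m%d"):
    """Hypothetical re-creation of the documented gen_date_list_by_days output."""
    if isinstance(begin, str):
        begin = datetime.strptime(begin, fmt)
    # Assumption: with include_begin_day the window starts at begin itself.
    first = 0 if include_begin_day else 1
    dates = [(begin - timedelta(days=offset)).strftime(fmt)
             for offset in range(first, first + days)]
    # Excluded dates are dropped after generation, so the list may get shorter.
    dates = [d for d in dates if d not in exclude]
    return ",".join(dates) if join else dates
```

Note that exclusion shortens the list rather than reaching further back, which matches the second example above.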
mooncake_utils.date.gen_latest_date_list(begin, end, join=False, exclude=[])[source]

Deprecated; use gen_date_list instead.

mooncake_utils.date.gen_today(delta=1, raw=False, short=True, with_time=False, only_time=False)[source]

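Generates today's date.

Parameters:
  • delta – time offset in days, counted back from today
  • raw – if True, returns a datetime; otherwise returns a str
  • short – if True, returns the compact form, e.g. 20170101; otherwise returns 2017-01-01
  • with_time – whether to append the time; otherwise only the date is returned
  • only_time – whether to return only the time, without the date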
>>> gen_today(delta=0, with_time=True)
20170624145222
>>> gen_today(delta=0, with_time=True, short=False)
2017-06-24 15:06:12
>>> gen_today(delta=0, only_time=True, short=False)
15:06:12
>>> gen_today(delta=0, only_time=True)
150612
>>> gen_today(delta=0, with_time=False)
20170624
>>> gen_today(delta=1, with_time=False)
20170623
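
The formatting rules shown in these examples map directly onto strftime patterns. The sketch below is an illustration rather than the library's code: today_sketch is a hypothetical name, and it takes the current moment as an explicit now argument so that the output is reproducible.

```python
from datetime import datetime, timedelta


def today_sketch(now, delta=1, raw=False, short=True,
                 with_time=False, only_time=False):
    """Hypothetical re-creation of the documented gen_today formats."""
    moment = now - timedelta(days=delta)  # delta counts days back from now
    if raw:
        return moment
    date_fmt = "%Y%m%d" if short else "%Y-%m-%d"
    time_fmt = "%H%M%S" if short else "%H:%M:%S"
    if only_time:
        fmt = time_fmt
    elif with_time:
        # The long form separates date and time with a space.
        fmt = date_fmt + ("" if short else " ") + time_fmt
    else:
        fmt = date_fmt
    return moment.strftime(fmt)
```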
mooncake_utils.date.str2date(date)[source]

Parses a str into a datetime, trying several formats in turn.

>>> str2date('20170101')
<type 'datetime.datetime'>
>>> str2date('2017-01-01')
<type 'datetime.datetime'>
>>> str2date('2017-01-01 11:11:22')
<type 'datetime.datetime'>
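
Trying several formats in turn is commonly done with a loop over strptime patterns. The sketch below covers only the three formats shown in the examples; the name parse_date_sketch and the format list are assumptions, and the real str2date may accept more formats.

```python
from datetime import datetime

# Only the formats that appear in the documented examples.
_CANDIDATE_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%Y-%m-%d %H:%M:%S")


def parse_date_sketch(text):
    """Return the first successful strptime parse, or raise ValueError."""
    for fmt in _CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError("unrecognized date string: %r" % (text,))
```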
mooncake_utils.date.str2datetime(date, date_format='%Y-%m-%d %H:%M:%S')[source]

Converts a str in a fixed format to a datetime.

mooncake_utils.date.timestamp2datetime(timestamp)[source]

Converts a timestamp to a datetime.

Parameters: timestamp – str, int, or float
Returns: a datetime
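
A round trip between datetime and timestamp can be sketched as follows. Both function names are hypothetical, and treating a naive datetime as local time is an assumption; the source does not say which time zone the library uses.

```python
import time
from datetime import datetime


def datetime_to_timestamp_sketch(dt):
    # Interprets a naive datetime as local time and drops the sub-second part.
    return int(time.mktime(dt.timetuple()))


def timestamp_to_datetime_sketch(timestamp):
    # float() accepts str, int, and float, matching the documented input types.
    return datetime.fromtimestamp(float(timestamp))
```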

Cython build helpers

mooncake_utils.cython_build.build(folder='lib')[source]
mooncake_utils.cython_build.build_pyx(to_build=[], delete=False)[source]

Data utilities

class mooncake_utils.data.Counter[source]

Bases: mooncake_utils.data.Storage

Keeps count of how many times something is added.

>>> c = counter()
>>> c.add('x')
>>> c.add('x')
>>> c.add('x')
>>> c.add('x')
>>> c.add('x')
>>> c.add('y')
>>> c['y']
1
>>> c['x']
5
>>> c.most()
['x']
add(n)[source]
least()[source]

Returns the keys with minimum count.

most()[source]

Returns the keys with maximum count.

percent(key)[source]

Returns what percentage a certain key is of all entries.

>>> c = counter()
>>> c.add('x')
>>> c.add('x')
>>> c.add('x')
>>> c.add('y')
>>> c.percent('x')
0.75
>>> c.percent('y')
0.25

sorted_items()[source]

Returns items sorted by value.

>>> c = counter()
>>> c.add('x')
>>> c.add('x')
>>> c.add('y')
>>> c.sorted_items()
[('x', 2), ('y', 1)]
sorted_keys()[source]

Returns keys sorted by value.

>>> c = counter()
>>> c.add('x')
>>> c.add('x')
>>> c.add('y')
>>> c.sorted_keys()
['x', 'y']
sorted_values()[source]

Returns values sorted by value.

>>> c = counter()
>>> c.add('x')
>>> c.add('x')
>>> c.add('y')
>>> c.sorted_values()
[2, 1]
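
The counting behavior documented for Counter can be approximated with a small dict subclass. CounterSketch is a hypothetical name and only a subset of the methods is shown; it is a sketch for illustration, not the class shipped in mooncake_utils.data.

```python
class CounterSketch(dict):
    """Hypothetical re-creation of the documented Counter behavior."""

    def add(self, key):
        # Each call increments the count for key, starting from zero.
        self[key] = self.get(key, 0) + 1

    def most(self):
        top = max(self.values())
        return [k for k, v in self.items() if v == top]

    def least(self):
        bottom = min(self.values())
        return [k for k, v in self.items() if v == bottom]

    def percent(self, key):
        # Fraction of all additions that went to key.
        return float(self[key]) / sum(self.values())

    def sorted_items(self):
        # Highest count first, matching the documented examples.
        return sorted(self.items(), key=lambda item: item[1], reverse=True)
```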
class mooncake_utils.data.Storage[source]

Bases: dict

Like a dict, but items can also be accessed as attributes: s.a instead of s['a']

>>> o = storage(a=1)
>>> o.a
1
>>> o['a']
1
>>> o.a = 2
>>> o['a']
2
>>> del o.a
>>> o.a
Traceback (most recent call last):
  ...
AttributeError: 'a'
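
Attribute access on a dict, as in the session above, is usually implemented by forwarding the attribute hooks to item access. The class below is a minimal sketch under that assumption; StorageSketch is a hypothetical name, not the library's class.

```python
class StorageSketch(dict):
    """Dict whose items are also readable and writable as attributes."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails.
        try:
            return self[key]
        except KeyError as exc:
            raise AttributeError(key) from exc

    def __setattr__(self, key, value):
        self[key] = value

    def __delattr__(self, key):
        try:
            del self[key]
        except KeyError as exc:
            raise AttributeError(key) from exc
```

Translating KeyError into AttributeError keeps hasattr and getattr with a default working as expected.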
mooncake_utils.data.counter

alias of mooncake_utils.data.Counter

mooncake_utils.data.dictfind(dictionary, element)[source]

Returns a key whose value in dictionary is element or, if none exists, None.

>>> d = {1:2, 3:4}
>>> dictfind(d, 4)
3
>>> dictfind(d, 5)
mooncake_utils.data.dictfindall(dictionary, element)[source]

Returns the keys whose values in dictionary are element or, if none exists, [].

>>> d = {1:4, 3:4}
>>> dictfindall(d, 4)
[1, 3]
>>> dictfindall(d, 5)
[]
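
Both lookups amount to a linear scan over the dictionary's items. The sketch below uses hypothetical names and relies on the insertion ordering of dict in Python 3.7 and later; the original examples print results in the same order.

```python
def dictfind_sketch(dictionary, element):
    """Return the first key whose value equals element, else None."""
    for key, value in dictionary.items():
        if value == element:
            return key
    return None


def dictfindall_sketch(dictionary, element):
    """Return every key whose value equals element (possibly an empty list)."""
    return [key for key, value in dictionary.items() if value == element]
```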
mooncake_utils.data.lstrips(text, remove)[source]

Removes the string remove from the left of text; remove may also be a list of strings.

>>> lstrips("foobar", "foo")
'bar'
>>> lstrips('http://foo.org/', ['http://', 'https://'])
'foo.org/'
>>> lstrips('FOOBARBAZ', ['FOO', 'BAR'])
'BAZ'
>>> lstrips('FOOBARBAZ', ['BAR', 'FOO'])
'BARBAZ'
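
The last two examples give different results because a list of prefixes appears to be applied once each, in order. The sketch below reproduces that behavior; lstrips_sketch is a hypothetical name, and the single in-order pass is inferred from the examples rather than confirmed by the source.

```python
def lstrips_sketch(text, remove):
    """Strip remove from the left of text; a list is applied item by item."""
    if isinstance(remove, (list, tuple)):
        for prefix in remove:
            # Each prefix gets exactly one chance, in list order.
            text = lstrips_sketch(text, prefix)
        return text
    if text.startswith(remove):
        return text[len(remove):]
    return text
```

With ['BAR', 'FOO'] the first item does not match 'FOOBARBAZ', so only 'FOO' is removed and 'BARBAZ' remains.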
mooncake_utils.data.parse_float(m, default=0.0)[source]
Parameters:
  • m
  • default
mooncake_utils.data.parse_int(m, default=0)[source]
Parameters:
  • m
  • default
mooncake_utils.data.read_mmap(fn)[source]
Parameters:fn
mooncake_utils.data.rstrips(text, remove)[source]
Removes the string remove from the right of text.
>>> rstrips("foobar", "bar")
'foo'
mooncake_utils.data.storage

alias of mooncake_utils.data.Storage

mooncake_utils.data.storify(mapping, *requireds, **defaults)[source]

Creates a storage object from dictionary mapping, raising KeyError if mapping doesn't have all of the keys in requireds and using the default values for keys found in defaults. For example, storify({'a':1, 'c':3}, b=2, c=0) will return the equivalent of storage({'a':1, 'b':2, 'c':3}).

If a storify value is a list (e.g. multiple values in a form submission), storify returns the last element of the list, unless the key appears in defaults as a list. Thus:

>>> storify({'a':[1, 2]}).a
2
>>> storify({'a':[1, 2]}, a=[]).a
[1, 2]
>>> storify({'a':1}, a=[]).a
[1]
>>> storify({}, a=[]).a
[]

Similarly, if the value has a value attribute, storify will return its value, unless the key appears in defaults as a dictionary.

>>> storify({'a':storage(value=1)}).a
1
>>> storify({'a':storage(value=1)}, a={}).a
<Storage {'value': 1}>
>>> storify({}, a={}).a
{}
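
The required-key, default, and list rules described above can be sketched in a few lines. This simplified version returns a plain dict and deliberately omits the value-attribute rule; storify_sketch is a hypothetical name and the code is an illustration, not the library's implementation.

```python
def storify_sketch(mapping, *requireds, **defaults):
    """Simplified storify: required keys, defaults, and the list rule only."""
    result = {}
    for key in requireds + tuple(mapping.keys()):
        value = mapping[key]  # raises KeyError when a required key is missing
        wants_list = isinstance(defaults.get(key), list)
        if isinstance(value, list) and not wants_list:
            value = value[-1]  # keep only the last submitted value
        elif wants_list and not isinstance(value, list):
            value = [value]  # wrap a single value when a list is expected
        result[key] = value
    for key, value in defaults.items():
        # Defaults only fill keys that mapping did not provide.
        result.setdefault(key, value)
    return result
```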
mooncake_utils.data.strips(text, remove)[source]
Removes the string remove from both sides of text.
>>> strips("foobarfoo", "foo")
'bar'
mooncake_utils.data.trunc(f, n=4)[source]
Parameters:
  • f
  • n

Feature extraction

File operations

mooncake_utils.file.mkdirp(directory)[source]

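Does the equivalent of the shell's mkdir -p using Python libraries.

The advantage is that it does not call os.system(), which avoids the overhead of forking a process.

Parameters: directory – the path to create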
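
In Python 3 the same effect is available from os.makedirs. The one-liner below is a sketch of the idea under that assumption; mkdirp_sketch is a hypothetical name and the library may implement mkdirp differently.

```python
import os


def mkdirp_sketch(directory):
    # Creates all missing parents; exist_ok=True makes repeat calls harmless.
    os.makedirs(directory, exist_ok=True)
```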
mooncake_utils.file.rglob(p)[source]
mooncake_utils.file.rm_folder(path, debug=False)[source]

Empties a folder.

Parameters: debug – if True, only logs and does not perform the deletion.
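
A dry-run switch like debug is typically a guard placed just before the destructive call. The sketch below assumes that emptying a folder means deleting its contents while keeping the folder itself, which the source does not state; empty_folder_sketch is a hypothetical name.

```python
import logging
import os
import shutil


def empty_folder_sketch(path, debug=False):
    """Delete everything inside path; with debug=True only log what would go."""
    for name in os.listdir(path):
        entry = os.path.join(path, name)
        logging.info("removing %s", entry)
        if debug:
            continue  # dry run: nothing is deleted
        if os.path.isdir(entry) and not os.path.islink(entry):
            shutil.rmtree(entry)
        else:
            os.remove(entry)
```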
mooncake_utils.file.safewrite(filename, content)[source]

Writes the content to a temp file and then moves the temp file to given filename to avoid overwriting the existing file in case of errors.
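
The write-then-move pattern described above can be sketched with tempfile and os.replace. This is an illustration with a hypothetical name, safewrite_sketch; placing the temporary file in the target's directory is a design choice made here so that the final rename stays on one filesystem.

```python
import os
import tempfile


def safewrite_sketch(filename, content):
    """Write content to a temp file, then move it over filename."""
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        # The existing file is replaced only after the write succeeded.
        os.replace(tmp_path, filename)
    except BaseException:
        # On any failure, clean up the temp file and leave filename untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```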

Hadoop utilities

Logging helpers

class mooncake_utils.log.LogWrapper(logger)[source]
critical(*args)[source]
debug(*args)[source]
error(*args)[source]
exception(*args)[source]
info(*args)[source]
log(level, *args)[source]
set_sep(n)[source]
warning(*args)[source]
mooncake_utils.log.get_logger(debug=False, name='mu.log', with_file=False, level=None, wrapper=False, formatter_str='%(threadName)s | %(asctime)s - %(levelname)s - <%(filename)s-%(funcName)s:%(lineno)d> : %(message)s', log_save_path=None, with_console=True, with_kafka=False, kafka_topic=None, kafka_hosts=None, kafka_api_version=(0, 10))[source]
Parameters:
  • debug
  • name
  • with_file

Colored print helpers

mooncake_utils.termcolor.colored(text, color=None, on_color=None, attrs=None)[source]

Colorize text.

Available text colors:
red, green, yellow, blue, magenta, cyan, white.
Available text highlights:
on_red, on_green, on_yellow, on_blue, on_magenta, on_cyan, on_white.
Available attributes:
bold, dark, underline, blink, reverse, concealed.
Example:
colored('Hello, World!', 'red', 'on_grey', ['blue', 'blink'])
colored('Hello, World!', 'green')
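
Terminal coloring of this kind is normally done by wrapping the text in ANSI SGR escape sequences. The sketch below uses the standard SGR codes for the colors, highlights, and attributes listed above; colored_sketch and the way codes are concatenated are assumptions, not the module's actual code.

```python
# Standard ANSI SGR codes for the names listed in the documentation.
COLORS = {"red": 31, "green": 32, "yellow": 33, "blue": 34,
          "magenta": 35, "cyan": 36, "white": 37}
# Background codes are the foreground codes plus 10.
HIGHLIGHTS = {"on_" + name: code + 10 for name, code in COLORS.items()}
ATTRIBUTES = {"bold": 1, "dark": 2, "underline": 4,
              "blink": 5, "reverse": 7, "concealed": 8}
RESET = "\033[0m"


def colored_sketch(text, color=None, on_color=None, attrs=None):
    """Wrap text in escape sequences for the requested styles."""
    codes = []
    if color is not None:
        codes.append(COLORS[color])
    if on_color is not None:
        codes.append(HIGHLIGHTS[on_color])
    for attr in attrs or []:
        codes.append(ATTRIBUTES[attr])
    if not codes:
        return text  # nothing requested, return the text unchanged
    prefix = "".join("\033[%dm" % code for code in codes)
    return prefix + text + RESET
```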
mooncake_utils.termcolor.cprint(text, color=None, on_color=None, attrs=None, **kwargs)[source]

Prints colorized text.

It accepts the arguments of the print function.

Alerting module

Command-line utilities

class mooncake_utils.cmd.cmd_builder(bin_base, conf, pretty=False)[source]
bin_base = None
build()[source]
conf = None
get(p)[source]
pool = {}
pretty = False
put(p, value=None)[source]
use_all_conf()[source]
mooncake_utils.cmd.gen_cmd(base, params, pretty=False)[source]
mooncake_utils.cmd.isarg(pos)[source]
mooncake_utils.cmd.isint(x)[source]
mooncake_utils.cmd.md5(path)[source]

Generates the md5sum for a file.

Parameters: path – path of the file to hash, e.g. ./output/final.dat

On success, produces ./output/final.dat.md5
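
Hashing a file and writing the result next to it can be sketched with hashlib. The name md5_sketch, the chunk size, and above all the content of the .md5 file (just the hex digest and a newline) are assumptions; the library may call the md5sum tool or use a different file layout.

```python
import hashlib


def md5_sketch(path, chunk_size=1 << 20):
    """Hash path in chunks and write the hex digest to path + '.md5'."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    hexdigest = digest.hexdigest()
    # Assumption: the sidecar file holds only the hex digest.
    with open(path + ".md5", "w") as out:
        out.write(hexdigest + "\n")
    return hexdigest
```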

mooncake_utils.cmd.md5_file(path)[source]
mooncake_utils.cmd.run_cmd(cmd, debug=False)[source]

Runs a shell command and prints its result. Note that this call blocks.

Parameters:
  • cmd – the command to run, e.g. ls -alh
  • debug – if True, cmd is not executed and only the related log is printed
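
A blocking runner with a dry-run switch can be sketched on top of subprocess.run. The name run_cmd_sketch and the return value (the exit code, or None in debug mode) are assumptions; the source does not say what run_cmd returns.

```python
import logging
import subprocess


def run_cmd_sketch(cmd, debug=False):
    """Run cmd through the shell, print its output, and wait for it to finish."""
    logging.info("running: %s", cmd)
    if debug:
        return None  # dry run: log only, nothing is executed
    completed = subprocess.run(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the printed output
        universal_newlines=True,
    )
    print(completed.stdout, end="")
    return completed.returncode
```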
mooncake_utils.cmd.run_cmd_noblock(cmd, debug=False)[source]

Same as run_cmd, but non-blocking.

Parameters:
  • cmd – the command to run, e.g. ls -alh
  • debug – if True, cmd is not executed and only the related log is printed
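
The non-blocking variant can be sketched with subprocess.Popen, which returns as soon as the process has started. The name run_cmd_noblock_sketch and returning the Popen handle are assumptions made for illustration.

```python
import subprocess


def run_cmd_noblock_sketch(cmd, debug=False):
    """Start cmd through the shell and return without waiting for it."""
    if debug:
        print("would run:", cmd)  # dry run: nothing is executed
        return None
    return subprocess.Popen(cmd, shell=True)
```

The caller can later call wait() or poll() on the returned handle to collect the exit code.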
mooncake_utils.cmd.scp(src, host, target, port=22)[source]
mooncake_utils.cmd.setarg(pos, val)[source]