How to identify binary and text files using Python?
Question
I need to identify which files in a directory are binary and which are text.
I tried using mimetypes, but it isn't a good fit in my case because it can't identify the MIME type of every file, and I have some strange ones here... I just need to know: binary or text. Simple? But I couldn't find a solution...
Thanks
Recommended answer
Thanks, everybody. I found a solution that suits my problem. I took this code from http://code.activestate.com/recipes/173220/ and changed just a small part to suit my needs.
It works fine.
# Python 2 code (uses string.maketrans and two-argument str.translate)
from __future__ import division
import string

def istext(filename):
    # Read in binary mode so the sampled bytes are not altered
    with open(filename, "rb") as f:
        s = f.read(512)
    # Printable ASCII plus newline, carriage return, tab and backspace
    text_characters = "".join(map(chr, range(32, 127)) + list("\n\r\t\b"))
    _null_trans = string.maketrans("", "")
    if not s:
        # Empty files are considered text
        return True
    if "\0" in s:
        # Files with null bytes are likely binary
        return False
    # Get the non-text characters (maps a character to itself then
    # use the 'remove' option to get rid of the text characters.)
    t = s.translate(_null_trans, text_characters)
    # If more than 30% non-text characters, then
    # this is considered a binary file
    if len(t) / len(s) > 0.30:
        return False
    return True
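
The code above relies on string.maketrans and the two-argument form of str.translate, which are not available in Python 3. What follows is a rough Python 3 sketch of the same heuristic, not part of the original answer. The names is_text, classify_directory, blocksize and threshold are my own, chosen for illustration. It keeps the same rules (empty means text, a null byte means binary, more than 30% non-text bytes means binary) and adds a small helper for the "files in a directory" part of the question.

```python
import os

# Bytes treated as "text": printable ASCII plus newline, carriage return,
# tab and backspace (the same set used by the answer's code).
TEXT_BYTES = bytes(range(32, 127)) + b"\n\r\t\b"


def is_text(filename, blocksize=512, threshold=0.30):
    """Guess whether a file is text by sampling its first bytes.

    blocksize and threshold are illustrative parameters; the defaults
    mirror the 512 bytes and 30% used in the answer.
    """
    with open(filename, "rb") as f:
        chunk = f.read(blocksize)
    if not chunk:
        return True  # empty files are considered text
    if b"\0" in chunk:
        return False  # null bytes strongly suggest binary data
    # Delete every text byte; whatever remains counts as non-text.
    non_text = chunk.translate(None, TEXT_BYTES)
    return len(non_text) / len(chunk) <= threshold


def classify_directory(path):
    """Return (text_files, binary_files) for regular files directly in path."""
    text_files, binary_files = [], []
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isfile(full):
            (text_files if is_text(full) else binary_files).append(entry)
    return text_files, binary_files
```

Calling classify_directory(".") would return two lists of file names for the current directory. One caveat of this byte-counting approach: every byte outside the ASCII set above counts as non-text, so UTF-8 text written mostly in a non-Latin script will likely be reported as binary. If that matters, the threshold or the TEXT_BYTES set would need adjusting.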
