Download file from web in Python 3
Problem description
I am creating a program that will download a .jar (java) file from a web server, by reading the URL that is specified in the .jad file of the same game/application. I'm using Python 3.2.1
I've managed to extract the URL of the JAR file from the JAD file (every JAD file contains the URL to the JAR file), but as you may imagine, the extracted value is a str (string) object.
Here's the relevant function:
def downloadFile(URL=None):
    import httplib2
    h = httplib2.Http(".cache")
    resp, content = h.request(URL, "GET")
    return content

downloadFile(URL_from_file)
However, I always get an error saying that the type in the function above has to be bytes, and not string. I've tried using URL.encode('utf-8'), and also bytes(URL, encoding='utf-8'), but I always get the same or a similar error.
So basically my question is: how do I download a file from a server when the URL is stored in a string?
If you want to obtain the contents of a web page into a variable, just read the response of urllib.request.urlopen:
import urllib.request
...
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read() # a `bytes` object
text = data.decode('utf-8') # a `str`; this step can't be used if data is binary
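The decode step above hardcodes UTF-8. As a minimal sketch, assuming the server declares its encoding in the Content-Type header, the charset can be read from the response headers instead. The helper name `fetch_text` and its `default_encoding` parameter are hypothetical, chosen only for illustration:

```python
import urllib.request


def fetch_text(url, default_encoding="utf-8"):
    """Return the body at `url` as a `str` (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        data = response.read()  # a `bytes` object
        # Charset declared in the Content-Type header, or None if absent.
        charset = response.headers.get_content_charset()
    # Fall back to `default_encoding` when the server declares no charset.
    return data.decode(charset or default_encoding)
```

Note that `url` is passed as an ordinary `str`; no conversion to `bytes` is needed. As with the snippet above, this only makes sense for text, not for binary data such as a .jar file.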
The easiest way to download and save a file is to use the urllib.request.urlretrieve function:
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
urllib.request.urlretrieve(url, file_name)
import urllib.request
...
# Download the file from `url`, save it in a temporary directory and get the
# path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable:
file_name, headers = urllib.request.urlretrieve(url)
But keep in mind that urlretrieve is considered legacy and might become deprecated (not sure why, though).
So the most correct way to do this would be to use the urllib.request.urlopen function to return a file-like object that represents an HTTP response and copy it to a real file using shutil.copyfileobj.
import urllib.request
import shutil
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
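For the situation in the question (a URL held in a `str`, pointing at a binary .jar file), the same approach could be wrapped in a small function. This is only a sketch: the name `download_file`, the `timeout` default and the returned byte count are assumptions added for illustration, not part of the original answer.

```python
import shutil
import urllib.request


def download_file(url, file_name, timeout=30):
    """Stream `url` to `file_name`; return the number of bytes written."""
    # `url` is a plain `str`; urlopen does not need it encoded to bytes.
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(file_name, "wb") as out_file:
        # Copies in chunks, so the whole file is never held in memory.
        shutil.copyfileobj(response, out_file)
        return out_file.tell()
```

It might be called as `download_file(URL_from_file, 'game.jar')`, where `'game.jar'` is a placeholder file name.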
If this seems too complicated, you may want to go simpler and store the whole download in a bytes object and then write it to a file. But this works well only for small files.
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    data = response.read() # a `bytes` object
    out_file.write(data)
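Because this variant holds the whole download in memory, one possible safeguard is to cap how much is read. The sketch below is an illustration only: the function name `download_small_file`, the `max_bytes` parameter and its 10 MiB default are all assumptions.

```python
import urllib.request


def download_small_file(url, file_name, max_bytes=10 * 1024 * 1024):
    """Download `url` into memory, refusing bodies larger than `max_bytes`."""
    with urllib.request.urlopen(url) as response:
        # Ask for one byte more than the cap so an oversized body is detectable.
        data = response.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError("download exceeds %d bytes" % max_bytes)
    # Only write the file once the size check has passed.
    with open(file_name, "wb") as out_file:
        out_file.write(data)
    return len(data)
```

For files that may exceed the cap, the `shutil.copyfileobj` approach is the better fit, since it never loads the whole body at once.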
It is possible to decompress .gz data (and maybe other formats) on the fly. Since Python 3.2, gzip.GzipFile supports unseekable file objects and reads the response sequentially, so the HTTP server should not need to support random access to the file.
import urllib.request
import gzip
...
# Read the first 64 bytes of the file inside the .gz archive located at `url`
url = 'http://example.com/something.gz'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read(64) # a `bytes` object
        # Or do anything shown above using `uncompressed` instead of `response`.
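As the last comment suggests, the decompressed stream can be combined with the earlier techniques. A minimal sketch, assuming the resource at `url` is a plain gzip file, that saves the decompressed contents straight to disk; the function name `download_and_gunzip` is hypothetical:

```python
import gzip
import shutil
import urllib.request


def download_and_gunzip(url, file_name):
    """Save the decompressed contents of the .gz at `url` to `file_name`."""
    with urllib.request.urlopen(url) as response, \
            gzip.GzipFile(fileobj=response) as uncompressed, \
            open(file_name, "wb") as out_file:
        # Decompresses chunk by chunk while the data is being downloaded.
        shutil.copyfileobj(uncompressed, out_file)
        return out_file.tell()  # number of decompressed bytes written
```

This is separate from HTTP-level compression (a `Content-Encoding: gzip` response header), which `urllib.request` does not decode automatically.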
