Google App Engine: How to write large files to Google Cloud Storage(Google App Engine:如何将大文件写入 Google Cloud Storage)
问题描述
I am trying to save large files from Google App Engine's Blobstore to Google Cloud Storage to facilitate backup.
It works fine for small files (<10 mb) but for larger files it get gets unstable and GAE throws and FileNotOpenedError.
My code:
PATH = '/gs/backupbucket/'
for df in DocumentFile.all():
fn = df.blob.filename
br = blobstore.BlobReader(df.blob)
write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private')
with files.open(write_path, 'a') as fp:
while True:
buf = br.read(100000)
if buf=="": break
fp.write(buf)
files.finalize(write_path)
(Runs in a taskeque to avoid exceeding execution time).
Throws a FileNotOpenedError:
Traceback (most recent call last):
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~simplerepository/1.354754771592783168/processFiles.py", line 249, in post
fp.write(buf)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 281, in __exit__
self.close()
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 275, in close
self._make_rpc_call_with_retry('Close', request, response)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
_make_call(method, request, response)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
_raise_app_error(e)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 179, in _raise_app_error
raise FileNotOpenedError()
I have investigated further and according to a comment to GAE Issue 5371 the Files API closes the file every 30 seconds. I have not seen this documented anywhere else.
I have tried to work around this by closing and opening the file at intervals but now I get an WrongOpenModeError. The code below is edited from the first version of this post I have added a 0.5 second pause between the close and the open of the file. It now throws a WrongOpenModeError.
My code (updated):
PATH = '/gs/backupbucket/'
for df in DocumentFile.all():
fn = df.blob.filename
br = blobstore.BlobReader(df.blob)
write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private')
fp = files.open(write_path, 'a')
c = 0
while True:
if (c == 5):
c = 0
fp.close()
files.finalize(write_path)
time.sleep(0.5)
fp = files.open(write_path, 'a')
c = c + 1
buf = br.read(100000)
if buf=="": break
fp.write(buf)
files.finalize(write_path)
Stacktrace:
Traceback (most recent call last):
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~simplerepository/1.354894420907462278/processFiles.py", line 267, in get
fp.write(buf)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 310, in write
self._make_rpc_call_with_retry('Append', request, response)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
_make_call(method, request, response)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
_raise_app_error(e)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 188, in _raise_app_error
raise WrongOpenModeError()
I have tried to find information about the WrongOpenModeError but the only place it is mentioned is in the appengine.api.files.file.py itself.
Suggestions on how to get around this and be able to save also large files to Google Cloud storage would be greatly appreciated. Thanks!
I was having the same issue, endup writing an iterator around fetch data and catch the exception, works but is a work-around.
Re-writing your code would be something like:
from google.appengine.ext import blobstore
from google.appengine.api import files
def iter_blobstore(blob, fetch_size=524288):
start_index = 0
end_index = fetch_size
while True:
read = blobstore.fetch_data(blob, start_index, end_index)
if read == "":
break
start_index += fetch_size
end_index += fetch_size
yield read
PATH = '/gs/backupbucket/'
for df in DocumentFile.all():
fn = df.blob.filename
br = blobstore.BlobReader(df.blob)
write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private')
with files.open(write_path, 'a') as fp:
for buf in iter_blobstore(df.blob):
try:
fp.write(buf)
except files.FileNotOpenedError:
pass
files.finalize(write_path)
这篇关于Google App Engine:如何将大文件写入 Google Cloud Storage的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:Google App Engine:如何将大文件写入 Google Cloud Storage
基础教程推荐
- 用 Python 编写 Fortran 无格式文件 2022-01-01
- pyserial - 可以从线程 a 写入串行端口,是否阻塞从线程 b 读取? 2022-01-01
- numpy float:比算术运算中内置的慢 10 倍? 2022-01-01
- 在 Celery 工作人员中捕获 Heroku SIGTERM 以优雅地关 2022-01-01
- 尝试制作WhatsApp机器人 2022-01-01
- 将 x 轴刻度更改为自定义字符串 2022-01-01
- 使用生成器和迭代器时 Python 多循环失败 2022-01-01
- 与常规 dict 相比,Python manager.dict() 非常慢 2022-01-01
- 由Python将MP3转换为MIDI(类型错误:无法加载插件:mtg-Melodia:Melodia) 2022-01-01
- Discord.py 缺少必需的参数 2022-01-01
