Error Solution/Tensorflow
[Error] tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.59GiB (rounded to 3853516800)requested by op model_3/block_1_expand_BN/FusedBatchNormV3
SKJun 2024. 2. 27. 14:40
The following problem occurred:
2024-02-27 14:36:00.789061: W tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.59GiB (rounded to 3853516800)requested by op model_3/block_1_expand_BN/FusedBatchNormV3
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
ResourceExhaustedError: Graph execution error:
Detected at node 'model_3/block_1_expand_BN/FusedBatchNormV3' defined at (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/ipykernel_launcher.py", line 17, in <module>
app.launch_new_instance()
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/traitlets/config/application.py", line 1041, in launch_instance
app.start()
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 724, in start
self.io_loop.start()
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 215, in start
self.asyncio_loop.run_forever()
File "/usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 512, in dispatch_queue
await self.process_one()
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 501, in process_one
await dispatch(*args)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 408, in dispatch_shell
await result
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 731, in execute_request
reply_content = await reply_content
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/ipykernel/ipkernel.py", line 417, in do_execute
res = shell.run_cell(
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/ipykernel/zmqshell.py", line 540, in run_cell
return super().run_cell(*args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2945, in run_cell
result = self._run_cell(
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3000, in _run_cell
return runner(coro)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
coro.send(None)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3203, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3382, in run_ast_nodes
if await self.run_code(code, result, async_=asy):
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3442, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_574743/1994514761.py", line 34, in <module>
history, trained_model = build_and_train_model(base_model, model_name, class_weights, EPOCHS)
File "/tmp/ipykernel_574743/1020796293.py", line 34, in build_and_train_model
history = model.fit(
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/engine/training.py", line 1564, in fit
tmp_logs = self.train_function(iterator)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/engine/training.py", line 1160, in train_function
return step_function(self, iterator)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/engine/training.py", line 1146, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/engine/training.py", line 1135, in run_step
outputs = model.train_step(data)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/engine/training.py", line 993, in train_step
y_pred = self(x, training=True)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/engine/training.py", line 557, in __call__
return super().__call__(*args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1097, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/engine/functional.py", line 510, in call
return self._run_internal_graph(inputs, training=training, mask=mask)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/engine/functional.py", line 667, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1097, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/layers/normalization/batch_normalization.py", line 850, in call
outputs = self._fused_batch_norm(inputs, training=training)
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/layers/normalization/batch_normalization.py", line 660, in _fused_batch_norm
output, mean, variance = control_flow_util.smart_cond(
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/utils/control_flow_util.py", line 108, in smart_cond
return tf.__internal__.smart_cond.smart_cond(
File "/usr/local/rc_sw/common/vision_ws/lib/python3.8/site-packages/keras/layers/normalization/batch_normalization.py", line 649, in _fused_batch_norm_inference
return tf.compat.v1.nn.fused_batch_norm(
Node: 'model_3/block_1_expand_BN/FusedBatchNormV3'
OOM when allocating tensor with shape[800,112,112,96] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model_3/block_1_expand_BN/FusedBatchNormV3}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_66239]
Solution: reduce the batch_size in the dataset generator.
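For reference, here is a minimal sketch of where the batch size is typically set. The directory path, image size, and the new batch value are assumptions for illustration, not the actual training code; the original batch appears to have been 800, judging from the OOM tensor shape [800,112,112,96]. The optional TF_GPU_ALLOCATOR setting is the one suggested by the warning in the log above.

```python
import os

# Optional: as the warning above suggests, asynchronous allocation can help when
# the failure is due to memory fragmentation rather than true exhaustion.
# Must be set before TensorFlow initializes the GPU.
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical values -- the original training code is not shown in full.
TRAIN_DIR = "/path/to/train"   # assumed dataset location
IMAGE_SIZE = (224, 224)        # assumed input resolution
BATCH_SIZE = 32                # reduced from a much larger value (~800 implied by the OOM shape)

datagen = ImageDataGenerator(rescale=1.0 / 255)

# The generator is where batch_size is set; lowering it shrinks every intermediate
# activation (including the FusedBatchNormV3 output that triggered the OOM).
train_generator = datagen.flow_from_directory(
    TRAIN_DIR,
    target_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
)

# model.fit(train_generator, epochs=EPOCHS, ...) should then fit within GPU memory.
```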