How to Transform RGB Videos into Mesmerizing 3D Videos | A Guide by Berkan Zorlubas | August 2023

Introduction:

The training process for the neural network is simple. By running the specified command in the terminal, you can train the network with your custom dataset. After 10 epochs of training, I observed that the loss began to saturate, so I decided to stop training. Checkpoints and test visualizations are stored throughout the training process for analysis. By using the test.py script, you can generate depth maps for each frame. Additionally, you can create colored point clouds using the depth and RGB information with the open3d library. I have provided a script for rendering point clouds which should be self-explanatory. Feel free to ask any questions regarding this process.


Training the Neural Network

To train the network on your custom dataset, run the following command in the terminal:

python train.py --net scene_flow_motion_field --dataset custom_sequence --track_id custom --log_time --epoch_batches 2000 --epoch 10 --lr 1e-6 --html_logger --vali_batches 150 --batch_size 1 --optim adam --vis_batches_vali 1 --vis_every_vali 1 --vis_every_train 1 --vis_batches_train 1 --vis_at_start --gpu 0 --save_net 1 --workers 1 --one_way --loss_type l1 --l1_mul 0 --acc_mul 1 --disp_mul 1 --warm_sf 5 --scene_lr_mul 1000 --repeat 1 --flow_mul 1 --sf_mag_div 100 --time_dependent --gaps 1,2,4,6,8 --midas --use_disp --logdir 'logdir/' --suffix 'track_{track_id}' --force_overwrite

Observations during Training

After training the neural network for 10 epochs, I noticed that the loss curve started to saturate, so I stopped training at that point rather than continuing for additional epochs. The loss curve for the training run is shown below:

Checkpoint and Test Visualizations

All checkpoints generated during training are stored in the directory ./logdir/nets/, and after each epoch the training script writes test visualizations to ./logdir/visualize. These visualizations help identify any issues that may have occurred during training and make it easier to monitor the loss.

Generating Depth Maps

Using the latest checkpoint, you can generate a depth map for each frame with the test.py script. To do this, execute the following command in the terminal:

python test.py --net scene_flow_motion_field --dataset custom_sequence --workers 1 --output_dir ./test_results/custom_sequence --epoch 10 --html_logger --batch_size 1 --gpu 0 --track_id custom --suffix custom --checkpoint_path ./logdir

Running the above command generates one .npz file per frame. Each .npz file is a dictionary-like archive containing the RGB frame, the estimated depth, the camera pose, the optical flow to the next frame, and so on. In addition, three depth renders are produced for each frame: the ground truth, the MiDaS estimate, and the estimate from the trained network.
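To get a feel for what these archives contain, a single frame can be opened with numpy and its keys inspected. The file name and key names below are assumptions for illustration only; compare them against the printed key list from your own output.

import numpy as np

# Hypothetical file name; the test script writes one .npz archive per frame under --output_dir.
frame = np.load("./test_results/custom_sequence/batch0000.npz")

# List the arrays stored for this frame (RGB, depth, camera pose, flow to the next frame, ...).
print(frame.files)

# Key names are assumptions; adjust them to match the printed list.
depth = frame["depth"]
rgb = frame["img"]
print("depth:", depth.shape, "rgb:", rgb.shape)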

Creating Colored Point Clouds

In this final step, the batched .npz files are loaded frame-by-frame, and colored point clouds are created using the depth and RGB information. The open3d library is utilized for creating and rendering point clouds in Python. This powerful tool allows the creation of virtual cameras in 3D space and the capture of point cloud visuals. Functions for outlier removal provided by open3d are employed to eliminate flickering and noisy points.

To keep this explanation concise, the specific details of the open3d usage are not discussed. However, a script named render_pointcloud_video.py is provided, which should be self-explanatory. For any questions or clarifications, please feel free to ask.
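The provided render_pointcloud_video.py script remains the reference. Purely as an illustration of the kind of open3d calls involved, the sketch below builds a colored point cloud from one frame's RGB and depth arrays and applies statistical outlier removal. The file name, array key names, and camera intrinsics are placeholders, not values taken from the article.

import numpy as np
import open3d as o3d

# Load one frame's arrays (file name and key names are assumed, as in the snippet above).
data = np.load("./test_results/custom_sequence/batch0000.npz")
rgb = np.ascontiguousarray(data["img"])        # (H, W, 3) uint8
depth = data["depth"].astype(np.float32)       # (H, W) depth values

h, w = depth.shape
# Placeholder pinhole intrinsics; the real values come from the dataset / camera poses.
intrinsic = o3d.camera.PinholeCameraIntrinsic(w, h, 0.9 * w, 0.9 * w, w / 2.0, h / 2.0)

# Back-project the RGB-D frame into a colored point cloud.
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    o3d.geometry.Image(rgb), o3d.geometry.Image(depth),
    depth_scale=1.0, depth_trunc=1000.0, convert_rgb_to_intensity=False)
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)

# Statistical outlier removal suppresses the flickering, noisy points mentioned above.
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

o3d.visualization.draw_geometries([pcd])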

Below, you can find examples of the videos showcasing the point cloud and depth map for the processed video:

Left: Stock footage provided by Videvo, downloaded from www.videvo.net

Right: Depth-map video created by the author

Bottom: Colorized point cloud video created by the author

A higher-resolution version of this animation can be found on YouTube.

Summary:

This article provides a step-by-step guide on training a neural network using a custom dataset for depth estimation. It includes commands to run in the terminal, information about checkpoints and visualizations during training, and instructions for generating depth maps and point clouds using the trained network. The author also provides a script for rendering point cloud videos and shares examples of the videos created.




Creating 3D Videos from RGB Videos – Frequently Asked Questions

What is the process of creating 3D videos from RGB videos?

To create 3D videos from RGB videos, you need to follow these steps:

  1. Convert the RGB videos into stereo pairs.
  2. Extract depth information from the stereo pairs.
  3. Use the depth information to generate a corresponding depth map.
  4. Combine the RGB videos with the depth map to create a 3D video (a minimal sketch of this step follows the list).
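As a rough illustration of the last step, the sketch below assumes an RGB frame and a per-pixel depth map are already available: a second view is synthesized by shifting pixels according to disparity, and the two views are merged into a red-cyan anaglyph. This is a minimal sketch under assumed inputs, not a production pipeline; occlusion filling, temporal smoothing, and proper calibration are omitted.

import numpy as np

def make_anaglyph(rgb, depth, max_shift=12):
    # rgb: (H, W, 3) uint8 frame; depth: (H, W) float map (larger = farther); both assumed inputs.
    h, w, _ = rgb.shape

    # Normalize depth and convert it to a per-pixel disparity (nearer pixels shift more).
    d = (depth - depth.min()) / (np.ptp(depth) + 1e-8)
    disparity = np.rint((1.0 - d) * max_shift).astype(int)

    # Synthesize a crude "right-eye" view by shifting pixels left by their disparity.
    # Occlusion holes are left black; a real pipeline would inpaint them.
    right = np.zeros_like(rgb)
    cols = np.arange(w)
    for y in range(h):
        right[y, np.clip(cols - disparity[y], 0, w - 1)] = rgb[y, cols]

    # Red channel from the original (left) view, green/blue from the synthesized right view.
    anaglyph = right.copy()
    anaglyph[..., 0] = rgb[..., 0]
    return anaglyph

Applying this function frame by frame and writing the results out as a video yields a clip that can be viewed with red-cyan glasses, which is the simplest equipment-light way to check the effect.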

What tools or software can be used for creating 3D videos?

There are several tools and software available for creating 3D videos, such as:

  • Adobe After Effects
  • Blender
  • Cinema 4D
  • Autodesk Maya

Is it necessary to have prior knowledge of 3D modeling or animation?

No, it is not necessary to have prior knowledge of 3D modeling or animation to create 3D videos from RGB videos. However, having some basic understanding of these concepts can be helpful.

Are there any online tutorials or courses available for learning the process?

Yes, there are numerous online tutorials and courses available that can help you learn the process of creating 3D videos from RGB videos. Websites like Udemy, Coursera, and YouTube offer various resources to learn this skill.

Can any type of RGB video be converted into a 3D video?

Most RGB videos can be converted into 3D videos, but the quality and effectiveness of the conversion may vary depending on the source material. Videos with sufficient depth cues and suitable stereo pairs are more likely to produce better results.

What are some challenges involved in creating 3D videos from RGB videos?

Creating 3D videos from RGB videos can present a few challenges, such as:

  • Ensuring accurate depth extraction from stereo pairs
  • Managing occlusions and disparities
  • Ensuring smooth video playback and rendering
  • Optimizing the final video for different devices and platforms

Can 3D videos be viewed without special equipment?

Yes, depending on the format and platform, some 3D videos can be viewed without special equipment. Anaglyph 3D videos, for example, can be watched using red and cyan glasses. However, for a more immersive experience, dedicated 3D glasses or VR headsets are recommended.

Are there any limitations or drawbacks of creating 3D videos from RGB videos?

While creating 3D videos from RGB videos can be fascinating, there are a few limitations and drawbacks to consider, such as:

  • Limited depth perception compared to native 3D content
  • Potential loss of video quality during the conversion process
  • Complexity and time required for accurate depth extraction
  • Hardware and software limitations for real-time rendering and playback