
Video super-resolution

This article is about the video frame restoration technique. For the video upscaling tool by Nvidia, see Video Super Resolution.

Video super-resolution (VSR) is the process of generating high-resolution video frames from the given low-resolution video frames. Unlike single-image super-resolution (SISR), the main goal is not only to restore more fine details while saving coarse ones, but also to preserve motion consistency.

Comparison of the outputs of VSR and SISR methods. VSR restores more details by using temporal information.

There are many approaches to this task, but it remains a popular and challenging problem.


Mathematical explanation

Most research considers the degradation process of frames as

$y = (x \circledast k)\downarrow_s + n$

where:

  $x$ — original high-resolution frame sequence,
  $k$ — blur kernel,
  $\circledast$ — convolution operation,
  $\downarrow_s$ — downscaling operation,
  $n$ — additive noise,
  $y$ — low-resolution frame sequence.

Super-resolution is the inverse operation, so the problem is to estimate a frame sequence $\overline{x}$ from a frame sequence $y$ so that $\overline{x}$ is close to the original $x$. The blur kernel, downscaling operation and additive noise should be estimated for a given input to achieve better results.
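
As a concrete illustration, the degradation model above can be simulated in a few lines of NumPy/SciPy; the Gaussian kernel, scale factor and noise level below are illustrative choices, not values from any particular paper.

```python
# A minimal sketch of the degradation model y = (x * k) ↓s + n.
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(size=7, sigma=1.5):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def degrade(frame, scale=4, sigma=1.5, noise_std=0.01):
    """Blur a high-resolution frame, downsample it, and add Gaussian noise."""
    blurred = convolve(frame, gaussian_kernel(sigma=sigma), mode="reflect")
    low_res = blurred[::scale, ::scale]          # ↓s: keep every s-th pixel
    return low_res + np.random.normal(0, noise_std, low_res.shape)

# Apply the model to every frame of a (T, H, W) grayscale sequence.
hr_sequence = np.random.rand(5, 128, 128)
lr_sequence = np.stack([degrade(f) for f in hr_sequence])
```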

Video super-resolution approaches tend to have more components than the image counterparts as they need to exploit the additional temporal dimension. Complex designs are not uncommon. Some most essential components for VSR are guided by four basic functionalities: Propagation, Alignment, Aggregation, and Upsampling.[1]

  • Propagation refers to the way in which features are propagated temporally
  • Alignment concerns the spatial transformation applied to misaligned images/features
  • Aggregation defines the steps to combine aligned features
  • Upsampling describes the method to transform the aggregated features to the final output image
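
The listing below is a deliberately toy sketch of how these four functionalities compose into one VSR step; every function is a placeholder (identity alignment, a trivial fusion rule) standing in for the learned modules of a real network.

```python
# Schematic data flow of one VSR step: propagate -> align -> aggregate -> upsample.
import torch
import torch.nn.functional as F

def propagate(prev_features):                # carry features across time
    return prev_features                     # e.g. a recurrent hidden state

def align(features, target_frame):           # warp features toward the target
    return features                          # identity stand-in for flow/DCN

def aggregate(aligned_features, current_features):
    return torch.maximum(aligned_features, current_features)  # toy fusion rule

def upsample(features, scale=4):             # produce the HR-sized output
    return F.interpolate(features, scale_factor=scale,
                         mode="bicubic", align_corners=False)

# One step over (B, C, H, W) feature tensors from the previous/current frame.
prev = torch.randn(1, 3, 32, 32)
cur = torch.randn(1, 3, 32, 32)
hr = upsample(aggregate(align(propagate(prev), cur), cur))
```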

Methods

When working with video, temporal information can be used to improve upscaling quality. Single-image super-resolution methods can be used too, generating high-resolution frames independently from their neighbours, but this is less effective and introduces temporal instability. There are a few traditional methods, which consider the video super-resolution task as an optimization problem. In recent years, deep learning based methods for video upscaling have outperformed traditional ones.

Traditional methods

There are several traditional methods for video upscaling. These methods try to use natural image priors and to estimate motion between frames effectively. The high-resolution frame is then reconstructed from both the priors and the estimated motion.

Frequency domain

First, the low-resolution frames are transformed to the frequency domain, the high-resolution frame is estimated in this domain, and the result is transformed back to the spatial domain. Some methods use the Fourier transform, which helps to extend the spectrum of the captured signal and thereby increase resolution. There are different approaches among these methods: weighted least squares theory,[2] the total least squares (TLS) algorithm,[3] and space-varying[4] or spatio-temporal[5] varying filtering. Other methods use the wavelet transform, which helps to find similarities in neighboring local areas.[6] Later, the second-generation wavelet transform was used for video super-resolution.[7]
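
As a minimal, self-contained example of frequency-domain processing, the following NumPy snippet upsamples a frame by zero-padding its Fourier spectrum (ideal band-limited interpolation); it shows only the interpolation core such methods build on, not a full reconstruction algorithm.

```python
# Upsample a frame by embedding its spectrum in a larger, zero-padded one.
import numpy as np

def fft_upsample(frame, scale=2):
    h, w = frame.shape
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    padded = np.zeros((h * scale, w * scale), dtype=complex)
    top, left = (h * scale - h) // 2, (w * scale - w) // 2
    padded[top:top + h, left:left + w] = spectrum   # embed the LR spectrum
    out = np.fft.ifft2(np.fft.ifftshift(padded)).real
    return out * scale * scale                      # preserve mean intensity
```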

Spatial domain

Iterative back-projection methods assume some function between the low-resolution and high-resolution frames and improve that guessed function in each step of an iterative process.[8] Projections onto convex sets (POCS), which define a specific cost function, can also be used for iterative methods.[9]
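
A hedged sketch of the back-projection loop follows; it assumes a blur-free degradation consisting of simple decimation, and uses nearest-neighbour operators for brevity.

```python
# Iterative back-projection: repeatedly push the LR residual back into the HR guess.
import numpy as np

def downsample(x, scale):
    return x[::scale, ::scale]

def upsample(x, scale):
    return np.kron(x, np.ones((scale, scale)))    # nearest-neighbour zoom

def back_project(lr, scale=2, steps=20, step_size=1.0):
    hr = upsample(lr, scale)                      # initial guess
    for _ in range(steps):
        simulated_lr = downsample(hr, scale)      # apply the assumed degradation
        error = lr - simulated_lr                 # residual in LR space
        hr = hr + step_size * upsample(error, scale)  # back-project the error
    return hr
```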

Iterative adaptive filtering algorithms use the Kalman filter to estimate the transformation from a low-resolution frame to a high-resolution one.[10] To improve the final result, these methods consider temporal correlation among the low-resolution sequences. Some approaches also consider temporal correlation among the high-resolution sequence.[11] A common way to approximate the Kalman filter is to use least mean squares (LMS).[12] One can also use steepest descent,[13] least squares (LS),[14] or recursive least squares (RLS).[14]

Direct methods estimate motion between frames, upscale a reference frame, and warp neighboring frames to the high-resolution reference. To construct the result, these upscaled frames are fused together by a median filter,[15] weighted median filter,[16] adaptive normalized averaging, AdaBoost classifier[17] or SVD-based filters.[18]

Non-parametric algorithms join motion estimation and frame fusion into one step, performed by considering patch similarities. Weights for fusion can be calculated by nonlocal-means filters.[19] To strengthen the search for similar patches, one can use a rotation-invariant similarity measure[20] or an adaptive patch size.[21] Calculating intra-frame similarity helps to preserve small details and edges.[22] Parameters for fusion can also be calculated by kernel regression.[23]

Probabilistic methods use statistical theory to solve the task. Maximum likelihood (ML) methods estimate the most probable image.[24][25] Another group of methods uses maximum a posteriori (MAP) estimation. The regularization parameter for MAP can be estimated by Tikhonov regularization.[26] Markov random fields (MRFs) are often used along with MAP and help to preserve similarity in neighboring patches.[27] Huber MRFs are used to preserve sharp edges.[28] Gaussian MRFs remove noise but can smooth some edges.[29]
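
To make the MAP formulation concrete, the sketch below minimizes a Tikhonov-regularized least-squares objective by gradient descent; the average-pooling operator D and all constants are illustrative assumptions, not the setup of any cited paper.

```python
# Minimize ||D(x) - y||^2 + lam * ||x||^2 with a toy downsampling operator D.
import numpy as np

def D(x, s=2):                      # forward (degradation) operator: average-pool
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def D_T(y, s=2):                    # adjoint: spread each LR pixel over its block
    return np.kron(y, np.ones((s, s))) / (s * s)

def map_estimate(y, s=2, lam=0.01, step=1.0, iters=200):
    x = np.kron(y, np.ones((s, s)))              # nearest-neighbour initial guess
    for _ in range(iters):
        grad = 2 * D_T(D(x, s) - y, s) + 2 * lam * x
        x -= step * grad
    return x
```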

Deep learning based methods

Aligned by motion estimation and motion compensation

In approaches with alignment, neighboring frames are first aligned with the target one. One can align frames by performing motion estimation and motion compensation (MEMC) or by using deformable convolution (DC). Motion estimation gives information about the motion of pixels between frames; motion compensation is a warping operation that aligns one frame to another based on this motion information (a minimal warping sketch follows the list below). Examples of such methods:

  • Deep-DE[30] (deep draft-ensemble learning) generates a series of SR feature maps and then processes them together to estimate the final frame
  • VSRnet[31] is based on SRCNN (a model for single-image super-resolution), but takes multiple frames as input. Input frames are first aligned by the Druleas algorithm
  • VESPCN[32] uses a spatial motion compensation transformer module (MCT), which estimates and compensates motion. Then a series of convolutions is performed to extract features and fuse them
  • DRVSR[33] (detail-revealing deep video super-resolution) consists of three main steps: motion estimation, motion compensation and fusion. The motion compensation transformer (MCT) is used for motion estimation. The sub-pixel motion compensation layer (SPMC) compensates motion. The fusion step uses an encoder-decoder architecture and a ConvLSTM module to unite information from both spatial and temporal dimensions
  • RVSR[34] (robust video super-resolution) has two branches: one for spatial alignment and another for temporal adaptation. The final frame is a weighted sum of the branches' outputs
  • FRVSR[35] (frame-recurrent video super-resolution) estimates low-resolution optical flow, upsamples it to high resolution and warps the previous output frame using this high-resolution optical flow
  • STTN[36] (the spatio-temporal transformer network) estimates optical flow with a U-style network based on U-Net and compensates motion by a trilinear interpolation method
  • SOF-VSR[37] (super-resolution optical flow for video super-resolution) calculates high-resolution optical flow in a coarse-to-fine manner. Then the low-resolution optical flow is estimated by a space-to-depth transformation. The final super-resolution result is obtained from the aligned low-resolution frames
  • TecoGAN[38] (the temporally coherent GAN) consists of a generator and a discriminator. The generator estimates LR optical flow between consecutive frames, approximates HR optical flow from it, and yields the output frame. The discriminator assesses the quality of the generator
  • TOFlow[39] (task-oriented flow) is a combination of an optical flow network and a reconstruction network. The estimated optical flow is suited to a particular task, such as video super-resolution
  • MMCNN[40] (the multi-memory convolutional neural network) aligns frames with the target one and then generates the final HR result through feature extraction, detail fusion and feature reconstruction modules
  • RBPN[41] (the recurrent back-projection network). The input of each recurrent projection module comprises features from the previous frame, features from the sequence of frames, and optical flow between neighboring frames
  • MEMC-Net[42] (the motion estimation and motion compensation network) uses both a motion estimation network and a kernel estimation network to warp frames adaptively
  • RTVSR[43] (real-time video super-resolution) aligns frames with an estimated convolutional kernel
  • MultiBoot VSR[44] (the multi-stage multi-reference bootstrapping method) aligns frames and then applies two stages of SR reconstruction to improve quality
  • BasicVSR[45] aligns frames with optical flow and then fuses their features in a recurrent bidirectional scheme
  • IconVSR[45] is a refined version of BasicVSR with a recurrent coupled propagation scheme
  • UVSR[46] (unrolled network for video super-resolution) adapts unrolled optimization algorithms to solve the VSR problem
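
For reference, motion compensation itself reduces to warping a frame with a dense flow field. A minimal PyTorch sketch could look as follows; the flow is assumed to come from any estimator, and the layout conventions are illustrative.

```python
# Warp a neighbouring frame toward the target using a dense optical-flow field.
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """frame: (B, C, H, W); flow: (B, 2, H, W) in pixels (dx, dy)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid_x = xs.unsqueeze(0) + flow[:, 0]         # where to sample each pixel from
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize coordinates to [-1, 1], as grid_sample expects.
    grid_x = 2.0 * grid_x / (w - 1) - 1.0
    grid_y = 2.0 * grid_y / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)
```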

Aligned by deformable convolution

Another way to align neighboring frames with the target one is deformable convolution. While an ordinary convolution has a fixed kernel grid, a deformable convolution first estimates offsets for the kernel sampling positions and then performs the convolution (see the sketch after the list below). Examples of such methods:

  • EDVR[47] (the enhanced deformable video restoration network) can be divided into two main modules: the pyramid, cascading and deformable (PCD) module for alignment and the temporal-spatial attention (TSA) module for fusion
  • DNLN[48] (the deformable non-local network) has an alignment module, based on deformable convolution with a hierarchical feature fusion block (HFFB) for better quality, and a non-local attention module
  • TDAN[49] (the temporally deformable alignment network) consists of an alignment module and a reconstruction module. Alignment is performed by deformable convolution based on extracted features
  • The Multi-Stage Feature Fusion Network[50] for video super-resolution uses multi-scale dilated deformable convolution for frame alignment and a modulative feature fusion branch to integrate the aligned frames
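
The two-step idea (predict offsets, then convolve) can be sketched with torchvision's deform_conv2d; the offset-prediction layer and all sizes below are illustrative, not the configuration of any method above.

```python
# Deformable alignment in two steps: predict offsets, then sample with them.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

B, C, H, W, K = 1, 8, 32, 32, 3
feat = torch.randn(B, C, H, W)
offset_pred = nn.Conv2d(C, 2 * K * K, kernel_size=3, padding=1)  # step 1: offsets
weight = torch.randn(C, C, K, K)                                  # deformable kernel

offsets = offset_pred(feat)                                # (B, 2*K*K, H, W)
aligned = deform_conv2d(feat, offsets, weight, padding=1)  # step 2: convolution
```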

Aligned by homography

Some methods align frames using a homography calculated between them.

  • TGA[51] (temporal group attention) divides input frames into N groups depending on their temporal distance from the target and extracts information from each group independently. A fast spatial alignment module based on homography is used to align frames
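
As a sketch of such homography-based alignment, the following OpenCV snippet estimates a global homography from ORB keypoint matches and warps a neighboring frame onto the reference; the feature detector and matcher are illustrative choices.

```python
# Global alignment: fit a homography between two frames and warp one onto the other.
import cv2
import numpy as np

def align_by_homography(ref, neighbor):
    """ref, neighbor: 8-bit grayscale NumPy arrays of the same size."""
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(ref, None)
    kp2, des2 = orb.detectAndCompute(neighbor, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des2, des1)
    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)   # robust to mismatches
    h, w = ref.shape
    return cv2.warpPerspective(neighbor, H, (w, h))
```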

Spatial non-aligned

Methods without alignment do not perform alignment as a first step; they process the input frames directly.

  • VSRResNet[52] is a GAN-like model consisting of a generator and a discriminator. The generator upsamples input frames, extracts features and fuses them. The discriminator assesses the quality of the resulting high-resolution frames
  • FFCVSR[53] (frame and feature-context video super-resolution) takes unaligned low-resolution frames together with previously output high-resolution frames to simultaneously restore high-frequency details and maintain temporal consistency
  • MRMNet[54] (the multi-resolution mixture network) consists of three modules: bottleneck, exchange, and residual. The bottleneck unit extracts features that have the same resolution as the input frames. The exchange module exchanges features between neighboring frames and enlarges feature maps. The residual module extracts features after the exchange module
  • STMN[55] (the spatio-temporal matching network) uses the discrete wavelet transform to fuse temporal features. A non-local matching block integrates super-resolution and denoising. At the final step, the SR result is obtained in the global wavelet domain
  • MuCAN[56] (the multi-correspondence aggregation network) uses a temporal multi-correspondence strategy to fuse temporal features and cross-scale nonlocal correspondence to extract self-similarities in frames

3D convolutions

While 2D convolutions work in the spatial domain, 3D convolutions use both spatial and temporal information. They perform motion compensation and maintain temporal consistency (a minimal example follows the list below).

  • DUF[57] (dynamic upsampling filters) uses deformable 3D convolution for motion compensation. The model estimates upsampling kernels for the specific input frames
  • FSTRN[58] (the fast spatio-temporal residual network) includes a few modules: an LR video shallow feature extraction net (LFENet), an LR feature fusion and up-sampling module (LSRNet), and two residual modules: spatio-temporal and global
  • 3DSRnet[59] (the 3D super-resolution network) uses 3D convolutions to extract spatio-temporal information. The model also has a special approach for frames where a scene change is detected
  • MP3D[60] (the multi-scale pyramid 3D convolutional network) uses 3D convolution to extract spatial and temporal features simultaneously, which are then passed through a reconstruction module with 3D sub-pixel convolution for upsampling
  • DMBN[61] (the dynamic multiple branch network) has three branches to exploit information from multiple resolutions. Finally, information from the branches is fused dynamically
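
For reference, a single 3D convolution consuming a stack of frames looks as follows; the kernel spans height, width and time, so spatial and temporal information are mixed in one operation. Sizes are arbitrary.

```python
# A 3D convolution over a (batch, channels, time, height, width) frame stack.
import torch
import torch.nn as nn

frames = torch.randn(1, 1, 7, 64, 64)        # 7 grayscale frames of 64x64
conv3d = nn.Conv3d(1, 16, kernel_size=(3, 3, 3), padding=1)
features = conv3d(frames)                     # -> (1, 16, 7, 64, 64)
```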

Recurrent neural networks

Recurrent convolutional neural networks perform video super-resolution by storing temporal dependencies.

  • STCN[62] (the spatio-temporal convolutional network) extracts features in a spatial module and passes them through a recurrent temporal module and a final reconstruction module. Temporal consistency is maintained by a long short-term memory (LSTM) mechanism
  • BRCN[63] (the bidirectional recurrent convolutional network) has two subnetworks: one with forward fusion and one with backward fusion. The result of the network is a composition of the two branches' outputs
  • RISTN[64] (the residual invertible spatio-temporal network) consists of spatial, temporal and reconstruction modules. The spatial module is composed of residual invertible blocks (RIBs), which extract spatial features effectively. The output of the spatial module is processed by the temporal module, which extracts spatio-temporal information and then fuses the important features. The final result is calculated in the reconstruction module by a deconvolution operation
  • RRCN[65] (the residual recurrent convolutional network) is a bidirectional recurrent network that calculates a residual image. The final result is obtained by adding a bicubically upsampled input frame
  • RRN[66] (the recurrent residual network) uses a recurrent sequence of residual blocks to extract spatial and temporal information
  • BTRPN[67] (the bidirectional temporal-recurrent propagation network) uses a bidirectional recurrent scheme. The final result is combined from the two branches with a channel attention mechanism
  • RLSP[68] (recurrent latent space propagation) is a fully convolutional network cell with highly efficient propagation of temporal information through a hidden state
  • RSDN[69] (the recurrent structure-detail network) divides the input frame into structure and detail components and processes them in two parallel streams
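
A hedged, RLSP-style sketch of hidden-state propagation is shown below: each step fuses the current low-resolution frame with a hidden state and emits an upscaled frame. All layer sizes are illustrative, not those of any method above.

```python
# A toy recurrent VSR cell: the hidden state carries temporal information.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentCell(nn.Module):
    def __init__(self, hidden=32, scale=4):
        super().__init__()
        self.fuse = nn.Conv2d(1 + hidden, hidden, 3, padding=1)
        self.head = nn.Conv2d(hidden, scale * scale, 3, padding=1)
        self.scale = scale
        self.hidden = hidden

    def forward(self, lr_frame, state):                       # lr_frame: (B,1,H,W)
        state = F.relu(self.fuse(torch.cat([lr_frame, state], dim=1)))
        sr = F.pixel_shuffle(self.head(state), self.scale)    # (B,1,sH,sW)
        return sr, state

cell = RecurrentCell()
state = torch.zeros(1, cell.hidden, 64, 64)
for lr_frame in torch.randn(5, 1, 1, 64, 64):  # iterate over time steps
    sr, state = cell(lr_frame, state)
```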

Non-local methods

Non-local methods extract both spatial and temporal information. The key idea is to compute each output position as a weighted sum over all possible positions. This strategy may be more effective than local approaches. For example, the progressive fusion non-local method extracts spatio-temporal features with non-local residual blocks and then fuses them with a progressive fusion residual block (PFRB). The result of these blocks is a residual image; the final result is obtained by adding a bicubically upsampled input frame. A sketch of the non-local operation follows the list below.

  • NLVSR[70] (the novel video super-resolution network) aligns frames with the target one by a temporal-spatial non-local operation. An attention-based mechanism is used to integrate information from the aligned frames
  • MSHPFNL[71] also incorporates a multi-scale structure and hybrid convolutions to extract wide-range dependencies. To avoid artifacts like flickering or ghosting, it uses generative adversarial training
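
The core non-local operation can be written compactly as dot-product attention over all flattened spatio-temporal positions; this minimal sketch omits the learned embeddings and residual connection used by the actual networks.

```python
# Non-local operation: every output position is a similarity-weighted sum
# over all positions of the flattened spatio-temporal feature volume.
import torch

def non_local(x):
    """x: (B, C, N) where N = T*H*W flattened spatio-temporal positions."""
    attn = torch.softmax(x.transpose(1, 2) @ x, dim=-1)  # (B, N, N) similarities
    return x @ attn.transpose(1, 2)                      # weighted sum, (B, C, N)

feats = torch.randn(1, 8, 4 * 16 * 16)   # 4 frames of 16x16 feature maps
out = non_local(feats)
```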

Metrics

 
Top: original sequence; bottom: PSNR (peak signal-to-noise ratio) visualization of the output of a VSR method

The common way to estimate the performance of video super-resolution algorithms is to use a few objective metrics, such as PSNR and SSIM; the benchmarks below also use perceptual measures such as VMAF and LPIPS.

Currently, there are few objective metrics that can verify a video super-resolution method's ability to restore real details; research in this area is ongoing.

Another way to assess the performance of a video super-resolution algorithm is to organize a subjective evaluation: people are asked to compare the corresponding frames, and the final mean opinion score (MOS) is calculated as the arithmetic mean over all ratings.
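
For concreteness, PSNR and MOS, the two kinds of measures described above, can be computed as follows (inputs are assumed to lie in [0, 1]):

```python
import numpy as np

def psnr(reference, output, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the maximum pixel value."""
    mse = np.mean((reference - output) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def mean_opinion_score(ratings):
    """MOS: the arithmetic mean over all subjective ratings (e.g. a 1-5 scale)."""
    return float(np.mean(ratings))
```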

Datasets

While deep learning approaches to video super-resolution outperform traditional ones, it is crucial to form a high-quality dataset for evaluation. It is important to verify models' ability to restore small details, text, and objects with complicated structure, and to cope with large motion and noise.

Comparison of datasets
Dataset | Videos | Mean video length | Ground-truth resolution | Motion in frames | Fine details
Vid4 | 4 | 43 frames | 720×480 | Without fast motion | Some small details, without text
SPMCS | 30 | 31 frames | 960×540 | Slow motion | A lot of small details
Vimeo-90K (test SR set) | 7824 | 7 frames | 448×256 | A lot of fast, difficult, diverse motion | Few details, text in a few sequences
Xiph HD (complete sets) | 70 | 2 seconds | from 640×360 to 4096×2160 | A lot of fast, difficult, diverse motion | Few details, text in a few sequences
Ultra Video Dataset 4K | 16 | 10 seconds | 4096×2160 | Diverse motion | Few details, without text
REDS (test SR) | 30 | 100 frames | 1280×720 | A lot of fast, difficult, diverse motion | Few details, without text
Space-Time SR | 5 | 100 frames | 1280×720 | Diverse motion | Without small details and text
Harmonic | | | 4096×2160 | |
CDVL | | | 1920×1080 | |

Benchmarks

A few benchmarks in video super-resolution were organized by companies and conferences. The purposes of such challenges are to compare diverse algorithms and to find the state-of-the-art for the task.

Comparison of benchmarks
Benchmark | Organizer | Dataset | Upscale factor | Metrics
NTIRE 2019 Challenge | CVPR (Computer Vision and Pattern Recognition) | REDS | 4 | PSNR, SSIM
Youku-VESR Challenge 2019 | Youku | Youku-VESR | 4 | PSNR, VMAF
AIM 2019 Challenge | ECCV (European Conference on Computer Vision) | Vid3oC | 16 | PSNR, SSIM, MOS
AIM 2020 Challenge | ECCV (European Conference on Computer Vision) | Vid3oC | 16 | PSNR, SSIM, LPIPS
Mobile Video Restoration Challenge | ICIP (International Conference on Image Processing), Kwai | | | PSNR, SSIM, MOS
MSU Video Super-Resolution Benchmark 2021 | MSU (Moscow State University) | | 4 | ERQAv1.0, PSNR and SSIM with shift compensation, QRCRv1.0, CRRMv1.0
MSU Super-Resolution for Video Compression Benchmark 2022 | MSU (Moscow State University) | | 4 | ERQAv2.0, PSNR, MS-SSIM, VMAF, LPIPS

NTIRE 2019 Challenge

The NTIRE 2019 Challenge was organized by CVPR and proposed two tracks for video super-resolution: clean (only bicubic degradation) and blur (blur added first). Each track had more than 100 participants, and 14 final results were submitted. The REDS dataset was collected for this challenge. It consists of 30 videos of 100 frames each. The resolution of ground-truth frames is 1280×720. The tested scale factor is 4. PSNR and SSIM were used to evaluate the models' performance. The best participants' results are presented in the table:

Top teams
Team | Model name | PSNR (clean track) | SSIM (clean track) | PSNR (blur track) | SSIM (blur track) | Runtime per image in sec (clean track) | Runtime per image in sec (blur track) | Platform | GPU | Open source
HelloVSR | EDVR | 31.79 | 0.8962 | 30.17 | 0.8647 | 2.788 | 3.562 | PyTorch | TITAN Xp | YES
UIUC-IFP | WDVR | 30.81 | 0.8748 | 29.46 | 0.8430 | 0.980 | 0.980 | PyTorch | Tesla V100 | YES
SuperRior | ensemble of RDN, RCAN, DUF | 31.13 | 0.8811 | | | 120.000 | | PyTorch | Tesla V100 | NO
CyberverseSanDiego | RecNet | 31.00 | 0.8822 | 27.71 | 0.8067 | 3.000 | 3.000 | TensorFlow | RTX 2080 Ti | YES
TTI | RBPN | 30.97 | 0.8804 | 28.92 | 0.8333 | 1.390 | 1.390 | PyTorch | TITAN X | YES
NERCMS | PFNL | 30.91 | 0.8782 | 28.98 | 0.8307 | 6.020 | 6.020 | PyTorch | GTX 1080 Ti | YES
XJTU-IAIR | FSTDN | | | 28.86 | 0.8301 | | 13.000 | PyTorch | GTX 1080 Ti | NO

Youku-VESR Challenge 2019

The Youku-VESR Challenge was organized to check models' ability to cope with the degradation and noise that are typical for the Youku online video-watching application. The proposed dataset consists of 1000 videos, each 4–6 seconds long. The resolution of ground-truth frames is 1920×1080. The tested scale factor is 4. PSNR and VMAF metrics were used for performance evaluation. The top methods are presented in the table:

Top teams
Team | PSNR | VMAF
Avengers Assemble | 37.851 | 41.617
NJU_L1 | 37.681 | 41.227
ALONG_NTES | 37.632 | 40.405

AIM 2019 Challenge

The challenge was held by ECCV and had two tracks on video extreme super-resolution: the first track checks fidelity to the reference frame (measured by PSNR and SSIM), and the second track checks the perceptual quality of videos (MOS). The dataset consists of 328 video sequences of 120 frames each. The resolution of ground-truth frames is 1920×1080. The tested scale factor is 16. The top methods are presented in the table:

Top teams
Team | Model name | PSNR | SSIM | MOS | Runtime per image in sec | Platform | GPU/CPU | Open source
fenglinglwb | based on EDVR | 22.53 | 0.64 | first result | 0.35 | PyTorch | 4× Titan X | NO
NERCMS | PFNL | 22.35 | 0.63 | | 0.51 | PyTorch | 2× 1080 Ti | NO
baseline | RLSP | 21.75 | 0.60 | | 0.09 | TensorFlow | Titan Xp | NO
HIT-XLab | based on EDSR | 21.45 | 0.60 | second result | 60.00 | PyTorch | V100 | NO

AIM 2020 Challenge

The challenge's conditions are the same as in the AIM 2019 Challenge. The top methods are presented in the table:

Top teams
Team | Model name | Params number | PSNR | SSIM | Runtime per image in sec | GPU/CPU | Open source
KirinUK | EVESRNet | 45.29M | 22.83 | 0.6450 | 6.1 | 1 × 2080 Ti | NO
Team-WVU | | 29.51M | 22.48 | 0.6378 | 4.9 | 1 × Titan Xp | NO
BOE-IOT-AIBD | 3D-MGBP | 53M | 22.48 | 0.6304 | 4.83 | 1 × 1080 | NO
sr xxx | based on EDVR | | 22.43 | 0.6353 | 4 | 1 × V100 | NO
ZZX | MAHA | 31.14M | 22.28 | 0.6321 | 4 | 1 × 1080 Ti | NO
lyl | FineNet | | 22.08 | 0.6256 | 13 | | NO
TTI | based on STARnet | | 21.91 | 0.6165 | 0.249 | | NO
CET CVLab | | | 21.77 | 0.6112 | 0.04 | 1 × P100 | NO

MSU Video Super-Resolution Benchmark

The MSU Video Super-Resolution Benchmark was organized by MSU and includes three types of motion, two ways of lowering resolution, and eight types of content in its dataset. The resolution of ground-truth frames is 1920×1280. The tested scale factor is 4. 14 models were tested. PSNR and SSIM with shift compensation were used to evaluate the models' performance, along with a few newly proposed metrics: ERQAv1.0, QRCRv1.0, and CRRMv1.0.[72] The top methods are presented in the table:

Top methods
Model name | Multi-frame | Subjective | ERQAv1.0 | PSNR | SSIM | QRCRv1.0 | CRRMv1.0 | Runtime per image in sec | Open source
DBVSR | YES | 5.561 | 0.737 | 31.071 | 0.894 | 0.629 | 0.992 | | YES
LGFN | YES | 5.040 | 0.740 | 31.291 | 0.898 | 0.629 | 0.996 | 1.499 | YES
DynaVSR-R | YES | 4.751 | 0.709 | 28.377 | 0.865 | 0.557 | 0.997 | 5.664 | YES
TDAN | YES | 4.036 | 0.706 | 30.244 | 0.883 | 0.557 | 0.994 | | YES
DUF-28L | YES | 3.910 | 0.645 | 25.852 | 0.830 | 0.549 | 0.993 | 2.392 | YES
RRN-10L | YES | 3.887 | 0.627 | 24.252 | 0.790 | 0.557 | 0.989 | 0.390 | YES
RealSR | NO | 3.749 | 0.690 | 25.989 | 0.767 | 0.000 | 0.886 | | YES

MSU Super-Resolution for Video Compression Benchmark

The MSU Super-Resolution for Video Compression Benchmark was organized by MSU. This benchmark tests models' ability to work with compressed videos. The dataset consists of 9 videos compressed with different video codec standards at different bitrates. Models are ranked by BSQ-rate[73] over subjective score. The resolution of ground-truth frames is 1920×1080. The tested scale factor is 4. 17 models were tested, and 5 video codecs were used to compress the ground-truth videos. The top combinations of super-resolution methods and video codecs are presented in the table:

Top methods
Model name | BSQ-rate (subjective score) | BSQ-rate (ERQAv2.0) | BSQ-rate (VMAF) | BSQ-rate (PSNR) | BSQ-rate (MS-SSIM) | BSQ-rate (LPIPS) | Open source
RealSR + x264 | 0.196 | 0.770 | 0.775 | 0.675 | 0.487 | 0.591 | YES
ahq-11 + x264 | 0.271 | 0.883 | 0.753 | 0.873 | 0.719 | 0.656 | NO
SwinIR + x264 | 0.304 | 0.760 | 0.642 | 6.268 | 0.736 | 0.559 | YES
Real-ESRGAN + x264 | 0.335 | 5.580 | 0.698 | 7.874 | 0.881 | 0.733 | YES
SwinIR + x265 | 0.346 | 1.575 | 1.304 | 8.130 | 4.641 | 1.474 | YES
COMISR + x264 | 0.367 | 0.969 | 1.302 | 6.081 | 0.672 | 1.118 | YES
RealSR + x265 | 0.502 | 1.622 | 1.617 | 1.064 | 1.033 | 1.206 | YES

Application

Many areas that work with video deal with different types of degradation, including downscaling. The resolution of video can be degraded because of imperfections in measuring devices, such as optical degradation and the limited size of camera sensors. Bad light and weather conditions add noise to video. Object and camera motion also decrease video quality. Super-resolution techniques help to restore the original video. They are useful in a wide range of applications, such as

  • video surveillance (to improve video captured from cameras and recognize license plates and faces)
  • medical imaging (to show organs or tissues in more detail for clinical analysis and medical intervention)
  • forensic science (to help investigations during criminal proceedings)
  • astronomy (to improve the quality of video of stars and planets)
  • remote sensing (to facilitate observation of objects)
  • microscopy (to strengthen microscopes' capabilities)

It also helps to solve tasks such as object detection and face and character recognition (as a preprocessing step). Interest in super-resolution is growing with the development of high-definition computer displays and TVs.

 
Simulating natural hand movements by "jiggling" the camera

Video super-resolution finds its practical use in some modern smartphones and cameras, where it is used to reconstruct digital photographs.

Reconstructing details on digital photographs is a difficult task since these photographs are already incomplete: the camera sensor elements measure only the intensity of the light, not directly its color. A process called demosaicing is used to reconstruct the photos from partial color information. A single frame doesn't give enough data to fill in the missing colors; however, we can recover some of the missing information from multiple images taken one after the other. This process is known as burst photography and can be used to restore a single image of good quality from multiple sequential frames.

When we capture a lot of sequential photos with a smartphone or handheld camera, there is always some movement present between the frames because of the hand motion. We can take advantage of this hand tremor by combining the information on those images. We choose a single image as the "base" or reference frame and align every other frame relative to it.
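
A hedged sketch of this alignment step follows, using a global sub-pixel translation estimated by phase correlation; real burst pipelines use more robust, locally varying alignment.

```python
# Align every frame of a burst to the base (reference) frame.
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

def align_burst(frames):
    """frames: list of 2-D grayscale arrays; frames[0] is the reference."""
    base = frames[0]
    aligned = [base]
    for frame in frames[1:]:
        # Estimate the sub-pixel translation of `frame` relative to `base`.
        offset, _, _ = phase_cross_correlation(base, frame, upsample_factor=10)
        aligned.append(nd_shift(frame, offset))   # undo the estimated motion
    return np.stack(aligned)
```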

There are situations where hand motion is simply not present because the device is stabilized (e.g. placed on a tripod). There is a way to simulate natural hand motion by intentionally moving the camera very slightly. The movements are extremely small, so they don't interfere with regular photos. You can observe these motions on the Google Pixel 3[74] by holding it perfectly still (e.g. pressing it against a window) and maximally pinch-zooming the viewfinder.

See also

References

  1. ^ Chan, Kelvin CK, et al. "BasicVSR: The search for essential components in video super-resolution and beyond." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
  2. ^ Kim, S. P.; Bose, N. K.; Valenzuela, H. M. (1989). "Reconstruction of high resolution image from noise undersampled frames". Lecture Notes in Control and Information Sciences. Vol. 129. Berlin/Heidelberg: Springer-Verlag. pp. 315–326. doi:10.1007/bfb0042742. ISBN 3-540-51424-4.
  3. ^ Bose, N.K.; Kim, H.C.; Zhou, B. (1994). "Performance analysis of the TLS algorithm for image reconstruction from a sequence of undersampled noisy and blurred frames". Proceedings of 1st International Conference on Image Processing. Vol. 3. IEEE Comput. Soc. Press. pp. 571–574. doi:10.1109/icip.1994.413741. ISBN 0-8186-6952-7.
  4. ^ Tekalp, A.M.; Ozkan, M.K.; Sezan, M.I. (1992). "High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration". [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE. pp. 169–172 vol.3. doi:10.1109/icassp.1992.226249. ISBN 0-7803-0532-9.
  5. ^ Goldberg, N.; Feuer, A.; Goodwin, G.C. (2003). "Super-resolution reconstruction using spatio-temporal filtering". Journal of Visual Communication and Image Representation. Elsevier BV. 14 (4): 508–525. doi:10.1016/s1047-3203(03)00042-7. ISSN 1047-3203.
  6. ^ Mallat, S (2010). "Super-Resolution With Sparse Mixing Estimators". IEEE Transactions on Image Processing. Institute of Electrical and Electronics Engineers (IEEE). 19 (11): 2889–2900. Bibcode:2010ITIP...19.2889M. doi:10.1109/tip.2010.2049927. ISSN 1057-7149. PMID 20457549. S2CID 856101.
  7. ^ Bose, N.K.; Lertrattanapanich, S.; Chappalli, M.B. (2004). "Superresolution with second generation wavelets". Signal Processing: Image Communication. Elsevier BV. 19 (5): 387–391. doi:10.1016/j.image.2004.02.001. ISSN 0923-5965.
  8. ^ Cohen, B.; Avrin, V.; Dinstein, I. (2000). "Polyphase back-projection filtering for resolution enhancement of image sequences". 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100). Vol. 4. IEEE. pp. 2171–2174. doi:10.1109/icassp.2000.859267. ISBN 0-7803-6293-4.
  9. ^ Katsaggelos, A.K. (1997). "An iterative weighted regularized algorithm for improving the resolution of video sequences". Proceedings of International Conference on Image Processing. IEEE Comput. Soc. pp. 474–477. doi:10.1109/icip.1997.638811. ISBN 0-8186-8183-7.
  10. ^ Farsiu, Sina; Elad, Michael; Milanfar, Peyman (2006-01-15). "A practical approach to superresolution". In Apostolopoulos, John G.; Said, Amir (eds.). Visual Communications and Image Processing 2006. Vol. 6077. SPIE. p. 607703. doi:10.1117/12.644391.
  11. ^ Jing Tian; Kai-Kuang Ma (2005). "A new state-space approach for super-resolution image sequence reconstruction". IEEE International Conference on Image Processing 2005. IEEE. pp. I-881. doi:10.1109/icip.2005.1529892. ISBN 0-7803-9134-9.
  12. ^ Costa, Guilherme Holsbach; Bermudez, Jos Carlos Moreira (2007). "Statistical Analysis of the LMS Algorithm Applied to Super-Resolution Image Reconstruction". IEEE Transactions on Signal Processing. Institute of Electrical and Electronics Engineers (IEEE). 55 (5): 2084–2095. Bibcode:2007ITSP...55.2084C. doi:10.1109/tsp.2007.892704. ISSN 1053-587X. S2CID 52857681.
  13. ^ Elad, M.; Feuer, A. (1999). "Super-resolution reconstruction of continuous image sequences". Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348). Vol. 3. IEEE. pp. 459–463. doi:10.1109/icip.1999.817156. ISBN 0-7803-5467-2.
  14. ^ a b Elad, M.; Feuer, A. (1999). "Superresolution restoration of an image sequence: adaptive filtering approach". IEEE Transactions on Image Processing. Institute of Electrical and Electronics Engineers (IEEE). 8 (3): 387–395. Bibcode:1999ITIP....8..387E. doi:10.1109/83.748893. ISSN 1057-7149. PMID 18262881.
  15. ^ Pickering, M.; Frater, M.; Arnold, J. (2005). "A robust approach to super-resolution sprite generation". IEEE International Conference on Image Processing 2005. IEEE. pp. I-897. doi:10.1109/icip.2005.1529896. ISBN 0-7803-9134-9.
  16. ^ Nasonov, Andrey V.; Krylov, Andrey S. (2010). "Fast Super-Resolution Using Weighted Median Filtering". 2010 20th International Conference on Pattern Recognition. IEEE. pp. 2230–2233. doi:10.1109/icpr.2010.546. ISBN 978-1-4244-7542-1.
  17. ^ Simonyan, K.; Grishin, S.; Vatolin, D.; Popov, D. (2008). "Fast video super-resolution via classification". 2008 15th IEEE International Conference on Image Processing. IEEE. pp. 349–352. doi:10.1109/icip.2008.4711763. ISBN 978-1-4244-1765-0.
  18. ^ Nasir, Haidawati; Stankovic, Vladimir; Marshall, Stephen (2011). "Singular value decomposition based fusion for super-resolution image reconstruction". 2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA). IEEE. pp. 393–398. doi:10.1109/icsipa.2011.6144138. ISBN 978-1-4577-0242-6.
  19. ^ Protter, M.; Elad, M.; Takeda, H.; Milanfar, P. (2009). "Generalizing the Nonlocal-Means to Super-Resolution Reconstruction". IEEE Transactions on Image Processing. Institute of Electrical and Electronics Engineers (IEEE). 18 (1): 36–51. Bibcode:2009ITIP...18...36P. doi:10.1109/tip.2008.2008067. ISSN 1057-7149. PMID 19095517. S2CID 2142115.
  20. ^ Zhuo, Yue; Liu, Jiaying; Ren, Jie; Guo, Zongming (2012). "Nonlocal based Super Resolution with rotation invariance and search window relocation". 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. pp. 853–856. doi:10.1109/icassp.2012.6288018. ISBN 978-1-4673-0046-9.
  21. ^ Cheng, Ming-Hui; Chen, Hsuan-Ying; Leou, Jin-Jang (2011). "Video super-resolution reconstruction using a mobile search strategy and adaptive patch size". Signal Processing. Elsevier BV. 91 (5): 1284–1297. doi:10.1016/j.sigpro.2010.12.016. ISSN 0165-1684. S2CID 17920263.
  22. ^ Huhle, Benjamin; Schairer, Timo; Jenke, Philipp; Straßer, Wolfgang (2010). "Fusion of range and color images for denoising and resolution enhancement with a non-local filter". Computer Vision and Image Understanding. Elsevier BV. 114 (12): 1336–1345. doi:10.1016/j.cviu.2009.11.004. ISSN 1077-3142.
  23. ^ Takeda, Hiroyuki; Farsiu, Sina; Milanfar, Peyman (2007). "Kernel Regression for Image Processing and Reconstruction". IEEE Transactions on Image Processing. Institute of Electrical and Electronics Engineers (IEEE). 16 (2): 349–366. Bibcode:2007ITIP...16..349T. doi:10.1109/tip.2006.888330. ISSN 1057-7149. PMID 17269630. S2CID 12116009.
  24. ^ Elad, M.; Feuer, A. (1997). "Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images". IEEE Transactions on Image Processing. Institute of Electrical and Electronics Engineers (IEEE). 6 (12): 1646–1658. Bibcode:1997ITIP....6.1646E. doi:10.1109/83.650118. ISSN 1057-7149. PMID 18285235.
  25. ^ Farsiu, Sina; Robinson, Dirk; Elad, Michael; Milanfar, Peyman (2003-11-20). "Robust shift and add approach to superresolution". In Tescher, Andrew G. (ed.). Applications of Digital Image Processing XXVI. Vol. 5203. SPIE. p. 121. doi:10.1117/12.507194.
  26. ^ Chantas, G.K.; Galatsanos, N.P.; Woods, N.A. (2007). "Super-Resolution Based on Fast Registration and Maximum a Posteriori Reconstruction". IEEE Transactions on Image Processing. Institute of Electrical and Electronics Engineers (IEEE). 16 (7): 1821–1830. Bibcode:2007ITIP...16.1821C. doi:10.1109/tip.2007.896664. ISSN 1057-7149. PMID 17605380. S2CID 1811280.
  27. ^ Rajan, D.; Chaudhuri, S. (2001). "Generation of super-resolution images from blurred observations using Markov random fields". 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221). Vol. 3. IEEE. pp. 1837–1840. doi:10.1109/icassp.2001.941300. ISBN 0-7803-7041-4.
  28. ^ Zibetti, Marcelo Victor Wust; Mayer, Joceli (2006). "Outlier Robust and Edge-Preserving Simultaneous Super-Resolution". 2006 International Conference on Image Processing. IEEE. pp. 1741–1744. doi:10.1109/icip.2006.312718. ISBN 1-4244-0480-0.
  29. ^ Joshi, M.V.; Chaudhuri, S.; Panuganti, R. (2005). "A Learning-Based Method for Image Super-Resolution From Zoomed Observations". IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics. Institute of Electrical and Electronics Engineers (IEEE). 35 (3): 527–537. doi:10.1109/tsmcb.2005.846647. ISSN 1083-4419. PMID 15971920. S2CID 3162908.
  30. ^ Liao, Renjie; Tao, Xin; Li, Ruiyu; Ma, Ziyang; Jia, Jiaya (2015). "Video Super-Resolution via Deep Draft-Ensemble Learning". 2015 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 531–539. doi:10.1109/iccv.2015.68. ISBN 978-1-4673-8391-2.
  31. ^ Kappeler, Armin; Yoo, Seunghwan; Dai, Qiqin; Katsaggelos, Aggelos K. (2016). "Video Super-Resolution With Convolutional Neural Networks". IEEE Transactions on Computational Imaging. Institute of Electrical and Electronics Engineers (IEEE). 2 (2): 109–122. doi:10.1109/tci.2016.2532323. ISSN 2333-9403. S2CID 9356783.
  32. ^ Caballero, Jose; Ledig, Christian; Aitken, Andrew; Acosta, Alejandro; Totz, Johannes; Wang, Zehan; Shi, Wenzhe (2016-11-16). "Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation". arXiv:1611.05250v2 [cs.CV].
  33. ^ Tao, Xin; Gao, Hongyun; Liao, Renjie; Wang, Jue; Jia, Jiaya (2017). "Detail-Revealing Deep Video Super-Resolution". 2017 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 4482–4490. arXiv:1704.02738. doi:10.1109/iccv.2017.479. ISBN 978-1-5386-1032-9.
  34. ^ Liu, Ding; Wang, Zhaowen; Fan, Yuchen; Liu, Xianming; Wang, Zhangyang; Chang, Shiyu; Huang, Thomas (2017). "Robust Video Super-Resolution with Learned Temporal Dynamics". 2017 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 2526–2534. doi:10.1109/iccv.2017.274. ISBN 978-1-5386-1032-9.
  35. ^ Sajjadi, Mehdi S. M.; Vemulapalli, Raviteja; Brown, Matthew (2018). "Frame-Recurrent Video Super-Resolution". 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE. pp. 6626–6634. arXiv:1801.04590. doi:10.1109/cvpr.2018.00693. ISBN 978-1-5386-6420-9.
  36. ^ Kim, Tae Hyun; Sajjadi, Mehdi S. M.; Hirsch, Michael; Schölkopf, Bernhard (2018). "Spatio-Temporal Transformer Network for Video Restoration". Computer Vision – ECCV 2018. Cham: Springer International Publishing. pp. 111–127. doi:10.1007/978-3-030-01219-9_7. ISBN 978-3-030-01218-2. ISSN 0302-9743.
  37. ^ Wang, Longguang; Guo, Yulan; Liu, Li; Lin, Zaiping; Deng, Xinpu; An, Wei (2020). "Deep Video Super-Resolution Using HR Optical Flow Estimation". IEEE Transactions on Image Processing. Institute of Electrical and Electronics Engineers (IEEE). 29: 4323–4336. arXiv:2001.02129. Bibcode:2020ITIP...29.4323W. doi:10.1109/tip.2020.2967596. ISSN 1057-7149. PMID 31995491. S2CID 210023539.
  38. ^ Chu, Mengyu; Xie, You; Mayer, Jonas; Leal-Taixé, Laura; Thuerey, Nils (2020-07-08). "Learning temporal coherence via self-supervision for GAN-based video generation". ACM Transactions on Graphics. Association for Computing Machinery (ACM). 39 (4). arXiv:1811.09393. doi:10.1145/3386569.3392457. ISSN 0730-0301. S2CID 209460786.
  39. ^ Xue, Tianfan; Chen, Baian; Wu, Jiajun; Wei, Donglai; Freeman, William T. (2019-02-12). "Video Enhancement with Task-Oriented Flow". International Journal of Computer Vision. Springer Science and Business Media LLC. 127 (8): 1106–1125. arXiv:1711.09078. doi:10.1007/s11263-018-01144-2. ISSN 0920-5691. S2CID 40412298.
  40. ^ Wang, Zhongyuan; Yi, Peng; Jiang, Kui; Jiang, Junjun; Han, Zhen; Lu, Tao; Ma, Jiayi (2019). "Multi-Memory Convolutional Neural Network for Video Super-Resolution". IEEE Transactions on Image Processing. Institute of Electrical and Electronics Engineers (IEEE). 28 (5): 2530–2544. Bibcode:2019ITIP...28.2530W. doi:10.1109/tip.2018.2887017. ISSN 1057-7149. PMID 30571634. S2CID 58595890.
  41. ^ Haris, Muhammad; Shakhnarovich, Gregory; Ukita, Norimichi (2019). "Recurrent Back-Projection Network for Video Super-Resolution". 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 3892–3901. arXiv:1903.10128. doi:10.1109/cvpr.2019.00402. ISBN 978-1-7281-3293-8.
  42. ^ Bao, Wenbo; Lai, Wei-Sheng; Zhang, Xiaoyun; Gao, Zhiyong; Yang, Ming-Hsuan (2021-03-01). "MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement". IEEE Transactions on Pattern Analysis and Machine Intelligence. Institute of Electrical and Electronics Engineers (IEEE). 43 (3): 933–948. arXiv:1810.08768. doi:10.1109/tpami.2019.2941941. ISSN 0162-8828. PMID 31722471. S2CID 53046739.
  43. ^ Bare, Bahetiyaer; Yan, Bo; Ma, Chenxi; Li, Ke (2019). "Real-time video super-resolution via motion convolution kernel estimation". Neurocomputing. Elsevier BV. 367: 236–245. doi:10.1016/j.neucom.2019.07.089. ISSN 0925-2312. S2CID 201264266.
  44. ^ Kalarot, Ratheesh; Porikli, Fatih (2019). "MultiBoot Vsr: Multi-Stage Multi-Reference Bootstrapping for Video Super-Resolution". 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE. pp. 2060–2069. doi:10.1109/cvprw.2019.00258. ISBN 978-1-7281-2506-0.
  45. ^ a b Chan, Kelvin C. K.; Wang, Xintao; Yu, Ke; Dong, Chao; Loy, Chen Change (2020-12-03). "BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond". arXiv:2012.02181v1 [cs.CV].
  46. ^ Naoto Chiche, Benjamin; Frontera-Pons, Joana; Woiselle, Arnaud; Starck, Jean-Luc (2020-11-09). "Deep Unrolled Network for Video Super-Resolution". 2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA). IEEE. pp. 1–6. arXiv:2102.11720. doi:10.1109/ipta50016.2020.9286636. ISBN 978-1-7281-8750-1.
  47. ^ Wang, Xintao; Chan, Kelvin C. K.; Yu, Ke; Dong, Chao; Loy, Chen Change (2019-05-07). "EDVR: Video Restoration with Enhanced Deformable Convolutional Networks". arXiv:1905.02716v1 [cs.CV].
  48. ^ Wang, Hua; Su, Dewei; Liu, Chuangchuang; Jin, Longcun; Sun, Xianfang; Peng, Xinyi (2019). "Deformable Non-Local Network for Video Super-Resolution". IEEE Access. Institute of Electrical and Electronics Engineers (IEEE). 7: 177734–177744. arXiv:1909.10692. doi:10.1109/access.2019.2958030. ISSN 2169-3536.
  49. ^ Tian, Yapeng; Zhang, Yulun; Fu, Yun; Xu, Chenliang (2020). "TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution". 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 3357–3366. arXiv:1812.02898. doi:10.1109/cvpr42600.2020.00342. ISBN 978-1-7281-7168-5.
  50. ^ Song, Huihui; Xu, Wenjie; Liu, Dong; Liua, Bo; Liub, Qingshan; Metaxas, Dimitris N. (2021). "Multi-Stage Feature Fusion Network for Video Super-Resolution". IEEE Transactions on Image Processing. Institute of Electrical and Electronics Engineers (IEEE). 30: 2923–2934. Bibcode:2021ITIP...30.2923S. doi:10.1109/tip.2021.3056868. ISSN 1057-7149. PMID 33560986. S2CID 231864067.
  51. ^ Isobe, Takashi; Li, Songjiang; Jia, Xu; Yuan, Shanxin; Slabaugh, Gregory; Xu, Chunjing; Li, Ya-Li; Wang, Shengjin; Tian, Qi (2020). "Video Super-Resolution With Temporal Group Attention". 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 8005–8014. arXiv:2007.10595. doi:10.1109/cvpr42600.2020.00803. ISBN 978-1-7281-7168-5.
  52. ^ Lucas, Alice; Lopez-Tapia, Santiago; Molina, Rafael; Katsaggelos, Aggelos K. (2019). "Generative Adversarial Networks and Perceptual Losses for Video Super-Resolution". IEEE Transactions on Image Processing. Institute of Electrical and Electronics Engineers (IEEE). 28 (7): 3312–3327. arXiv:1806.05764. Bibcode:2019ITIP...28.3312L. doi:10.1109/tip.2019.2895768. ISSN 1057-7149. PMID 30714918. S2CID 73415655.
  53. ^ Yan, Bo; Lin, Chuming; Tan, Weimin (2019-09-28). "Frame and Feature-Context Video Super-Resolution". arXiv:1909.13057v1 [cs.CV].
  54. ^ Tian, Zhiqiang; Wang, Yudiao; Du, Shaoyi; Lan, Xuguang (2020-07-10). Yang, You (ed.). "A multiresolution mixture generative adversarial network for video super-resolution". PLOS ONE. Public Library of Science (PLoS). 15 (7): e0235352. Bibcode:2020PLoSO..1535352T. doi:10.1371/journal.pone.0235352. ISSN 1932-6203. PMC 7351143. PMID 32649694.
  55. ^ Zhu, Xiaobin; Li, Zhuangzi; Lou, Jungang; Shen, Qing (2021). "Video super-resolution based on a spatio-temporal matching network". Pattern Recognition. 110: 107619. Bibcode:2021PatRe.11007619Z. doi:10.1016/j.patcog.2020.107619. ISSN 0031-3203. S2CID 225285804.
  56. ^ Li, Wenbo; Tao, Xin; Guo, Taian; Qi, Lu; Lu, Jiangbo; Jia, Jiaya (2020-07-23). "MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution". arXiv:2007.11803v1 [cs.CV].
  57. ^ Jo, Younghyun; Oh, Seoung Wug; Kang, Jaeyeon; Kim, Seon Joo (2018). "Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation". 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE. pp. 3224–3232. doi:10.1109/cvpr.2018.00340. ISBN 978-1-5386-6420-9.
  58. ^ Li, Sheng; He, Fengxiang; Du, Bo; Zhang, Lefei; Xu, Yonghao; Tao, Dacheng (2019-04-05). "Fast Spatio-Temporal Residual Network for Video Super-Resolution". arXiv:1904.02870v1 [cs.CV].
  59. ^ Kim, Soo Ye; Lim, Jeongyeon; Na, Taeyoung; Kim, Munchurl (2019). "Video Super-Resolution Based on 3D-CNNS with Consideration of Scene Change". 2019 IEEE International Conference on Image Processing (ICIP). pp. 2831–2835. doi:10.1109/ICIP.2019.8803297. ISBN 978-1-5386-6249-6. S2CID 202763112.
  60. ^ Luo, Jianping; Huang, Shaofei; Yuan, Yuan (2020). "Video Super-Resolution using Multi-scale Pyramid 3D Convolutional Networks". Proceedings of the 28th ACM International Conference on Multimedia. pp. 1882–1890. doi:10.1145/3394171.3413587. ISBN 9781450379885. S2CID 222278621.
  61. ^ Zhang, Dongyang; Shao, Jie; Liang, Zhenwen; Liu, Xueliang; Shen, Heng Tao (2020). "Multi-branch Networks for Video Super-Resolution with Dynamic Reconstruction Strategy". IEEE Transactions on Circuits and Systems for Video Technology. 31 (10): 3954–3966. doi:10.1109/TCSVT.2020.3044451. ISSN 1051-8215. S2CID 235057646.
  62. ^ Aksan, Emre; Hilliges, Otmar (2019-02-18). "STCN: Stochastic Temporal Convolutional Networks". arXiv:1902.06568v1 [cs.LG].
  63. ^ Huang, Yan; Wang, Wei; Wang, Liang (2018). "Video Super-Resolution via Bidirectional Recurrent Convolutional Networks". IEEE Transactions on Pattern Analysis and Machine Intelligence. 40 (4): 1015–1028. doi:10.1109/TPAMI.2017.2701380. ISSN 0162-8828. PMID 28489532. S2CID 136582.
  64. ^ Zhu, Xiaobin; Li, Zhuangzi; Zhang, Xiao-Yu; Li, Changsheng; Liu, Yaqi; Xue, Ziyu (2019). "Residual Invertible Spatio-Temporal Network for Video Super-Resolution". Proceedings of the AAAI Conference on Artificial Intelligence. 33: 5981–5988. doi:10.1609/aaai.v33i01.33015981. ISSN 2374-3468.
  65. ^ Li, Dingyi; Liu, Yu; Wang, Zengfu (2019). "Video Super-Resolution Using Non-Simultaneous Fully Recurrent Convolutional Network". IEEE Transactions on Image Processing. 28 (3): 1342–1355. Bibcode:2019ITIP...28.1342L. doi:10.1109/TIP.2018.2877334. ISSN 1057-7149. PMID 30346282. S2CID 53044490.
  66. ^ Isobe, Takashi; Zhu, Fang; Jia, Xu; Wang, Shengjin (2020-08-13). "Revisiting Temporal Modeling for Video Super-resolution". arXiv:2008.05765v2 [eess.IV].
  67. ^ Han, Lei; Fan, Cien; Yang, Ye; Zou, Lian (2020). "Bidirectional Temporal-Recurrent Propagation Networks for Video Super-Resolution". Electronics. 9 (12): 2085. doi:10.3390/electronics9122085. ISSN 2079-9292.
  68. ^ Fuoli, Dario; Gu, Shuhang; Timofte, Radu (2019-09-17). "Efficient Video Super-Resolution through Recurrent Latent Space Propagation". arXiv:1909.08080 [eess.IV].
  69. ^ Isobe, Takashi; Jia, Xu; Gu, Shuhang; Li, Songjiang; Wang, Shengjin; Tian, Qi (2020-08-02). "Video Super-Resolution with Recurrent Structure-Detail Network". arXiv:2008.00455v1 [cs.CV].
  70. ^ Zhou, Chao; Chen, Can; Ding, Fei; Zhang, Dengyin (2021). "Video super‐resolution with non‐local alignment network". IET Image Processing. 15 (8): 1655–1667. doi:10.1049/ipr2.12134. ISSN 1751-9659.
  71. ^ Yi, Peng; Wang, Zhongyuan; Jiang, Kui; Jiang, Junjun; Lu, Tao; Ma, Jiayi (2020). "A Progressive Fusion Generative Adversarial Network for Realistic and Consistent Video Super-Resolution". IEEE Transactions on Pattern Analysis and Machine Intelligence. PP (5): 2264–2280. doi:10.1109/TPAMI.2020.3042298. ISSN 0162-8828. PMID 33270559. S2CID 227282569.
  72. ^ "MSU VSR Benchmark Methodology". Video Processing. 2021-04-26. Retrieved 2021-05-12.
  73. ^ Zvezdakova, A. V.; Kulikov, D. L.; Zvezdakov, S. V.; Vatolin, D. S. (2020). "BSQ-rate: a new approach for video-codec performance comparison and drawbacks of current solutions". Programming and Computer Software. 46 (3): 183–194. doi:10.1134/S0361768820030111. S2CID 219157416.
  74. ^ "See Better and Further with Super Res Zoom on the Pixel 3". Google AI Blog. 2018-10-15.

video, super, resolution, this, article, about, video, frame, restoration, technique, video, upscaling, tool, nvidia, video, super, resolution, process, generating, high, resolution, video, frames, from, given, resolution, video, frames, unlike, single, image,. This article is about video frame restoration technique For video upscaling tool by Nvidia see Video Super Resolution Video super resolution VSR is the process of generating high resolution video frames from the given low resolution video frames Unlike single image super resolution SISR the main goal is not only to restore more fine details while saving coarse ones but also to preserve motion consistency VSR and SISR methods outputs comparison VSR restores more details by using temporal information There are many approaches for this task but this problem still remains to be popular and challenging Contents 1 Mathematical explanation 2 Methods 2 1 Traditional methods 2 1 1 Frequency domain 2 1 2 Spatial domain 2 2 Deep learning based methods 2 2 1 Aligned by motion estimation and motion compensation 2 2 2 Aligned by deformable convolution 2 2 3 Aligned by homography 2 2 4 Spatial non aligned 2 2 5 3D convolutions 2 2 6 Recurrent neural networks 2 2 7 Videos 3 Metrics 4 Datasets 5 Benchmarks 5 1 NTIRE 2019 Challenge 5 2 Youku VESR Challenge 2019 5 3 AIM 2019 Challenge 5 4 AIM 2020 Challenge 5 5 MSU Video Super Resolution Benchmark 5 6 MSU Super Resolution for Video Compression Benchmark 6 Application 7 See also 8 ReferencesMathematical explanation EditMost research considers the degradation process of frames as y x k s n displaystyle y x k downarrow s n where x displaystyle x original high resolution frame sequence k displaystyle k blur kernel displaystyle convolution operation s displaystyle downarrow s downscaling operation n displaystyle n additive noise y displaystyle y low resolution frame sequence Super resolution is an inverse operation so its problem is to estimate frame sequence x displaystyle overline x from frame sequence y displaystyle y so that x displaystyle overline x is close to original x displaystyle x Blur kernel downscaling operation and additive noise should be estimated for given input to achieve better results Video super resolution approaches tend to have more components than the image counterparts as they need to exploit the additional temporal dimension Complex designs are not uncommon Some most essential components for VSR are guided by four basic functionalities Propagation Alignment Aggregation and Upsampling 1 Propagation refers to the way in which features are propagated temporally Alignment concerns on the spatial transformation applied to misaligned images features Aggregation defines the steps to combine aligned features Upsampling describes the method to transform the aggregated features to the final output imageMethods EditWhen working with video temporal information could be used to improve upscaling quality Single image super resolution methods could be used too generating high resolution frames independently from their neighbours but it s less effective and introduces temporal instability There are a few traditional methods which consider the video super resolution task as an optimization problem Last years deep learning based methods for video upscaling outperform traditional ones Traditional methods Edit There are several traditional methods for video upscaling These methods try to use some natural preferences and effectively estimate motion between frames The high resolution frame is 
reconstructed based on both natural preferences and estimated motion Frequency domain Edit Firstly the low resolution frame is transformed to the frequency domain The high resolution frame is estimated in this domain Finally this result frame is transformed to the spatial domain Some methods use Fourier transform which helps to extend the spectrum of captured signal and though increase resolution There are different approaches for these methods using weighted least squares theory 2 total least squares TLS algorithm 3 space varying 4 or spatio temporal 5 varying filtering Other methods use wavelet transform which helps to find similarities in neighboring local areas 6 Later second generation wavelet transform was used for video super resolution 7 Spatial domain Edit Iterative back projection methods assume some function between low resolution and high resolution frames and try to improve their guessed function in each step of an iterative process 8 Projections onto convex sets POCS that defines a specific cost function also can be used for iterative methods 9 Iterative adaptive filtering algorithms use Kalman filter to estimate transformation from low resolution frame to high resolution one 10 To improve the final result these methods consider temporal correlation among low resolution sequences Some approaches also consider temporal correlation among high resolution sequence 11 To approximate Kalman filter a common way is to use least mean squares LMS 12 One can also use steepest descent 13 least squares LS 14 recursive least squares RLS 14 Direct methods estimate motion between frames upscale a reference frame and warp neighboring frames to the high resolution reference one To construct result these upscaled frames are fused together by median filter 15 weighted median filter 16 adaptive normalized averaging AdaBoost classifier 17 or SVD based filters 18 Non parametric algorithms join motion estimation and frames fusion to one step It is performed by consideration of patches similarities Weights for fusion can be calculated by nonlocal means filters 19 To strength searching for similar patches one can use rotation invariance similarity measure 20 or adaptive patch size 21 Calculating intra frame similarity help to preserve small details and edges 22 Parameters for fusion also can be calculated by kernel regression 23 Probabilistic methods use statistical theory to solve the task maximum likelihood ML methods estimate more probable image 24 25 Another group of methods use maximum a posteriori MAP estimation Regularization parameter for MAP can be estimated by Tikhonov regularization 26 Markov random fields MRF is often used along with MAP and helps to preserve similarity in neighboring patches 27 Huber MRFs are used to preserve sharp edges 28 Gaussian MRF can smooth some edges but remove noise 29 Deep learning based methods Edit Aligned by motion estimation and motion compensation Edit In approaches with alignment neighboring frames are firstly aligned with target one One can align frames by performing motion estimation and motion compensation MEMC or by using Deformable convolution DC Motion estimation gives information about the motion of pixels between frames motion compensation is a warping operation which aligns one frame to another based on motion information Examples of such methods Deep DE 30 deep draft ensemble learning generates a series of SR feature maps and then process them together to estimate the final frame VSRnet 31 is based on SRCNN model for single image super resolution 
but takes multiple frames as input Input frames are first aligned by the Druleas algorithm VESPCN 32 uses a spatial motion compensation transformer module MCT which estimates and compensates motion Then a series of convolutions performed to extract feature and fuse them DRVSR 33 detail revealing deep video super resolution consists of three main steps motion estimation motion compensation and fusion The motion compensation transformer MCT is used for motion estimation The sub pixel motion compensation layer SPMC compensates motion Fusion step uses encoder decoder architecture and ConvLSTM module to unit information from both spatial and temporal dimensions RVSR 34 robust video super resolution have two branches one for spatial alignment and another for temporal adaptation The final frame is a weighted sum of branches output FRVSR 35 frame recurrent video super resolution estimate low resolution optical flow upsample it to high resolution and warp previous output frame by using this high resolution optical flow STTN 36 the spatio temporal transformer network estimate optical flow by U style network based on Unet and compensate motion by a trilinear interpolation method SOF VSR 37 super resolution optical flow for video super resolution calculate high resolution optical flow in coarse to fine manner Then the low resolution optical flow is estimated by a space to depth transformation The final super resolution result is gained from aligned low resolution frames TecoGAN 38 the temporally coherent GAN consists of generator and discriminator Generator estimates LR optical flow between consecutive frames and from this approximate HR optical flow to yield output frame The discriminator assesses the quality of the generator TOFlow 39 task oriented flow is a combination of optical flow network and reconstruction network Estimated optical flow is suitable for a particular task such as video super resolution MMCNN 40 the multi memory convolutional neural network aligns frames with target one and then generates the final HR result through the feature extraction detail fusion and feature reconstruction modules RBPN 41 the recurrent back projection network The input of each recurrent projection module features from the previous frame features from the consequence of frames and optical flow between neighboring frames MEMC Net 42 the motion estimation and motion compensation network uses both motion estimation network and kernel estimation network to warp frames adaptively RTVSR 43 real time video super resolution aligns frames with estimated convolutional kernel MultiBoot VSR 44 the multi stage multi reference bootstrapping method aligns frames and then have two stage of SR reconstruction to improve quality BasicVSR 45 aligns frames with optical flow and then fuse their features in a recurrent bidirectional scheme IconVSR 45 is a refined version of BasicVSR with a recurrent coupled propagation scheme UVSR 46 unrolled network for video super resolution adapted unrolled optimization algorithms to solve the VSR problemAligned by deformable convolution Edit Another way to align neighboring frames with target one is deformable convolution While usual convolution has fixed kernel deformable convolution on the first step estimate shifts for kernel and then do convolution Examples of such methods EDVR 47 The enhanced deformable video restoration can be divided into two main modules the pyramid cascading and deformable PCD module for alignment and the temporal spatial attention TSA module for fusion DNLN 48 The 
Examples of such methods:

- EDVR[47] (enhanced deformable video restoration) can be divided into two main modules: the pyramid, cascading and deformable (PCD) module for alignment and the temporal-spatial attention (TSA) module for fusion.
- DNLN[48] (the deformable non-local network) has an alignment module based on deformable convolution, with a hierarchical feature fusion module (HFFB) for better quality, and a non-local attention module.
- TDAN[49] (the temporally deformable alignment network) consists of an alignment module and a reconstruction module; alignment is performed by deformable convolution based on feature extraction and alignment.
- The Multi-Stage Feature Fusion Network[50] for video super-resolution uses multi-scale dilated deformable convolution for frame alignment and a modulative feature fusion branch to integrate the aligned frames.

Aligned by homography Edit

Some methods align frames via a homography calculated between them:

- TGA[51] (temporal group attention) divides the input frames into N groups depending on time difference and extracts information from each group independently. A fast spatial alignment module based on homography is used to align the frames.

Spatial non-aligned Edit

Methods without alignment do not perform alignment as a first step and simply process the input frames:

- VSRResNet[52], like a GAN, consists of a generator and a discriminator. The generator upsamples the input frames, extracts features and fuses them; the discriminator assesses the quality of the resulting high-resolution frames.
- FFCVSR[53] (frame and feature-context video super-resolution) takes unaligned low-resolution frames together with the previously output high-resolution frames to simultaneously restore high-frequency details and maintain temporal consistency.
- MRMNet[54] (the multi-resolution mixture network) consists of three modules: bottleneck, exchange and residual. The bottleneck unit extracts features that have the same resolution as the input frames; the exchange module exchanges features between neighboring frames and enlarges the feature maps; the residual module extracts features after the exchange one.
- STMN[55] (the spatio-temporal matching network) uses the discrete wavelet transform to fuse temporal features. A non-local matching block integrates super-resolution and denoising, and the final SR result is obtained in the global wavelet domain.
- MuCAN[56] (the multi-correspondence aggregation network) uses a temporal multi-correspondence strategy to fuse temporal features and cross-scale non-local correspondence to extract self-similarities in frames.

3D convolutions Edit

While 2D convolutions work in the spatial domain, 3D convolutions use both spatial and temporal information: they perform motion compensation and maintain temporal consistency (see the sketch after this list):

- DUF[57] (dynamic upsampling filters) uses deformable 3D convolution for motion compensation; the model estimates kernels for the specific input frames.
- FSTRN[58] (the fast spatio-temporal residual network) includes a few modules: an LR video shallow feature extraction net (LFENet), an LR feature fusion and upsampling module (LSRNet), and two residual modules, spatio-temporal and global.
- 3DSRnet[59] (the 3D super-resolution network) uses 3D convolutions to extract spatio-temporal information; the model also has a special approach for frames where a scene change is detected.
- MP3D[60] (the multi-scale pyramid 3D convolutional network) uses 3D convolution to extract spatial and temporal features simultaneously, which are then passed through a reconstruction module with 3D sub-pixel convolution for upsampling.
- DMBN[61] (the dynamic multiple-branch network) has three branches to exploit information from multiple resolutions; finally, the information from the branches is fused dynamically.
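For intuition, a 3D convolution slides its kernel over height, width and time at once, so every output feature already mixes information from adjacent frames. A minimal sketch, assuming PyTorch; the clip length and channel counts are arbitrary:

```python
import torch
import torch.nn as nn

# A clip of 7 RGB low-resolution frames: (batch, channels, time, H, W).
clip = torch.randn(1, 3, 7, 64, 64)
# A 3x3x3 kernel mixes each pixel with its spatial neighbours in the
# current, previous and next frames, handling motion implicitly.
conv3d = nn.Conv3d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
features = conv3d(clip)                  # (1, 32, 7, 64, 64)
```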
Recurrent neural networks Edit

Recurrent convolutional neural networks perform video super-resolution by storing temporal dependencies:

- STCN[62] (the spatio-temporal convolutional network) extracts features in the spatial module and passes them through the recurrent temporal module and the final reconstruction module; temporal consistency is maintained by a long short-term memory (LSTM) mechanism.
- BRCN[63] (the bidirectional recurrent convolutional network) has two subnetworks, with forward and backward fusion; the result of the network is a composition of the two branches' outputs.
- RISTN[64] (the residual invertible spatio-temporal network) consists of spatial, temporal and reconstruction modules. The spatial module is composed of residual invertible blocks (RIB), which extract spatial features effectively. The output of the spatial module is processed by the temporal module, which extracts spatio-temporal information and then fuses the important features; the final result is calculated in the reconstruction module by a deconvolution operation.
- RRCN[65] (the residual recurrent convolutional network) is a bidirectional recurrent network which calculates a residual image; the final result is obtained by adding a bicubically upsampled input frame.
- RRN[66] (the recurrent residual network) uses a recurrent sequence of residual blocks to extract spatial and temporal information.
- BTRPN[67] (the bidirectional temporal recurrent propagation network) uses a bidirectional recurrent scheme; the final result is combined from the two branches with a channel attention mechanism.
- RLSP[68] (recurrent latent space propagation) is a fully convolutional network cell with highly efficient propagation of temporal information through a hidden state.
- RSDN[69] (the recurrent structure-detail network) divides the input frame into structure and detail components and processes them in two parallel streams.

Non-local Edit

Non-local methods extract both spatial and temporal information. Their key idea is to use all possible positions as a weighted sum, a strategy that may be more effective than local approaches (a sketch of such aggregation follows the list):

- The progressive fusion non-local method (PFNL) extracts spatio-temporal features by non-local residual blocks and then fuses them by progressive fusion residual blocks (PFRB). The result of these blocks is a residual image; the final result is obtained by adding a bicubically upsampled input frame.
- NLVSR[70] (the novel video super-resolution network) aligns frames with the target one by a temporal-spatial non-local operation; an attention-based mechanism is used to integrate information from the aligned frames.
- MSHPFNL[71] also incorporates a multi-scale structure and hybrid convolutions to extract wide-range dependencies. To avoid artifacts like flickering or ghosting, it uses generative adversarial training.
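The weighted sum over all positions is essentially dot-product attention. The following is a minimal sketch of such non-local aggregation, not tied to any specific network above, assuming PyTorch; practical models apply it to learned feature maps and reduce its cost with downscaling or windowing.

```python
import torch

def nonlocal_aggregate(query, support):
    """Weighted sum over all spatio-temporal positions.

    query: (C, H, W) features of the target frame;
    support: (T, C, H, W) features of the whole clip.
    """
    c, h, w = query.shape
    q = query.reshape(c, -1).t()                       # (H*W, C)
    s = support.permute(1, 0, 2, 3).reshape(c, -1)     # (C, T*H*W)
    weights = torch.softmax(q @ s / c ** 0.5, dim=-1)  # similarity weights
    return (weights @ s.t()).t().reshape(c, h, w)

feats = torch.randn(5, 16, 24, 24)         # toy clip features
fused = nonlocal_aggregate(feats[2], feats)
```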
Metrics Edit

(Image: top, the original sequence; bottom, a PSNR (peak signal-to-noise ratio) visualization of the output of a VSR method.)

The common way to estimate the performance of video super-resolution algorithms is to use a few metrics:

- PSNR (peak signal-to-noise ratio) calculates the difference between two corresponding frames based on the mean squared error (MSE); a computation sketch is given at the end of this section.
- SSIM (structural similarity index) measures the similarity of structure between two corresponding frames.
- IFC (information fidelity criterion) shows the information similarity with the reference frame.
- MOVIE (motion-based video integrity evaluation index) integrates explicit motion information by estimating distortions along motion trajectories.
- VMAF (video multimethod assessment fusion) predicts subjective video quality based on a reference and a distorted video sequence.
- VIF (visual information fidelity) is a full-reference image quality assessment index based on natural scene statistics and the notion of image information extracted by the human visual system.
- LPIPS (learned perceptual image patch similarity) compares the perceptual similarity of frames based on high-order image structure.
- tOF measures the pixel-wise motion similarity with the reference frame, based on optical flow.
- tLP calculates how LPIPS changes from frame to frame in comparison with the reference sequence.
- FSIM (feature similarity index for image quality) uses phase congruency as the primary feature to measure the similarity between two corresponding frames.

Currently there are not many objective metrics that verify a video super-resolution method's ability to restore real details, and research in this area is ongoing. Another way to assess the performance of a video super-resolution algorithm is to organize a subjective evaluation: people are asked to compare the corresponding frames, and the final mean opinion score (MOS) is calculated as the arithmetic mean over all ratings.
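As a concrete example of the most common of these metrics, here is a minimal NumPy sketch of PSNR between a reference and a restored frame, assuming 8-bit data (peak value 255):

```python
import numpy as np

def psnr(reference, restored, peak=255.0):
    """PSNR in decibels, computed from the mean squared error (MSE)."""
    mse = np.mean((reference.astype(np.float64) -
                   restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")               # frames are identical
    return 10.0 * np.log10(peak ** 2 / mse)
```

For a video, PSNR is usually reported as the average of the per-frame values.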
Datasets Edit

While deep learning approaches to video super-resolution outperform traditional ones, it is crucial to form a high-quality dataset for evaluation. It is important to verify models' ability to restore small details, text and objects with complicated structure, and to cope with big motion and noise.

Comparison of datasets:

| Dataset | Videos | Mean video length | Ground-truth resolution | Motion in frames | Fine details |
|---|---|---|---|---|---|
| Vid4 | 4 | 43 frames | 720×480 | Without fast motion | Some small details, without text |
| SPMCS | 30 | 31 frames | 960×540 | Slow motion | A lot of small details |
| Vimeo-90K (test SR set) | 7824 | 7 frames | 448×256 | A lot of fast, difficult, diverse motion | Few details, text in a few sequences |
| Xiph HD (complete sets) | 70 | 2 seconds | from 640×360 to 4096×2160 | A lot of fast, difficult, diverse motion | Few details, text in a few sequences |
| Ultra Video Dataset 4K | 16 | 10 seconds | 4096×2160 | Diverse motion | Few details, without text |
| REDS (test SR) | 30 | 100 frames | 1280×720 | A lot of fast, difficult, diverse motion | Few details, without text |
| Space-Time SR | 5 | 100 frames | 1280×720 | Diverse motion | Without small details and text |
| Harmonic | | | 4096×2160 | | |
| CDVL | | | 1920×1080 | | |

Benchmarks Edit

A few benchmarks in video super-resolution have been organized by companies and conferences. The purpose of such challenges is to compare diverse algorithms and to find the state of the art for the task.

Comparison of benchmarks:

| Benchmark | Organizer | Dataset | Upscale factor | Metrics |
|---|---|---|---|---|
| NTIRE 2019 Challenge | CVPR (Conference on Computer Vision and Pattern Recognition) | REDS | 4 | PSNR, SSIM |
| Youku-VESR Challenge 2019 | Youku | Youku-VESR | 4 | PSNR, VMAF |
| AIM 2019 Challenge | ECCV (European Conference on Computer Vision) | Vid3oC | 16 | PSNR, SSIM, MOS |
| AIM 2020 Challenge | ECCV (European Conference on Computer Vision) | Vid3oC | 16 | PSNR, SSIM, LPIPS |
| Mobile Video Restoration Challenge | ICIP (International Conference on Image Processing) | Kwai | | PSNR, SSIM, MOS |
| MSU Video Super-Resolution Benchmark 2021 | MSU (Moscow State University) | | 4 | ERQAv1.0, PSNR and SSIM with shift compensation, QRCRv1.0, CRRMv1.0 |
| MSU Super-Resolution for Video Compression Benchmark 2022 | MSU (Moscow State University) | | 4 | ERQAv2.0, PSNR, MS-SSIM, VMAF, LPIPS |

NTIRE 2019 Challenge Edit

The NTIRE 2019 Challenge was organized by CVPR and proposed two tracks for video super-resolution: clean (only bicubic degradation) and blur (blur added first); a sketch of the two degradation styles is given after the results table. Each track had more than 100 participants, and 14 final results were submitted. The REDS dataset was collected for this challenge; it consists of 30 videos of 100 frames each, with a ground-truth resolution of 1280×720. The tested scale factor is 4, and PSNR and SSIM were used to evaluate the models' performance. The best participants' results are shown in the table.

Top teams:

| Team | Model | PSNR (clean) | SSIM (clean) | PSNR (blur) | SSIM (blur) | Runtime, s/image (clean) | Runtime, s/image (blur) | Platform | GPU | Open source |
|---|---|---|---|---|---|---|---|---|---|---|
| HelloVSR | EDVR | 31.79 | 0.8962 | 30.17 | 0.8647 | 2.788 | 3.562 | PyTorch | TITAN Xp | YES |
| UIUC-IFP | WDVR | 30.81 | 0.8748 | 29.46 | 0.8430 | 0.980 | 0.980 | PyTorch | Tesla V100 | YES |
| SuperRior | ensemble of RDN, RCAN, DUF | 31.13 | 0.8811 | | | 120.000 | | PyTorch | Tesla V100 | NO |
| CyberverseSanDiego | RecNet | 31.00 | 0.8822 | 27.71 | 0.8067 | 3.000 | 3.000 | TensorFlow | RTX 2080 Ti | YES |
| TTI | RBPN | 30.97 | 0.8804 | 28.92 | 0.8333 | 1.390 | 1.390 | PyTorch | TITAN X | YES |
| NERCMS | PFNL | 30.91 | 0.8782 | 28.98 | 0.8307 | 6.020 | 6.020 | PyTorch | GTX 1080 Ti | YES |
| XJTU-IAIR | FSTDN | | | 28.86 | 0.8301 | | 13.000 | PyTorch | GTX 1080 Ti | NO |
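The two tracks correspond to two simple synthetic degradations. Below is a minimal sketch of producing low-resolution inputs in each style, assuming OpenCV; the Gaussian blur and its sigma are hypothetical illustrative choices, not the challenge's exact kernel:

```python
import cv2

def degrade_clean(frame_hr, scale=4):
    """'Clean' track style: bicubic downscaling only."""
    h, w = frame_hr.shape[:2]
    return cv2.resize(frame_hr, (w // scale, h // scale),
                      interpolation=cv2.INTER_CUBIC)

def degrade_blur(frame_hr, scale=4, sigma=1.6):
    """'Blur' track style: blur first, then bicubic downscaling."""
    blurred = cv2.GaussianBlur(frame_hr, (0, 0), sigmaX=sigma)
    return degrade_clean(blurred, scale)
```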
Youku-VESR Challenge 2019 Edit

The Youku-VESR Challenge was organized to check models' ability to cope with the degradation and noise that occur in the Youku online video-watching application. The proposed dataset consists of 1000 videos, each 4 to 6 seconds long. The ground-truth resolution is 1920×1080, and the tested scale factor is 4. PSNR and VMAF were used for performance evaluation. The top methods are shown in the table.

Top teams:

| Team | PSNR | VMAF |
|---|---|---|
| Avengers Assemble | 37.851 | 41.617 |
| NJU-L1 | 37.681 | 41.227 |
| ALONG-NTES | 37.632 | 40.405 |

AIM 2019 Challenge Edit

The challenge was held by ECCV and had two tracks on video extreme super-resolution. The first track checks the fidelity to the reference frame, measured by PSNR and SSIM; the second track checks the perceptual quality of the videos (MOS). The dataset consists of 328 video sequences of 120 frames each, with a ground-truth resolution of 1920×1080. The tested scale factor is 16. The top methods are shown in the table.

Top teams:

| Team | Model | PSNR | SSIM | MOS | Runtime, s/image | Platform | GPU/CPU | Open source |
|---|---|---|---|---|---|---|---|---|
| fenglinglwb | based on EDVR | 22.53 | 0.64 | first result | 0.35 | PyTorch | 4× Titan X | NO |
| NERCMS | PFNL | 22.35 | 0.63 | | 0.51 | PyTorch | 2× 1080 Ti | NO |
| baseline | RLSP | 21.75 | 0.60 | | 0.09 | TensorFlow | Titan Xp | NO |
| HIT-XLab | based on EDSR | 21.45 | 0.60 | second result | 60.00 | PyTorch | V100 | NO |

AIM 2020 Challenge Edit

The challenge's conditions are the same as in the AIM 2019 Challenge. The top methods are shown in the table.

Top teams:

| Team | Model | Params | PSNR | SSIM | Runtime, s/image | GPU/CPU | Open source |
|---|---|---|---|---|---|---|---|
| KirinUK | EVESRNet | 45.29M | 22.83 | 0.6450 | 6.1 | 1× 2080 Ti | NO |
| Team-WVU | | 29.51M | 22.48 | 0.6378 | 4.9 | 1× Titan Xp | NO |
| BOE-IOT-AIBD | 3D-MGBP | 53M | 22.48 | 0.6304 | 4.83 | 1× 1080 | NO |
| sr_xxx | based on EDVR | | 22.43 | 0.6353 | 4 | 1× V100 | NO |
| ZZX | MAHA | 31.14M | 22.28 | 0.6321 | 4 | 1× 1080 Ti | NO |
| lyl | FineNet | | 22.08 | 0.6256 | 13 | | NO |
| TTI | based on STARnet | | 21.91 | 0.6165 | 0.249 | | NO |
| CET_CVLab | | | 21.77 | 0.6112 | 0.04 | 1× P100 | NO |

MSU Video Super Resolution Benchmark Edit

The MSU Video Super-Resolution Benchmark was organized by MSU and proposed three types of motion, two ways to lower resolution and eight types of content in its dataset. The ground-truth resolution is 1920×1280, the tested scale factor is 4, and 14 models were tested. PSNR and SSIM with shift compensation were used to evaluate performance, along with a few newly proposed metrics: ERQAv1.0, QRCRv1.0 and CRRMv1.0.[72] The top methods are shown in the table.

Top methods:

| Model | Multi-frame | Subjective | ERQAv1.0 | PSNR | SSIM | QRCRv1.0 | CRRMv1.0 | Runtime, s/image | Open source |
|---|---|---|---|---|---|---|---|---|---|
| DBVSR | YES | 5.561 | 0.737 | 31.071 | 0.894 | 0.629 | 0.992 | | YES |
| LGFN | YES | 5.040 | 0.740 | 31.291 | 0.898 | 0.629 | 0.996 | 1.499 | YES |
| DynaVSR-R | YES | 4.751 | 0.709 | 28.377 | 0.865 | 0.557 | 0.997 | 5.664 | YES |
| TDAN | YES | 4.036 | 0.706 | 30.244 | 0.883 | 0.557 | 0.994 | | YES |
| DUF-28L | YES | 3.910 | 0.645 | 25.852 | 0.830 | 0.549 | 0.993 | 2.392 | YES |
| RRN-10L | YES | 3.887 | 0.627 | 24.252 | 0.790 | 0.557 | 0.989 | 0.390 | YES |
| RealSR | NO | 3.749 | 0.690 | 25.989 | 0.767 | 0.000 | 0.886 | | YES |

MSU Super Resolution for Video Compression Benchmark Edit

The MSU Super-Resolution for Video Compression Benchmark was organized by MSU. This benchmark tests models' ability to work with compressed videos. The dataset consists of 9 videos compressed with different video-codec standards at different bitrates; 5 video codecs were used to compress the ground-truth videos, whose resolution is 1920×1080. The tested scale factor is 4, and 17 models were tested. Models are ranked by BSQ-rate[73] over the subjective score. The top combinations of super-resolution methods and video codecs are shown in the table.

Top methods:

| Model + codec | BSQ-rate (subjective) | BSQ-rate (ERQAv2.0) | BSQ-rate (VMAF) | BSQ-rate (PSNR) | BSQ-rate (MS-SSIM) | BSQ-rate (LPIPS) | Open source |
|---|---|---|---|---|---|---|---|
| RealSR + x264 | 0.196 | 0.770 | 0.775 | 0.675 | 0.487 | 0.591 | YES |
| ahq-11 + x264 | 0.271 | 0.883 | 0.753 | 0.873 | 0.719 | 0.656 | NO |
| SwinIR + x264 | 0.304 | 0.760 | 0.642 | 6.268 | 0.736 | 0.559 | YES |
| Real-ESRGAN + x264 | 0.335 | 5.580 | 0.698 | 7.874 | 0.881 | 0.733 | YES |
| SwinIR + x265 | 0.346 | 1.575 | 1.304 | 8.130 | 4.641 | 1.474 | YES |
| COMISR + x264 | 0.367 | 0.969 | 1.302 | 6.081 | 0.672 | 1.118 | YES |
| RealSR + x265 | 0.502 | 1.622 | 1.617 | 1.064 | 1.033 | 1.206 | YES |

Application Edit

In many areas where we work with video, we deal with different types of video degradation, including downscaling. The resolution of video can be degraded because of imperfections of measuring devices, such as optical degradations and the limited size of camera sensors. Bad light and weather conditions add noise, and object and camera motion also decrease video quality. Super-resolution techniques help to restore the original video and are useful in a wide range of applications:

- video surveillance, to improve video captured from a camera and to recognize license plates and faces;
- medical imaging, to discern organs or tissues better for clinical analysis and medical intervention;
- forensic science, to help investigations during criminal procedures;
- astronomy, to improve the quality of video of stars and planets;
- remote sensing, to ease the observation of objects;
- microscopy, to strengthen a microscope's ability.

It also helps, as a preprocessing step, to solve tasks of object detection and face and character recognition. Interest in super-resolution is growing with the development of high-definition computer displays and TVs.

(Image: simulating natural hand movements by jiggling the camera.)

Video super-resolution finds practical use in some modern smartphones and cameras, where it is used to reconstruct digital photographs. Reconstructing details on digital photographs is a difficult task, since these photographs are already incomplete: the camera sensor elements measure only the intensity of the light, not directly its color. A process called demosaicing is used to reconstruct the photos from this partial color information. A single frame does not give enough data to fill in the missing colors, but some of the missing information can be obtained from multiple images taken one after the other. This process is known as burst photography and can be used to restore a single image of good quality from multiple sequential frames. When many sequential photos are captured with a smartphone or handheld camera, there is always some movement present between the frames because of hand motion. This hand tremor can be exploited by combining the information from those images: a single image is chosen as the base or reference frame, and every other frame is aligned relative to it (a minimal merging sketch follows).
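Here is a minimal sketch of this align-and-merge idea, assuming OpenCV and a burst of single-channel float32 frames; real burst pipelines align per tile and merge robustly, so the global-translation model and plain averaging here are deliberate simplifications (the sign of the estimated shift may also need adjusting for a given OpenCV version):

```python
import cv2
import numpy as np

def merge_burst(frames):
    """Align each frame to the first one by a global translation estimated
    with phase correlation, then average the aligned burst."""
    ref = frames[0]
    acc = ref.astype(np.float64)
    for frame in frames[1:]:
        (dx, dy), _ = cv2.phaseCorrelate(ref, frame)
        shift = np.float32([[1, 0, -dx], [0, 1, -dy]])  # undo the motion
        aligned = cv2.warpAffine(frame, shift, (ref.shape[1], ref.shape[0]))
        acc += aligned
    return (acc / len(frames)).astype(np.float32)
```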
There are situations where hand motion is simply not present, because the device is stabilized, for example placed on a tripod. There is a way to simulate natural hand motion by intentionally moving the camera very slightly; the movements are so small that they do not interfere with regular photos. These motions can be observed on the Google Pixel 3[74] phone by holding it perfectly still (e.g. pressing it against a window) and maximally pinch-zooming the viewfinder.

See also Edit

- Super-resolution imaging
- Image resolution
- High-definition video
- Display resolution
- Ultra-high-definition television
- Oversampling
- High-dynamic-range video

References Edit

1. Chan, Kelvin C. K.; et al. (2021). "BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond". Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
2. Kim, S. P.; Bose, N. K.; Valenzuela, H. M. (1989). "Reconstruction of high resolution image from noise undersampled frames". Lecture Notes in Control and Information Sciences, vol. 129. Berlin: Springer-Verlag, pp. 315–326. doi:10.1007/bfb0042742.
3. Bose, N. K.; Kim, H. C.; Zhou, B. (1994). "Performance analysis of the TLS algorithm for image reconstruction from a sequence of undersampled noisy and blurred frames". Proceedings of the 1st International Conference on Image Processing, vol. 3, pp. 571–574. doi:10.1109/icip.1994.413741.
4. Tekalp, A. M.; Ozkan, M. K.; Sezan, M. I. (1992). "High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration". ICASSP-92, vol. 3, pp. 169–172. doi:10.1109/icassp.1992.226249.
5. Goldberg, N.; Feuer, A.; Goodwin, G. C. (2003). "Super-resolution reconstruction using spatio-temporal filtering". Journal of Visual Communication and Image Representation 14 (4): 508–525. doi:10.1016/s1047-3203(03)00042-7.
6. Mallat, S. (2010). "Super-Resolution With Sparse Mixing Estimators". IEEE Transactions on Image Processing 19 (11): 2889–2900. doi:10.1109/tip.2010.2049927.
7. Bose, N. K.; Lertrattanapanich, S.; Chappalli, M. B. (2004). "Superresolution with second generation wavelets". Signal Processing: Image Communication 19 (5): 387–391. doi:10.1016/j.image.2004.02.001.
8. Cohen, B.; Avrin, V.; Dinstein, I. (2000). "Polyphase back projection filtering for resolution enhancement of image sequences". ICASSP 2000, vol. 4, pp. 2171–2174. doi:10.1109/icassp.2000.859267.
9. Katsaggelos, A. K. (1997). "An iterative weighted regularized algorithm for improving the resolution of video sequences". Proceedings of the International Conference on Image Processing, pp. 474–477. doi:10.1109/icip.1997.638811.
10. Farsiu, Sina; Elad, Michael; Milanfar, Peyman (2006). "A practical approach to superresolution". Visual Communications and Image Processing 2006, vol. 6077. SPIE, p. 607703. doi:10.1117/12.644391.
11. Tian, Jing; Ma, Kai-Kuang (2005). "A new state-space approach for super-resolution image sequence reconstruction". IEEE International Conference on Image Processing 2005, pp. I-881. doi:10.1109/icip.2005.1529892.
12. Costa, Guilherme Holsbach; Bermudez, José Carlos Moreira (2007). "Statistical Analysis of the LMS Algorithm Applied to Super-Resolution Image Reconstruction". IEEE Transactions on Signal Processing 55 (5): 2084–2095. doi:10.1109/tsp.2007.892704.
13. Elad, M.; Feuer, A. (1999). "Super-resolution reconstruction of continuous image sequences". Proceedings of the 1999 International Conference on Image Processing, vol. 3, pp. 459–463. doi:10.1109/icip.1999.817156.
14. Elad, M.; Feuer, A. (1999). "Superresolution restoration of an image sequence: adaptive filtering approach". IEEE Transactions on Image Processing 8 (3): 387–395. doi:10.1109/83.748893.
15. Pickering, M.; Frater, M.; Arnold, J. (2005). "A robust approach to super-resolution sprite generation". IEEE International Conference on Image Processing 2005, pp. I-897. doi:10.1109/icip.2005.1529896.
16. Nasonov, Andrey V.; Krylov, Andrey S. (2010). "Fast Super-Resolution Using Weighted Median Filtering". 20th International Conference on Pattern Recognition, pp. 2230–2233. doi:10.1109/icpr.2010.546.
17. Simonyan, K.; Grishin, S.; Vatolin, D.; Popov, D. (2008). "Fast video super-resolution via classification". 15th IEEE International Conference on Image Processing, pp. 349–352. doi:10.1109/icip.2008.4711763.
18. Nasir, Haidawati; Stankovic, Vladimir; Marshall, Stephen (2011). "Singular value decomposition based fusion for super-resolution image reconstruction". 2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), pp. 393–398. doi:10.1109/icsipa.2011.6144138.
19. Protter, M.; Elad, M.; Takeda, H.; Milanfar, P. (2009). "Generalizing the Nonlocal-Means to Super-Resolution Reconstruction". IEEE Transactions on Image Processing 18 (1): 36–51. doi:10.1109/tip.2008.2008067.
20. Zhuo, Yue; Liu, Jiaying; Ren, Jie; Guo, Zongming (2012). "Nonlocal based Super Resolution with rotation invariance and search window relocation". ICASSP 2012, pp. 853–856. doi:10.1109/icassp.2012.6288018.
21. Cheng, Ming-Hui; Chen, Hsuan-Ying; Leou, Jin-Jang (2011). "Video super-resolution reconstruction using a mobile search strategy and adaptive patch size". Signal Processing 91 (5): 1284–1297. doi:10.1016/j.sigpro.2010.12.016.
22. Huhle, Benjamin; Schairer, Timo; Jenke, Philipp; Strasser, Wolfgang (2010). "Fusion of range and color images for denoising and resolution enhancement with a non-local filter". Computer Vision and Image Understanding 114 (12): 1336–1345. doi:10.1016/j.cviu.2009.11.004.
23. Takeda, Hiroyuki; Farsiu, Sina; Milanfar, Peyman (2007). "Kernel Regression for Image Processing and Reconstruction". IEEE Transactions on Image Processing 16 (2): 349–366. doi:10.1109/tip.2006.888330.
24. Elad, M.; Feuer, A. (1997). "Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images". IEEE Transactions on Image Processing 6 (12): 1646–1658. doi:10.1109/83.650118.
25. Farsiu, Sina; Robinson, Dirk; Elad, Michael; Milanfar, Peyman (2003). "Robust shift and add approach to superresolution". Applications of Digital Image Processing XXVI, vol. 5203. SPIE, p. 121. doi:10.1117/12.507194.
26. Chantas, G. K.; Galatsanos, N. P.; Woods, N. A. (2007). "Super-Resolution Based on Fast Registration and Maximum a Posteriori Reconstruction". IEEE Transactions on Image Processing 16 (7): 1821–1830. doi:10.1109/tip.2007.896664.
27. Rajan, D.; Chaudhuri, S. (2001). "Generation of super-resolution images from blurred observations using Markov random fields". ICASSP 2001, vol. 3, pp. 1837–1840. doi:10.1109/icassp.2001.941300.
28. Zibetti, Marcelo Victor Wust; Mayer, Joceli (2006). "Outlier Robust and Edge-Preserving Simultaneous Super-Resolution". 2006 International Conference on Image Processing, pp. 1741–1744. doi:10.1109/icip.2006.312718.
29. Joshi, M. V.; Chaudhuri, S.; Panuganti, R. (2005). "A Learning-Based Method for Image Super-Resolution From Zoomed Observations". IEEE Transactions on Systems, Man, and Cybernetics, Part B 35 (3): 527–537. doi:10.1109/tsmcb.2005.846647.
30. Liao, Renjie; Tao, Xin; Li, Ruiyu; Ma, Ziyang; Jia, Jiaya (2015). "Video Super-Resolution via Deep Draft-Ensemble Learning". ICCV 2015, pp. 531–539. doi:10.1109/iccv.2015.68.
31. Kappeler, Armin; Yoo, Seunghwan; Dai, Qiqin; Katsaggelos, Aggelos K. (2016). "Video Super-Resolution With Convolutional Neural Networks". IEEE Transactions on Computational Imaging 2 (2): 109–122. doi:10.1109/tci.2016.2532323.
32. Caballero, Jose; Ledig, Christian; Aitken, Andrew; Acosta, Alejandro; Totz, Johannes; Wang, Zehan; Shi, Wenzhe (2016). "Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation". arXiv:1611.05250.
33. Tao, Xin; Gao, Hongyun; Liao, Renjie; Wang, Jue; Jia, Jiaya (2017). "Detail-Revealing Deep Video Super-Resolution". ICCV 2017, pp. 4482–4490. arXiv:1704.02738. doi:10.1109/iccv.2017.479.
34. Liu, Ding; Wang, Zhaowen; Fan, Yuchen; Liu, Xianming; Wang, Zhangyang; Chang, Shiyu; Huang, Thomas (2017). "Robust Video Super-Resolution with Learned Temporal Dynamics". ICCV 2017, pp. 2526–2534. doi:10.1109/iccv.2017.274.
35. Sajjadi, Mehdi S. M.; Vemulapalli, Raviteja; Brown, Matthew (2018). "Frame-Recurrent Video Super-Resolution". CVPR 2018, pp. 6626–6634. arXiv:1801.04590. doi:10.1109/cvpr.2018.00693.
36. Kim, Tae Hyun; Sajjadi, Mehdi S. M.; Hirsch, Michael; Schölkopf, Bernhard (2018). "Spatio-Temporal Transformer Network for Video Restoration". Computer Vision, ECCV 2018. Springer, pp. 111–127. doi:10.1007/978-3-030-01219-9_7.
37. Wang, Longguang; Guo, Yulan; Liu, Li; Lin, Zaiping; Deng, Xinpu; An, Wei (2020). "Deep Video Super-Resolution Using HR Optical Flow Estimation". IEEE Transactions on Image Processing 29: 4323–4336. arXiv:2001.02129. doi:10.1109/tip.2020.2967596.
38. Chu, Mengyu; Xie, You; Mayer, Jonas; Leal-Taixé, Laura; Thuerey, Nils (2020). "Learning temporal coherence via self-supervision for GAN-based video generation". ACM Transactions on Graphics 39 (4). arXiv:1811.09393. doi:10.1145/3386569.3392457.
39. Xue, Tianfan; Chen, Baian; Wu, Jiajun; Wei, Donglai; Freeman, William T. (2019). "Video Enhancement with Task-Oriented Flow". International Journal of Computer Vision 127 (8): 1106–1125. arXiv:1711.09078. doi:10.1007/s11263-018-01144-2.
40. Wang, Zhongyuan; Yi, Peng; Jiang, Kui; Jiang, Junjun; Han, Zhen; Lu, Tao; Ma, Jiayi (2019). "Multi-Memory Convolutional Neural Network for Video Super-Resolution". IEEE Transactions on Image Processing 28 (5): 2530–2544. doi:10.1109/tip.2018.2887017.
41. Haris, Muhammad; Shakhnarovich, Gregory; Ukita, Norimichi (2019). "Recurrent Back-Projection Network for Video Super-Resolution". CVPR 2019, pp. 3892–3901. arXiv:1903.10128. doi:10.1109/cvpr.2019.00402.
42. Bao, Wenbo; Lai, Wei-Sheng; Zhang, Xiaoyun; Gao, Zhiyong; Yang, Ming-Hsuan (2021). "MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement". IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (3): 933–948. arXiv:1810.08768. doi:10.1109/tpami.2019.2941941.
43. Bare, Bahetiyaer; Yan, Bo; Ma, Chenxi; Li, Ke (2019). "Real-time video super-resolution via motion convolution kernel estimation". Neurocomputing 367: 236–245. doi:10.1016/j.neucom.2019.07.089.
44. Kalarot, Ratheesh; Porikli, Fatih (2019). "MultiBoot VSR: Multi-Stage Multi-Reference Bootstrapping for Video Super-Resolution". CVPR Workshops 2019, pp. 2060–2069. doi:10.1109/cvprw.2019.00258.
45. Chan, Kelvin C. K.; Wang, Xintao; Yu, Ke; Dong, Chao; Loy, Chen Change (2020). "BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond". arXiv:2012.02181.
46. Naoto Chiche, Benjamin; Frontera-Pons, Joana; Woiselle, Arnaud; Starck, Jean-Luc (2020). "Deep Unrolled Network for Video Super-Resolution". 2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 1–6. arXiv:2102.11720. doi:10.1109/ipta50016.2020.9286636.
47. Wang, Xintao; Chan, Kelvin C. K.; Yu, Ke; Dong, Chao; Loy, Chen Change (2019). "EDVR: Video Restoration with Enhanced Deformable Convolutional Networks". arXiv:1905.02716.
48. Wang, Hua; Su, Dewei; Liu, Chuangchuang; Jin, Longcun; Sun, Xianfang; Peng, Xinyi (2019). "Deformable Non-Local Network for Video Super-Resolution". IEEE Access 7: 177734–177744. arXiv:1909.10692. doi:10.1109/access.2019.2958030.
49. Tian, Yapeng; Zhang, Yulun; Fu, Yun; Xu, Chenliang (2020). "TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution". CVPR 2020, pp. 3357–3366. arXiv:1812.02898. doi:10.1109/cvpr42600.2020.00342.
50. Song, Huihui; Xu, Wenjie; Liu, Dong; Liu, Bo; Liu, Qingshan; Metaxas, Dimitris N. (2021). "Multi-Stage Feature Fusion Network for Video Super-Resolution". IEEE Transactions on Image Processing 30: 2923–2934. doi:10.1109/tip.2021.3056868.
51. Isobe, Takashi; Li, Songjiang; Jia, Xu; Yuan, Shanxin; Slabaugh, Gregory; Xu, Chunjing; Li, Ya-Li; Wang, Shengjin; Tian, Qi (2020). "Video Super-Resolution With Temporal Group Attention". CVPR 2020, pp. 8005–8014. arXiv:2007.10595. doi:10.1109/cvpr42600.2020.00803.
52. Lucas, Alice; Lopez-Tapia, Santiago; Molina, Rafael; Katsaggelos, Aggelos K. (2019). "Generative Adversarial Networks and Perceptual Losses for Video Super-Resolution". IEEE Transactions on Image Processing 28 (7): 3312–3327. arXiv:1806.05764. doi:10.1109/tip.2019.2895768.
53. Yan, Bo; Lin, Chuming; Tan, Weimin (2019). "Frame and Feature-Context Video Super-Resolution". arXiv:1909.13057.
54. Tian, Zhiqiang; Wang, Yudiao; Du, Shaoyi; Lan, Xuguang (2020). "A multiresolution mixture generative adversarial network for video super-resolution". PLOS ONE 15 (7): e0235352. doi:10.1371/journal.pone.0235352.
55. Zhu, Xiaobin; Li, Zhuangzi; Lou, Jungang; Shen, Qing (2021). "Video super-resolution based on a spatio-temporal matching network". Pattern Recognition 110: 107619. doi:10.1016/j.patcog.2020.107619.
56. Li, Wenbo; Tao, Xin; Guo, Taian; Qi, Lu; Lu, Jiangbo; Jia, Jiaya (2020). "MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution". arXiv:2007.11803.
57. Jo, Younghyun; Oh, Seoung Wug; Kang, Jaeyeon; Kim, Seon Joo (2018). "Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation". CVPR 2018, pp. 3224–3232. doi:10.1109/cvpr.2018.00340.
58. Li, Sheng; He, Fengxiang; Du, Bo; Zhang, Lefei; Xu, Yonghao; Tao, Dacheng (2019). "Fast Spatio-Temporal Residual Network for Video Super-Resolution". arXiv:1904.02870.
59. Kim, Soo Ye; Lim, Jeongyeon; Na, Taeyoung; Kim, Munchurl (2019). "Video Super-Resolution Based on 3D-CNNs with Consideration of Scene Change". 2019 IEEE International Conference on Image Processing (ICIP), pp. 2831–2835. doi:10.1109/ICIP.2019.8803297.
60. Luo, Jianping; Huang, Shaofei; Yuan, Yuan (2020). "Video Super-Resolution using Multi-scale Pyramid 3D Convolutional Networks". Proceedings of the 28th ACM International Conference on Multimedia, pp. 1882–1890. doi:10.1145/3394171.3413587.
61. Zhang, Dongyang; Shao, Jie; Liang, Zhenwen; Liu, Xueliang; Shen, Heng Tao (2020). "Multi-branch Networks for Video Super-Resolution with Dynamic Reconstruction Strategy". IEEE Transactions on Circuits and Systems for Video Technology 31 (10): 3954–3966. doi:10.1109/TCSVT.2020.3044451.
62. Aksan, Emre; Hilliges, Otmar (2019). "STCN: Stochastic Temporal Convolutional Networks". arXiv:1902.06568.
63. Huang, Yan; Wang, Wei; Wang, Liang (2018). "Video Super-Resolution via Bidirectional Recurrent Convolutional Networks". IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4): 1015–1028. doi:10.1109/TPAMI.2017.2701380.
64. Zhu, Xiaobin; Li, Zhuangzi; Zhang, Xiao-Yu; Li, Changsheng; Liu, Yaqi; Xue, Ziyu (2019). "Residual Invertible Spatio-Temporal Network for Video Super-Resolution". Proceedings of the AAAI Conference on Artificial Intelligence 33: 5981–5988. doi:10.1609/aaai.v33i01.33015981.
65. Li, Dingyi; Liu, Yu; Wang, Zengfu (2019). "Video Super-Resolution Using Non-Simultaneous Fully Recurrent Convolutional Network". IEEE Transactions on Image Processing 28 (3): 1342–1355. doi:10.1109/TIP.2018.2877334.
66. Isobe, Takashi; Zhu, Fang; Jia, Xu; Wang, Shengjin (2020). "Revisiting Temporal Modeling for Video Super-resolution". arXiv:2008.05765.
67. Han, Lei; Fan, Cien; Yang, Ye; Zou, Lian (2020). "Bidirectional Temporal-Recurrent Propagation Networks for Video Super-Resolution". Electronics 9 (12): 2085. doi:10.3390/electronics9122085.
68. Fuoli, Dario; Gu, Shuhang; Timofte, Radu (2019). "Efficient Video Super-Resolution through Recurrent Latent Space Propagation". arXiv:1909.08080.
69. Isobe, Takashi; Jia, Xu; Gu, Shuhang; Li, Songjiang; Wang, Shengjin; Tian, Qi (2020). "Video Super-Resolution with Recurrent Structure-Detail Network". arXiv:2008.00455.
70. Zhou, Chao; Chen, Can; Ding, Fei; Zhang, Dengyin (2021). "Video super-resolution with non-local alignment network". IET Image Processing 15 (8): 1655–1667. doi:10.1049/ipr2.12134.
71. Yi, Peng; Wang, Zhongyuan; Jiang, Kui; Jiang, Junjun; Lu, Tao; Ma, Jiayi (2020). "A Progressive Fusion Generative Adversarial Network for Realistic and Consistent Video Super-Resolution". IEEE Transactions on Pattern Analysis and Machine Intelligence PP (5): 2264–2280. doi:10.1109/TPAMI.2020.3042298.
72. "MSU VSR Benchmark Methodology". Video Processing. 2021-04-26. Retrieved 2021-05-12.
73. Zvezdakova, A. V.; Kulikov, D. L.; Zvezdakov, S. V.; Vatolin, D. S. (2020). "BSQ-rate: a new approach for video-codec performance comparison and drawbacks of current solutions". Programming and Computer Software 46 (3): 183–194. doi:10.1134/S0361768820030111.
74. "See Better and Further with Super Res Zoom on the Pixel 3". Google AI Blog. 2018-10-15.