Content-Based Video Retrieval Method

A Methodology for Inspecting Keyframes Based on Self-Adaptive Threshold and Image Descriptors

  • Suruthi. K, Tamil Selvan. T, Velu. S, Maheswaran. R, Kumaresan. A


In this paper, we propose a CBVR (content-based video retrieval) method for retrieving a desired object from an abstracted video dataset. Recording and storing extensive surveillance video in a database and then retrieving its key details is a complicated activity in terms of both time and space. Although methods are available that retrieve the key content of a video based on ROIs and threshold values for identifying background-information key frames, deciding the threshold values manually is a complex task. We therefore propose a method that uses a self-adaptive threshold to decide the background information, together with several descriptors that improve the accuracy of describing the contents of the key frames. CBVR can then be used to retrieve the information of any desired object from the abstracted dataset.

Keywords: Self-adaptive threshold, Keyframes, Descriptors, CBVR


1. Introduction

Providing security plays a major role in every organization nowadays. Security can be provided in many ways, depending on the criticality of the information being protected. These methodologies include posting guards around the perimeter, installing an electric fence around the infrastructure, or employing any other effective technology available. Regardless of these options, powerful 24x7 security can be provided by installing cameras at the crucial areas of an organization, out of reach of humans. The optimal number of cameras to be installed in an environment can be determined as in [1]. Since these surveillance cameras record video around the clock, the recorded videos must be stored and examined: storing them requires a massive database, and examining them requires humans to play through the whole video in order to investigate the incidents that occurred. The biggest drawback is that we cannot skip through the video being played, since we would miss important actions. We therefore need a method for extracting the essential events from the long surveillance videos and storing these events alone in a separate database, which would reduce the storage used as well as the human effort spent looking through the complete videos. The first step in analysing a video is to convert it into individual frames or images, since a video is formed by a sequence of moving visual images; fetching relevant images from such a collection is referred to as image retrieval.

Image retrieval is the procedure of retrieving images from a massive database based on metadata attached to the images, known as annotations. Annotations have some demerits: annotating images manually is time-consuming work, and when images are annotated ambiguously the user cannot get the required results no matter how many times the image database is searched. Several methods for automatic image annotation have been under research thanks to advances in the semantic web and social web applications. Alongside these advances there exists an effective strategy termed CBIR (content-based image retrieval), whose basis is feature extraction. Text-based features represent keywords and annotations, whereas visual features correspond to colour, texture, faces, and shapes [2]. Since features play a significant role here, when a user submits a query image, its pixel values are compared with those of all the images in the database, and the results given to the user contain all the images that include part of the queried image; this is a powerful way of avoiding the ambiguity of annotations. Since we are dealing with videos here, we need an approach more advanced than CBIR.
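As an illustration of this feature-based comparison, the following sketch ranks database images by the L1 distance between normalised grey-level histograms. The function names and the toy "database" are our own illustrative assumptions, not part of any particular CBIR system.

```python
import numpy as np

def color_histogram(image, bins=16):
    """Global grey-level histogram, normalised so that images of
    different sizes are comparable (a simple CBIR feature)."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def retrieve(query, database, top_k=3):
    """Rank database images by L1 distance between histogram features."""
    q = color_histogram(query)
    dists = [(name, np.abs(q - color_histogram(img)).sum())
             for name, img in database.items()]
    return [name for name, _ in sorted(dists, key=lambda t: t[1])[:top_k]]

rng = np.random.default_rng(0)
db = {"dark": rng.integers(0, 64, (32, 32)),
      "mid": rng.integers(96, 160, (32, 32)),
      "bright": rng.integers(192, 256, (32, 32))}
query = rng.integers(0, 64, (32, 32))
print(retrieve(query, db, top_k=1))  # the dark image ranks first
```

A real CBIR engine would use richer features (colour, texture, shape) and an index instead of a linear scan, but the compare-features-and-rank structure is the same.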

2. Related Work

Speech recognition is an important conc

3. Fast Clustering Method Based on ROI

Since users can easily access online videos these days, we need an efficient way to store and maintain substantial numbers of video files while facilitating quick access for multiple users. To support research in this field, Guang-Hua Song et al. have suggested fast clustering based on the region of interest (ROI). The authors employed the average histogram algorithm for the purpose of extracting key frames from each shot. A shot can be defined as the depiction of a particular scene or action: a single shot is the footage covered by a camera between the start and stop of recording, normally from the same angle. The extracted key frames are used for the generation of edge maps, which form the next step in the video abstraction scenario. Based on these edge maps, the authors identify the key points. Calculating threshold values from the individual key frames is the next phase, performed for the purpose of growing and determining the region surrounding the key points [9]. The authors then observe the main content in each key frame based on the defined threshold value and the identified key points. As the final step of their proposed method, they take the ROIs of the key frames and perform the fast clustering method on them. The methodologies applied before the fast clustering, along with the fast clustering technique itself, are explained in the following sections.

A. Key Frame Extraction

A video sequence is represented as a hierarchical structure, with the scene, shot, and frame forming the different levels of the hierarchy [10]. Different studies on video sequences require researchers to work at different levels of this hierarchy depending on the information needed. For key frame extraction, the shot level is chosen among the available levels for certain reasons: the sequence of video frames captured continuously by a camera constitutes a shot, which also covers moving objects and the panning and zooming of the recording camera. A further merit of the shot level is that two adjacent shots do not share the same content, which clearly eliminates redundancy. The authors employ the algorithm proposed in [11] for extracting key frames; the extraction process involves the average histogram method. A shot S = {f_1, f_2, ..., f_n} of length n is assumed, where f_k denotes the kth frame in the shot. Let H_k be the grey-level histogram with L bins computed from frame f_k; the average histogram H is then computed according to the following formula:

H(i) = (1/n) * Σ_{k=1}^{n} H_k(i),   i = 1, ..., L,

where H_k(i) denotes the value of the ith bin of the histogram of frame f_k. After the key frames have been extracted, ROIs are generated through a series of key frame analyses; this process is followed by saliency map generation and edge map generation.
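The average-histogram rule can be sketched as follows. This is a minimal illustration assuming grey-scale frames given as NumPy arrays, with the frame whose histogram is closest (in L2 distance) to the average histogram H chosen as the key frame; the exact selection criterion of [11] may differ.

```python
import numpy as np

def frame_histogram(frame, bins=64):
    """Grey-level histogram H_k of one frame (L = bins)."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h.astype(float)

def key_frame_index(shot, bins=64):
    """Pick the frame whose histogram is closest (L2) to the
    average histogram H = (1/n) * sum_k H_k of the shot."""
    hists = np.array([frame_histogram(f, bins) for f in shot])
    avg = hists.mean(axis=0)                      # H
    dists = np.linalg.norm(hists - avg, axis=1)
    return int(np.argmin(dists))

rng = np.random.default_rng(1)
base = rng.integers(0, 256, (24, 24))
# three near-identical frames plus one outlier; the key frame
# should be a typical frame, not the all-white outlier
shot = [base, base.copy(), base.copy(), np.full((24, 24), 255)]
print(key_frame_index(shot))  # 0
```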

B. Edge Map Detection

It is a general observation that people focus on objects that have a complete shape in the video, so there will be corners within these components. We need to determine the key points that lie inside the objects, and identifying edges makes this tracking process easier. The authors use the Canny edge detection method as described in [12]. This step is followed by locating the key points and generating the ROIs.
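The authors rely on the full Canny detector [12]; as a simplified illustration of gradient-based edge maps, the sketch below uses only the Sobel gradient magnitude with a fixed threshold. Canny additionally performs Gaussian smoothing, non-maximum suppression, and hysteresis thresholding, all omitted here.

```python
import numpy as np

def sobel_edge_map(image, threshold=100.0):
    """Gradient-magnitude edge map (a simplified stand-in for the
    Canny detector; border pixels are left unmarked)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    img = image.astype(float)
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy) > threshold

# vertical step edge: left half dark, right half bright
img = np.zeros((10, 10))
img[:, 5:] = 255
edges = sobel_edge_map(img)
print(edges[2, 4], edges[:, :3].any())  # edge found at the step, not in the flat region
```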

C. Fast Clustering

In a video sequence, although each shot portrays different content, some of the shots may look similar to one another in camera angle, in the facial expressions of the people involved, or in other respects. Sometimes a shot is manually segmented into many shots and used at different places in a video sequence. The authors' strategy is to make the video sequence small, so they cluster the key frames in order to avoid redundant frames.

Normally, clustering performed before the complete key frame extraction procedure would be of no use, since new frames could not be taken into account. To overcome this traditional procedure, the authors use fast clustering, where the clustering process starts once key frame extraction and ROI identification are done. Even though this process is sufficient to an extent, the authors did not use more effective descriptors to extract additional features from the frames for better observation. In addition, manually setting the threshold used to obtain the background information is not very effective.
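A minimal sketch of such an online ("fast") clustering pass over already-extracted key-frame features might look as follows. The greedy nearest-centroid rule and the distance threshold are our own simplifying assumptions, not the authors' exact algorithm.

```python
import numpy as np

def incremental_cluster(features, threshold=0.5):
    """Greedy one-pass clustering: each key-frame feature joins the
    nearest existing cluster if within `threshold`, otherwise it
    starts a new cluster.  Returns a cluster label per feature."""
    centroids, counts, labels = [], [], []
    for f in features:
        f = f.astype(float)
        dists = [np.linalg.norm(f - c) for c in centroids]
        if dists and min(dists) < threshold:
            best = int(np.argmin(dists))
            counts[best] += 1
            # update the running centroid of the matched cluster
            centroids[best] += (f - centroids[best]) / counts[best]
            labels.append(best)
        else:
            centroids.append(f.copy())
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels

feats = [np.array([0.0, 0.0]), np.array([0.1, 0.0]),
         np.array([5.0, 5.0]), np.array([0.05, 0.05])]
print(incremental_cluster(feats, threshold=1.0))  # [0, 0, 1, 0]
```

Because each feature is touched once, new key frames can be absorbed as extraction proceeds, which is the property the fast-clustering step relies on.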

4. Application of Self-Adaptive Threshold and Descriptors

Although assigning the threshold manually can work well, setting it by hand is a difficult task. We therefore need an alternative method for setting the threshold, namely the adaptive threshold strategy. We propose the use of an adaptive threshold in our video abstraction method for the purpose of gaining more knowledge about the objects in the background. Furthermore, we make use of several descriptors, such as FCTH (Fuzzy Color and Texture Histogram) and SCD (Scalable Color Descriptor). A descriptor is used for extracting different types of features from an image, depending on the functionality of the descriptor; features refers to the different kinds of information that can be extracted from an image, such as colour, intensity, and pixel values. The features of FCTH and SCD are discussed as follows.


A. FCTH (Fuzzy Color and Texture Histogram)

In this descriptor, fuzzy logic is used to gather information about colours that lie between pure black and pure white. Fuzzy logic is used here because its general concept is to handle all the possible cases (partially true / partially false) that lie between the True (1) and False (0) values.

B. SCD (Scalable Color Descriptor)

SCD is used here for the purpose of extracting colour information in a scalable form: the colour histogram is encoded so that it can be represented with more or fewer coefficients depending on the accuracy required.

C. Algorithm: Distance Vector

We employ the distance vector algorithm in this video abstraction process for the purpose of measuring the distance travelled by an object across two subsequent frames, in order to determine the motion of the object more reliably. It involves the following steps:
  1. Detecting and determining the boundaries of the moving objects.
  2. Extracting the ROI (region of interest) of the object within the frame.
  3. Searching for the same object within the next subsequent frame.
  4. Detecting the boundaries and location of the object.
  5. Comparing the locations of the object and finding the distance it has moved from the previous frame to the current frame.
  6. Repeating the above steps for all the video frames, which permits us to find the distance covered by the moving object in each frame.
  7. Updating the distance vector matrix.
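The steps above can be sketched as follows, assuming the moving object has already been detected as a binary mask in each frame; in this toy version, steps 1-4 are reduced to a centroid computation, and the distance vector is a simple list.

```python
import numpy as np

def centroid(mask):
    """Centroid (row, col) of a binary object mask (steps 1-4)."""
    ys, xs = np.nonzero(mask)
    return np.array([ys.mean(), xs.mean()])

def distance_vector(frames):
    """Distance moved by the object between consecutive frames
    (steps 5-7); `frames` are binary masks of the detected object."""
    cents = [centroid(f) for f in frames]
    return [float(np.linalg.norm(b - a)) for a, b in zip(cents, cents[1:])]

def make_frame(row, col, size=16):
    f = np.zeros((size, size), bool)
    f[row:row + 2, col:col + 2] = True    # a 2x2 "object"
    return f

# object moves 3 columns to the right in each frame
frames = [make_frame(5, c) for c in (2, 5, 8)]
print(distance_vector(frames))  # [3.0, 3.0]
```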

The overall flow of the proposed methodology is shown in Figure 1.

Figure 1. Block Diagram of the Proposed Methodology

This approach is applied to minimize the memory complexity of storing and retrieving gigantic 24x7 surveillance videos, where capturing and storing the complete video would increase the memory demand, and searching through the entire video to verify a crime scene would be an even more complex task. To overcome this complexity, our method extracts the key frames from the complete video and stores them in a designated repository, where only the specific images are kept, minimizing the effort the user spends looking through a full-length video. In addition, storing images demands much less space than storing the videos. Since we use descriptors, more detailed information can be extracted from the images. The self-adaptive threshold allows the user to obtain additional details about the objects in the background, which is an added benefit of this methodology. Any frame can be given as a query to the system, and the user receives the relevant video containing the respective key frame; if the frame is unavailable in the dataset, the user is shown a prompt. This process is termed CBVR. CBVR is similar to CBIR but differs in that the user is given a frame (image) as the result in the case of CBIR, whereas the result is the entire video in the case of CBVR. In both cases, however, data is compared and retrieved based on the details available in the frames.
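A minimal sketch of this query process might look as follows, assuming each video is represented by its stored key frames and using normalised histograms as the comparison feature. The video names and the match threshold are illustrative assumptions, not part of the actual system.

```python
import numpy as np

def frame_feature(frame, bins=32):
    """Normalised grey-level histogram used as the frame feature."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def cbvr_query(query_frame, keyframe_db, max_dist=0.2):
    """Return the video whose stored key frame best matches the
    query, or None (-> 'not found' prompt) if no key frame is close
    enough.  keyframe_db maps video name -> list of key frames."""
    q = frame_feature(query_frame)
    best_video, best_d = None, max_dist
    for video, keyframes in keyframe_db.items():
        for kf in keyframes:
            d = np.abs(q - frame_feature(kf)).sum() / 2  # in [0, 1]
            if d < best_d:
                best_video, best_d = video, d
    return best_video

rng = np.random.default_rng(3)
dark = rng.integers(0, 80, (16, 16))
bright = rng.integers(180, 256, (16, 16))
db = {"video_a.avi": [dark], "video_b.avi": [bright]}
print(cbvr_query(dark, db))                    # matching video found
print(cbvr_query(np.full((16, 16), 128), db))  # None: prompt the user
```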

5. Experimental Results

We have conducted our experiment with videos available in the MATLAB dataset. The first step is the extraction of key frames based on the self-adaptive threshold value, as shown in Figure 2.

Figure 2. Window for Key Frame Extraction

Key frames are extracted and stored in a designated folder, as shown in Figure 3.

Figure 3. Key Frames Stored in the Designated Folder

After key frame extraction, the user can input a key frame of their choice; the contents of all the videos available in the dataset are compared, and the respective video containing the wanted key frame is found using CBVR and retrieved, as shown in Figure 4a. The user can click the play button provided at the bottom right to play the entire video containing the wanted key frame. If the requested frame is not found, the user is prompted with an error message, as shown in Figure 4b.

Figure 4a. Video retrieved based on the queried key frame using CBVR

Figure 4b. User is prompted with an error message since the requested frame is not found

Our experiments have shown promising results with more than 80% accuracy. As described above, this methodology can reduce the memory requirements as well as the time the user spends looking through entire videos.

6. Conclusion

In this paper, we have proposed a methodology for video abstraction based on several descriptors and a self-adaptive threshold. This methodology helps the user minimize the memory and time demands of looking through the videos. Our methodology also employs CBVR for retrieving a video based on its contents with respect to the user's wanted key frame. The only problem our methodology faces is the time taken for comparison when the key frame being searched for occurs in the last video in the dataset. Our future work will concentrate on limiting the time spent on comparison in a large video dataset.


References

[1] Tatsuya Hirahara

