Video search engine

A video search engine is a web-based search engine which crawls the web for video content. Some video search engines parse externally hosted content while others allow content to be uploaded and hosted on their own servers. Some engines also allow users to search by video format type and by length of the clip. The video search results are usually accompanied by a thumbnail view of the video.

Video search engines are computer programs designed to find videos stored on digital devices, whether on Internet servers or in the local storage of the same computer. Searches can rely on audiovisual indexing, which extracts information from audiovisual material and records it as metadata that search engines can then query.

Utility

These search engines respond to the growing volume of audiovisual content and the need to manage it properly. The digitization of audiovisual archives and the spread of the Internet have led to large quantities of video files being stored in big databases, where retrieval can be very difficult because of the sheer volume of data and the existence of a semantic gap.

Search criterion

The search criteria used by each search engine depend on its nature and on the purpose of the searches.

Metadata

Metadata is information about data. For a video it can include the author, creation date, duration, and any other information that can be extracted and stored in the file itself. On the Internet, metadata is often encoded in XML, a language that works well on the web and is readable by people. Searching this information is therefore the easiest way to find data of interest.
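As a minimal sketch of how XML-encoded metadata can be read by a program, the following Python example parses a small record with the standard library. The element names (`video`, `title`, `author`, `created`, `duration`) are assumptions chosen for illustration and do not correspond to any particular metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; the element names are illustrative only.
SAMPLE_XML = """
<video>
  <title>Harbour sunrise timelapse</title>
  <author>A. Example</author>
  <created>2011-07-08</created>
  <duration unit="seconds">95</duration>
</video>
"""


def read_video_metadata(xml_text: str) -> dict:
    """Return the child elements of the root as a flat dictionary."""
    root = ET.fromstring(xml_text.strip())
    record = {child.tag: (child.text or "").strip() for child in root}
    # Convert the duration to a number so results can later be sorted by length.
    record["duration"] = int(record["duration"])
    return record


metadata = read_video_metadata(SAMPLE_XML)
print(metadata["title"], metadata["duration"])
```

A search engine could index each of these fields separately, which is what makes queries such as "videos by this author" possible.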

Videos have two types of metadata: internal metadata, which is embedded in the video file itself, and external metadata, which comes from the page where the video is located. Both should be optimized so that they are indexed well.

Internal metadata

All video formats incorporate their own metadata, which can include the title, description, encoding quality, or a transcription of the content. Programs such as FLV MetaData Injector, Sorenson Squeeze, or Castfire can be used to review this data; each has its own utilities and specifications.

Converting from one format to another can lose much of this data, so check that the metadata in the new format is correct. It is also advisable to offer the video in multiple formats, so that all search robots are able to find and index it.

External metadata

In most cases the same mechanisms apply as when positioning an image or text content.

Title and description

These are the most important factors when positioning a video, because they contain most of the necessary information. Titles have to be clearly descriptive, and every word or phrase that is not useful should be removed.

Filename

The filename should be descriptive, including keywords that describe the video without the need to see its title or description. Ideally, the words are separated by dashes ("-").
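The dash-separated naming convention can be sketched in a few lines of Python. The function name `descriptive_filename` and the default `mp4` extension are assumptions made for this example; the sketch simply lowercases the words, strips accents and punctuation, and joins what remains with dashes.

```python
import re
import unicodedata


def descriptive_filename(title: str, extension: str = "mp4") -> str:
    """Build a dash-separated filename from a human-readable title."""
    # Decompose accented characters and drop the accents (e.g. "ó" becomes "o").
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep only runs of letters and digits; punctuation acts as a separator.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words) + "." + extension


print(descriptive_filename("How to Tie a Bowline Knot!"))
```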

Tags

The page where the video is located should contain a list of keywords marked up with the "rel-tag" microformat. Search engines use these words as a basis for organizing information.
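A minimal sketch of how an indexer might collect such tags, using only the Python standard library. It assumes the rel-tag convention that the tag is the last path segment of the link's `href`; the sample page and the `example.com` URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href") or ""
        if "tag" in rel_values and href:
            # The tag is taken from the last path segment of the URL.
            last_segment = urlparse(href).path.rstrip("/").split("/")[-1]
            if last_segment:
                # Assumption: "+" in the segment stands for a space.
                self.tags.append(unquote(last_segment).replace("+", " "))


# Hypothetical page fragment used only for illustration.
PAGE = """
<p>Tags:
  <a href="https://example.com/tags/sailing" rel="tag">sailing</a>
  <a href="https://example.com/tags/knot+tying" rel="tag">knot tying</a>
  <a href="https://example.com/about">about</a>
</p>
"""

parser = RelTagParser()
parser.feed(PAGE)
page_tags = parser.tags
print(page_tags)
```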

Transcription and subtitles

Although they are not completely standardized, there are two kinds of formats that store text together with timing information: one for subtitles and another for transcripts, which can also be used for subtitles. The formats are SRT or SUB for subtitles and TTXT for transcripts.
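To illustrate what "text together with timing information" looks like in practice, here is a minimal sketch of an SRT reader in Python. It handles only the common layout (a counter line, a `start --> end` line, then one or more text lines) and is not a complete implementation of every SRT variant; the `Cue` class and the sample subtitles are made up for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _seconds(stamp: str) -> float:
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    hours, minutes, secs, millis = map(int, _TIME.fullmatch(stamp.strip()).groups())
    return hours * 3600 + minutes * 60 + secs + millis / 1000


def parse_srt(text: str) -> list:
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        cues.append(Cue(_seconds(start), _seconds(end), " ".join(lines[2:]).strip()))
    return cues


SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,500
Welcome to the harbour.

2
00:01:02,250 --> 00:01:05,000
Now we tie
the bowline knot.
"""

cues = parse_srt(SAMPLE_SRT)
for cue in cues:
    print(cue.start, cue.text)
```

Because every line of text carries a start time, a match in the subtitle text can be mapped straight back to a position in the video.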

Speech recognition

Speech recognition transcribes the speech in a video's audio track into a text file. With the help of a phrase extractor, this text makes it easy to check whether the video content is of interest. Besides using speech recognition to search for videos, some search engines also use it to find the specific point in a multimedia file at which a word or phrase occurs, so that playback can jump directly to that point. Gaudi (Google Audio Indexing), a project developed by Google Labs, uses voice recognition technology to locate the exact moment at which one or more words are spoken within an audio track, allowing the user to go directly to that moment. If the search query matches some videos from YouTube, the positions are indicated by yellow markers, and the user must hover the mouse over a marker to read the transcribed text.
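The idea of jumping to the moment a phrase is spoken can be sketched as a search over a time-stamped transcript. This is a simplified illustration and not how Gaudi or any other named product works internally; it assumes a speech recognizer has already produced `(start_seconds, word)` pairs, and the sample transcript is invented.

```python
def find_phrase(words, phrase):
    """Return the start times at which `phrase` occurs.

    `words` is a list of (start_seconds, word) pairs, as a speech
    recognizer with word-level timestamps might produce (assumption).
    """
    target = phrase.lower().split()
    if not target:
        return []
    # Normalize the recognized words: lowercase, no trailing punctuation.
    tokens = [word.lower().strip(".,!?") for _, word in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            hits.append(words[i][0])
    return hits


# Hypothetical recognizer output.
transcript = [
    (0.0, "Welcome"), (0.6, "to"), (0.9, "the"), (1.2, "harbour."),
    (62.3, "Tie"), (62.7, "the"), (63.0, "bowline"), (63.5, "knot"),
]

print(find_phrase(transcript, "the bowline"))
```

A player could then seek to each returned offset, which is the behaviour the markers described above expose to the user.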

Speaker recognition

In addition to transcription, analysis can detect different speakers and sometimes attribute the speech to an identified speaker by name.

Text recognition

Text recognition can be very useful for reading on-screen text in videos, such as "chyrons". As with speech recognition, some search engines use character recognition to play a video from a particular point.

TalkMiner, an example of searching for specific fragments of videos by text recognition, analyzes each video once per second looking for the identifying signs of a slide, such as its shape and static nature. It captures the image of the slide and uses optical character recognition (OCR) to detect the words on it. These words are then indexed in TalkMiner's search engine, which currently offers users more than 20,000 videos from institutions such as Stanford University, the University of California at Berkeley, and TED.
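The "static nature" cue can be illustrated with a small sketch. This is not TalkMiner's actual algorithm: it assumes frames have already been sampled once per second and reduced to equally sized lists of grayscale values, and it only looks for runs of nearly identical frames. The thresholds `max_diff` and `min_seconds` are arbitrary assumptions, and the OCR step is left out because it would need an external library.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute difference between two equally sized grayscale frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def find_static_segments(frames, max_diff=2.0, min_seconds=3):
    """Return (first_second, last_second) for each run of near-identical frames.

    `frames` holds one frame per second of video. A run counts as a
    candidate slide when it lasts at least `min_seconds` samples.
    """
    segments = []
    run_start = 0
    for i in range(1, len(frames) + 1):
        # A run ends at the end of the video or when the picture changes.
        changed = i == len(frames) or mean_abs_diff(frames[i - 1], frames[i]) > max_diff
        if changed:
            if i - run_start >= min_seconds:
                segments.append((run_start, i - 1))
            run_start = i
    return segments


# Toy input: two frames of a moving speaker, four seconds of a slide, one more shot.
slide = [200] * 16
frames = [[10] * 16, [90] * 16, slide, slide, slide, slide, [40] * 16]
static_segments = find_static_segments(frames)
print(static_segments)
```

In a fuller pipeline, one frame from each returned segment would be passed to an OCR engine and the recognized words indexed together with the segment's start time.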

Frame analysis

Visual descriptors make it possible to analyze the frames of a video and extract information that can be stored as metadata. The descriptions are generated automatically and can describe different aspects of the frames, such as color, texture, shape, motion, and the situation.
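A color histogram is one of the simplest examples of such a descriptor. The sketch below is illustrative only and assumes a frame is available as a sequence of `(r, g, b)` tuples with values from 0 to 255; the function name and the choice of four bins per channel are assumptions.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized RGB histogram for one frame.

    `bins_per_channel` is assumed to divide 256 evenly (2, 4, 8, ...),
    so that every channel value falls into a valid bin.
    """
    step = 256 // bins_per_channel
    histogram = [0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        # Combine the three per-channel bin numbers into one index.
        index = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        histogram[index] += 1
        count += 1
    # Normalize so frames of different sizes can be compared.
    return [value / count for value in histogram] if count else histogram


# Toy frame: three pure red pixels and one pure blue pixel.
frame = [(255, 0, 0)] * 3 + [(0, 0, 255)]
descriptor = color_histogram(frame)
print(len(descriptor), max(descriptor))
```

The resulting list of numbers can be stored alongside the other metadata and compared between frames or between videos.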

Chaptering

Video analysis can lead to automatic chaptering, using techniques such as detecting changes of camera angle and identifying audio jingles. By knowing the typical structure of a video document, it is possible to identify the opening and closing credits, the content parts, and the beginning and end of advertising breaks.
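A change of camera angle usually shows up as an abrupt change between consecutive frames, which suggests the following rough sketch. It assumes each frame has already been summarized as a normalized numeric descriptor (for example a color histogram) and marks a cut wherever consecutive descriptors differ by more than a threshold. The threshold value is an arbitrary assumption, and real systems combine several cues such as the audio jingles mentioned above.

```python
def detect_cuts(descriptors, threshold=0.5):
    """Return the frame indices at which a new shot appears to start."""
    cuts = []
    for i in range(1, len(descriptors)):
        # Sum of absolute differences between consecutive descriptors.
        distance = sum(abs(a - b) for a, b in zip(descriptors[i - 1], descriptors[i]))
        if distance > threshold:
            cuts.append(i)
    return cuts


def chapters_from_cuts(cuts, total_frames):
    """Turn cut positions into (start, end) frame ranges, end exclusive."""
    starts = [0] + cuts
    ends = cuts + [total_frames]
    return list(zip(starts, ends))


# Toy descriptors: two similar frames, an abrupt change, two similar frames.
descriptors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
cuts = detect_cuts(descriptors)
chapters = chapters_from_cuts(cuts, len(descriptors))
print(cuts, chapters)
```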

Ranking criterion

The usefulness of a search engine depends on the relevance of the result set it returns. While there may be millions of videos that include a particular word or phrase, some videos may be more relevant, more popular, or more authoritative than others. This ranking has a lot to do with search engine optimization.

Most search engines use their own methods to rank the results and place the best video among the first results. However, most programs also allow the results to be sorted by several criteria.

Order by relevance

This criterion is more ambiguous and less objective than the others, but it is sometimes the closest to what the user wants. It depends entirely on the search engine and on the algorithm its owner has chosen. For that reason it has always been debated, and now that search results are so ingrained in society it is debated even more. This type of ordering often depends on the number of times the search term appears, the number of views of the video, the number of pages that link to the content, and the ratings given by users who have seen it.[1]
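Since every engine chooses its own algorithm, the following is only a hypothetical sketch of how the four signals listed above might be combined into one score. The `Video` class, the weights, and the logarithmic damping of views and links are all assumptions made for illustration and do not describe any real search engine.

```python
import math
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    description: str
    views: int
    inbound_links: int
    average_rating: float  # assumed scale: 0 to 5


def relevance(video, query, weights=(1.0, 0.3, 0.3, 0.4)):
    """Combine term frequency, views, inbound links and rating into a score."""
    text = f"{video.title} {video.description}".lower().split()
    term_hits = sum(text.count(term) for term in query.lower().split())
    if term_hits == 0:
        return 0.0  # videos that never mention the query are not relevant
    w_terms, w_views, w_links, w_rating = weights
    return (
        w_terms * term_hits
        # Logarithms keep very popular videos from swamping everything else.
        + w_views * math.log10(1 + video.views)
        + w_links * math.log10(1 + video.inbound_links)
        + w_rating * video.average_rating
    )


knots = Video("Bowline knot tutorial", "How to tie a bowline", 1_000, 10, 4.5)
pasta = Video("Cooking pasta", "Boil water", 1_000_000, 500, 4.9)
popular_knots = Video("Bowline knot tutorial", "How to tie a bowline", 10_000, 10, 4.5)

ranked = sorted([pasta, knots, popular_knots], key=lambda v: relevance(v, "bowline"), reverse=True)
print([v.views for v in ranked])
```

Changing the weights changes the ordering, which is one reason this criterion is less objective than sorting by a single field.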

Order by date of upload

This criterion is based entirely on chronology: results are sorted according to how long they have been in the repository.

Order by number of views

The number of views gives an idea of the popularity of each video.

Order by length

This criterion sorts by the duration of the video, which can give an indication of what kind of video it is.

Order by user rating

It is common practice in repositories to let users rate the videos, so that content of high quality and relevance ranks high in the list of results and gains visibility. This practice is closely related to virtual communities.
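Unlike relevance, the last four criteria are simple sorts over a single field. A minimal sketch, assuming each result is a dictionary with the hypothetical fields `uploaded`, `views`, `length_seconds`, and `rating`; the sort directions (newest, most viewed, and best rated first, shortest first) are also assumptions.

```python
from datetime import date
from operator import itemgetter

# Hypothetical result set.
RESULTS = [
    {"id": "a", "uploaded": date(2011, 7, 8), "views": 1_200, "length_seconds": 95, "rating": 4.1},
    {"id": "b", "uploaded": date(2009, 1, 15), "views": 50_000, "length_seconds": 600, "rating": 3.7},
    {"id": "c", "uploaded": date(2012, 3, 2), "views": 300, "length_seconds": 30, "rating": 4.8},
]

# criterion name -> (field to sort on, whether to sort in descending order)
SORT_KEYS = {
    "date": ("uploaded", True),
    "views": ("views", True),
    "length": ("length_seconds", False),
    "rating": ("rating", True),
}


def sort_results(results, criterion):
    """Return a new list of results ordered by the chosen criterion."""
    field, descending = SORT_KEYS[criterion]
    return sorted(results, key=itemgetter(field), reverse=descending)


for criterion in SORT_KEYS:
    print(criterion, [r["id"] for r in sort_results(RESULTS, criterion)])
```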

Interfaces

Two basic types of interface can be distinguished: web pages hosted on servers, which are accessed over the Internet and search across the network, and computer programs that search within a private network.

Internet

Among Internet interfaces there are repositories that host video files and incorporate a search engine that searches only their own databases, and video search engines without a repository that search external sources.

Repositories with video searcher

These sites host video files on their own servers and usually have an integrated search engine that searches through the videos uploaded by their users. Among the first web repositories, or at least the most famous, are the portals Vimeo, Dailymotion, and YouTube.

Their searches are often based on reading the metadata tags, titles, and descriptions that users assign to their videos. The results can usually be ordered by file upload date, by number of views, or by what the sites call relevance. Sorting criteria are nowadays the main weapon of these websites, because the positioning of videos is important in terms of promotion.[citation needed]

Video searchers without repository

These are websites specialized in searching for videos across the network or in certain pre-selected repositories. They rely on web spiders that inspect the network in an automated way and create copies of the visited websites, which are then indexed by the search engine so that it can provide faster searches.
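The reason indexing the stored copies makes searches faster can be shown with a small inverted index, a structure that maps each word to the pages containing it. This is a minimal sketch: the crawl itself is not performed, the `pages` dictionary stands in for copies a spider would have collected, and the `videos.example` URLs are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Stand-in for page copies collected by a web spider (hypothetical URLs).
pages = {
    "https://videos.example/1": "Bowline knot tutorial for sailing beginners",
    "https://videos.example/2": "Sailing around the harbour at sunrise",
    "https://videos.example/3": "Cooking pasta: boil the water first",
}

index = build_index(pages)
print(search(index, "sailing"))
```

A query only has to look up a few keys in the index instead of rereading every stored page, which is where the speed comes from.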

Private network

[Figure: Functioning scheme]

Sometimes a search engine only searches audiovisual files stored within a computer or, as happens at television broadcasters, on a private server that users access through a local area network. These search tools are usually software or rich Internet applications with very specific search options, designed for maximum speed and efficiency when presenting the results. They are typically used for large databases and are therefore highly focused on the needs of television companies. An example of this type of software is the Digition Suite, which is a benchmark for this kind of interface and serves as the file storage and retrieval system of the Corporació Catalana de Mitjans Audiovisuals.[2]

Perhaps the strongest point of this suite is that it integrates the entire process of creating, indexing, storing, searching, editing, and retrieving content. Once audiovisual content has been digitized, it is indexed with techniques of different levels, depending on the importance of the content, and then stored. To retrieve a particular file, the user fills in search fields such as the program title, broadcast date, people who appear, or the name of the producer, and the robot starts the search. Once the results appear, arranged according to the user's preferences, the user can play low-quality versions of the videos in order to work as quickly as possible. When the desired content is found, it is downloaded in high definition, edited, and played back.[3]

Design and algorithms

Video search has evolved slowly through several basic search formats that exist today, all of which use keywords. The keywords for each search can be found in the title of the media, in any text attached to the media, and in the content of linked web pages; they can also be defined by the authors and users of video-hosting resources.

Some video search is performed using human-powered search, while other systems work automatically to detect what is in the video and match it to the searcher's needs. Many efforts to improve video search, including both human-powered search and algorithms that recognize what is inside the video, have meant a complete redevelopment of search efforts.

Speech-to-text is generally acknowledged to be feasible, although its accuracy varies. At the Web Video Summit (San Jose, CA, June 27, 2007), Thomas Wilde, then the new CEO of Everyzing, acknowledged that Everyzing works 70% of the time when there is music, ambient noise, or more than one person speaking. For newscast-style speech (one person speaking clearly, with no ambient noise), that figure can rise to 93%.

Around 40 phonemes exist in every language, with about 400 across all spoken languages. Rather than applying a text search algorithm after speech-to-text processing is completed, some engines use a phonetic search algorithm to find results within the spoken word. Others work by processing the entire audio track, such as a podcast, and creating a text transcription using a sophisticated speech-to-text process. Once the text file is created, it can be searched for any number of words and phrases.
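The difference between phonetic search and text search can be sketched with a toy example. Here the query is converted to phonemes and matched against a phoneme stream, so words that sound alike match even if they are spelled differently. The tiny `LEXICON` and its phoneme symbols are invented for illustration; a real engine would obtain the phoneme stream from the audio itself and use a full pronunciation dictionary.

```python
# Toy pronunciation lexicon; the entries and symbols are assumptions.
LEXICON = {
    "knot": ["N", "AA", "T"],
    "not": ["N", "AA", "T"],
    "night": ["N", "AY", "T"],
    "tie": ["T", "AY"],
    "the": ["DH", "AH"],
}


def to_phonemes(phrase):
    """Convert a phrase to a flat list of phonemes using the toy lexicon."""
    phonemes = []
    for word in phrase.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def phonetic_search(stream, query):
    """Return the positions in `stream` where the query's phonemes occur."""
    target = to_phonemes(query)
    size = len(target)
    if size == 0:
        return []
    return [i for i in range(len(stream) - size + 1) if stream[i:i + size] == target]


# Stand-in for the phoneme stream an engine would derive from the audio.
stream = to_phonemes("tie the knot")
print(phonetic_search(stream, "not"))
```

The query "not" finds the spoken word "knot" because both map to the same phonemes, something a plain text search over a transcript would miss.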

It is generally acknowledged that visual search into video does not work well and that no company is using it publicly. Researchers at UC San Diego and Carnegie Mellon University have been working on the visual search problem for more than 15 years, and admitted at a "Future of Search" conference at UC Berkeley in spring 2007 that it was years away from being viable, even for simple searches.

Video search engines

Agnostic search

Search whose results are not affected by where the video is hosted:

  • blinkx was launched in 2004 and uses speech recognition and visual analysis to process spidered video rather than rely on metadata alone. blinkx claims to have the largest archive of video on the web and puts its collection at around 26,000,000 hours of content.
  • CastTV is a Web-wide video search engine that was founded in 2006 and funded by Draper Fisher Jurvetson, Ron Conway, and Marc Andreessen.
  • Munax released the first version of its all-content search engine in 2005 and powers both nationwide and worldwide search engines with video search.
  • Picsearch Video Search has been licensed to search portals since 2006. Picsearch is a search technology provider that powers image, video, and audio search for over 100 major search engines around the world.

Non-agnostic search

Search whose results are modified, or suspect, because video from the large host is given preferential treatment in the results:

  • AOL Video offers a video search engine that can be used to find video located on popular video destinations across the web. In December 2005, AOL acquired Truveo Video Search.
  • Bing video search is a search engine powered by Bing and also used by Yahoo! Video Search.
  • Google Videos is a video search engine from Google.
  • Tencent Video offers video search from Tencent.

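See also

  • Content-based image retrieval
  • Metadata
  • Optical character recognition
  • Search engine optimization
  • Speech recognition
  • Video browsing
  • Video content analysis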

References

  1. (in English) SEO by Google central webmaster
  2. (in Catalan) Digitalize or die (Alícia Conesa). Archived July 8, 2011, at the Wayback Machine
  3. (in Catalan) Digition Suite from Activa Multimedia

External links

  • Process of search engines, How Stuff Works (in English)
