Desktop search


Desktop search tools search within a user's own computer files as opposed to searching the Internet. These tools are designed to find information on the user's PC, including web browser history, e-mail archives, text documents, sound files, images, and video.

One of the main advantages of desktop search programs is speed: results are displayed quickly because the program consults a prebuilt index instead of reading every file at query time.

A variety of desktop search programs are now available. Most are standalone applications, although a few also provide search capabilities within an integrated writing environment (IWE).

Desktop search emerged as a concern for large firms for two main reasons: untapped productivity and security. On the one hand, users need to be able to find relevant files quickly; on the other hand, they should not have access to restricted files. According to analyst firm Gartner, up to 80% of some companies' data is locked up in unstructured form: the information stored on end users' PCs, the directories (folders) and files they have created on a network, documents stored in repositories such as corporate intranets, and a multitude of other locations.[1] Moreover, many companies have structured or unstructured information stored in older file formats to which they do not have ready access.

Companies doing business in the United States are frequently required under regulatory mandates like Sarbanes-Oxley, HIPAA and FERPA to make sure that access to sensitive information is 100% controlled. This creates a challenge for IT organizations, which may not have a desktop search standard, or lack strict central control over end users downloading tools from the Internet. Some consumer-oriented desktop search tools make it possible to generate indexes outside the corporate firewall and share those indexes with unauthorized users. In some cases, end users are able to index — but not preview — items they should not even know exist.

Historically, full desktop search traces back to the work of Apple Computer's Advanced Technology Group, which produced the underlying AppleSearch technology in the early 1990s. That technology was used to build the Sherlock search engine and was later developed into Spotlight, which brought automated, non-timer-based full indexing into the operating system.

Technologies

Most desktop search engines build and maintain an index database to achieve reasonable performance when searching several gigabytes of data. Indexing usually takes place when the computer is idle, and most search applications can be set to suspend indexing while a portable computer is running on batteries, to save power. There are notable exceptions, however. Voidtools' Everything Search Engine,[2] which searches only file names (not file contents) and only on NTFS volumes, can build its index from scratch in just a few seconds. Vegnos Desktop Search Engine[3] searches both file names and file contents without building any index at all. Not indexing has benefits: no persistent storage is needed for an index, and more powerful queries (e.g., regular expressions) can be issued, whereas indexed search engines are limited to keyword-based queries. An index may also be out of date when a query is performed, in which case the results will be inaccurate: a file may be shown as a hit although it no longer matches, and a file that does match may not be shown. Some products have sought to remedy this by building real-time indexing into the software. Not indexing has disadvantages too: a query can take a significant time to complete and can be resource-intensive.
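The trade-off between indexed and index-free search can be illustrated with a small Python sketch. It is not how Everything, Vegnos or any other named product works; the helpers `build_index`, `keyword_search` and `scan` are hypothetical, and only `.txt` files are considered. The index answers keyword queries with a dictionary lookup, while `scan` rereads every file on each query but accepts any regular expression.

```python
import re
from collections import defaultdict
from pathlib import Path

WORD = re.compile(r"\w+")  # crude tokenizer: runs of letters, digits and underscores


def read(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="ignore")


def build_index(root: Path) -> dict[str, set[Path]]:
    """Indexed approach: map each lowercased word to the set of files containing it."""
    index: dict[str, set[Path]] = defaultdict(set)
    for path in root.rglob("*.txt"):
        for word in WORD.findall(read(path).lower()):
            index[word].add(path)
    return dict(index)


def keyword_search(index: dict[str, set[Path]], *words: str) -> set[Path]:
    """Fast lookup of whole keywords; results reflect the index, not the current disk."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()


def scan(root: Path, pattern: str) -> set[Path]:
    """Index-free approach: read every file per query, so any regex can be used."""
    rx = re.compile(pattern)
    return {path for path in root.rglob("*.txt") if rx.search(read(path))}
```

Because `build_index` takes a snapshot, editing a file afterwards leaves `keyword_search` returning the old answer until the index is rebuilt, which is the staleness problem that real-time indexing tries to solve. A pattern such as `INV-\d{4}`, on the other hand, can only be answered by `scan`.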

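Desktop search tools typically collect three types of information about files: file and folder names; metadata, such as titles, authors and comments; and file content, for the document types the tool supports.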
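A catalog entry holding these kinds of information might look like the following minimal sketch. `FileRecord` and `describe` are hypothetical names; the metadata shown is only what the file system provides (real tools also read embedded metadata such as document authors or music tags), and only plain `.txt` content is extracted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class FileRecord:
    """One catalog entry in a hypothetical desktop-search index."""
    name: str           # file name; the containing folder is stored alongside it
    folder: str
    size: int           # metadata (here only file-system metadata)
    modified: datetime
    content: str        # extracted text; empty when the format is unsupported


def describe(path: Path) -> FileRecord:
    st = path.stat()
    # Only plain text is "supported" in this sketch; other formats would need filters.
    is_text = path.suffix.lower() == ".txt"
    content = path.read_text(encoding="utf-8", errors="ignore") if is_text else ""
    return FileRecord(
        name=path.name,
        folder=str(path.parent),
        size=st.st_size,
        modified=datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        content=content,
    )
```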

To search effectively within documents, the tools need to be able to parse many different types of documents. This is achieved by using filters that interpret selected file formats. For example, a Microsoft Office Filter might be used to search inside Microsoft Office documents.
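A filter layer can be sketched as a table that maps file extensions to text-extraction functions. This is an illustrative assumption, not the Windows IFilter interface or any product's plugin API. The sketch handles plain text, HTML (tags stripped with the standard-library `HTMLParser`) and `.docx`, which is a ZIP archive whose body text sits in `w:t` elements of `word/document.xml`.

```python
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable


class _TextCollector(HTMLParser):
    """Keeps text between tags and drops the tags (script/style content is not special-cased)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.parts.append(data.strip())


def plain_text(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="ignore")


def html_text(path: Path) -> str:
    collector = _TextCollector()
    collector.feed(plain_text(path))
    collector.close()
    return " ".join(collector.parts)


W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path: Path) -> str:
    # Joining runs with spaces is a simplification: a word split across runs gains a space.
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return " ".join(t.text or "" for t in root.iter(W_NS + "t"))


FILTERS: dict[str, Callable[[Path], str]] = {
    ".txt": plain_text,
    ".htm": html_text,
    ".html": html_text,
    ".docx": docx_text,
}


def extract(path: Path) -> str:
    """Pick a filter by extension; formats without a filter yield no content."""
    fn = FILTERS.get(path.suffix.lower())
    return fn(path) if fn else ""
```

Supporting a new format means registering one more function in `FILTERS`; files without a filter can still be found by name and metadata, just not by content.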

Long-term goals for desktop search include the ability to search the contents of image files, sound files and video by context.[4][5]

The sector attracted considerable attention because of the struggle between Microsoft and Google.[6] According to market analysts, both companies were attempting to leverage their monopolies (of web browsers and search engines, respectively) to strengthen their dominance. After Google complained that users of Windows Vista could not choose a competitor's desktop search program over the built-in one, the US Justice Department and Microsoft reached an agreement that Windows Vista Service Pack 1 would let users choose between the built-in and other desktop search programs, and select which one is the default.[7]

In September 2011, Google discontinued Google Desktop, a program designed to make it easy for users to search their own PCs for emails, files, music, photos, Web pages and more.[8]

Several desktop search products are marketed as alternatives to Windows Desktop Search and Outlook's built-in search, helping business professionals sift through desktop files, emails, attachments, SharePoint data, and more.[9][10][11]

Platforms & their histories

Desktop search is found mainly on three platforms: Windows, Mac OS and Linux. This section covers the history of search on each platform, the features it offered, and how those features evolved.

Windows

Today's Windows Search replaced WDS (Windows Desktop Search), which in turn replaced Indexing Service, "a base service that extracts content from files and constructs an indexed catalog to facilitate efficient and rapid searching".[12] Indexing Service was originally released in August 1996 to speed up manual searches for files on personal desktops and corporate computer networks. It used Microsoft web servers to index files on the selected hard drives, and indexing was done by file format: a search matched the terms the user supplied against the data within each file format. The largest issue Indexing Service faced was that every newly added file had to be indexed. Combined with the fact that the entire index was cached in RAM, this made hardware a major limitation:[13] indexing large numbers of files required extremely powerful hardware and very long wait times.

In 2003, Windows Desktop Search (WDS) replaced Microsoft Indexing Service. Instead of only matching terms against file format details and file names, WDS added content indexing for all Microsoft file formats and for text-based formats such as e-mail and plain text files. In other words, WDS looked inside files and indexed their content, so a search term was matched not only against file types and file names but also against the terms and values stored within the files. WDS also introduced "instant searching": results started to appear as soon as the user typed a character and were refined as more characters were entered.[14] WDS apparently required a lot of processing power, since it ran only when directly queried or while the PC was idle. Even so, indexing an entire hard drive still took hours. The index was around 10% of the size of the files it covered: if the indexed files occupied about 100 GB, the index itself took about 10 GB.
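Search-as-you-type can be served from an index's sorted vocabulary: in sorted order, all terms sharing a prefix are contiguous, so a binary search finds the first candidate and the scan stops at the first non-match. The `PrefixLookup` class below is a hypothetical sketch of that idea, not the mechanism WDS used.

```python
from bisect import bisect_left
from typing import Iterable


class PrefixLookup:
    """Hypothetical search-as-you-type helper over an index vocabulary."""

    def __init__(self, terms: Iterable[str]):
        self.terms = sorted({t.lower() for t in terms})

    def complete(self, prefix: str, limit: int = 10) -> list[str]:
        prefix = prefix.lower()
        i = bisect_left(self.terms, prefix)  # index of the first term >= prefix
        out: list[str] = []
        while i < len(self.terms) and self.terms[i].startswith(prefix) and len(out) < limit:
            out.append(self.terms[i])
            i += 1
        return out
```

Each keystroke reruns `complete` with a longer prefix, so the candidate list narrows as the user types: "r", then "rep", then "repo".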

With the release of Windows Vista came Windows Search 3.1. Unlike its predecessors, WDS and Windows Search 3.0, version 3.1 could search both indexed and non-indexed locations seamlessly. Its RAM and CPU requirements were also greatly reduced, cutting indexing times considerably. Windows Search 4.0 runs on all PCs with Windows 7 and later.
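Searching indexed and non-indexed locations together can be sketched as a merge of two result sets: a cheap lookup for locations the index covers and a live scan for everything else. The function `hybrid_search` and its parameters are hypothetical, the index is a plain word-to-paths dictionary, and only `.txt` files are scanned; this is not the Windows Search implementation.

```python
import re
from pathlib import Path

WORD = re.compile(r"\w+")


def hybrid_search(term: str, index: dict[str, set[Path]], unindexed_roots: list[Path]) -> set[Path]:
    """Merge hits from an index with a live scan of locations the index does not cover."""
    term = term.lower()
    hits = set(index.get(term, set()))      # cheap: already-indexed locations
    for root in unindexed_roots:             # expensive: files are read at query time
        for path in root.rglob("*.txt"):
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            if term in WORD.findall(text):  # whole-word match, like the index
                hits.add(path)
    return hits
```

Matching whole words in the scan keeps both halves consistent, so a location's results do not change depending on whether it happens to be indexed.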

Mac OS

Mac OS was the first to implement desktop search, with its AppleSearch search engine, which let users fully search all documents on their Macintosh computer, including file format types, metadata on those files, and content within the files. AppleSearch was a client/server application and therefore required a server separate from the main device in order to function. Its biggest issue was its large resource requirements: "AppleSearch requires at least a 68040 processor and 5MB of RAM."[15] At the time, a Macintosh computer with these specifications was priced at approximately $1400, equivalent to $2050 in 2015.[16] On top of this, the software itself cost an additional $1400 for a single license.

In 1998, Sherlock was released alongside Mac OS 8.5. Sherlock (named after the fictional detective Sherlock Holmes) was integrated into Finder, the Mac OS file browser. It extended desktop search to the World Wide Web, allowing users to search both locally and externally. Adding functions such as Internet access to Sherlock was relatively simple, because this was done through plugins written as plain text files. Sherlock was included in every release of Mac OS from Mac OS 8.5 onward until it was deprecated and replaced by Spotlight and Dashboard in Mac OS X 10.4 Tiger. It was officially removed in Mac OS X 10.5 Leopard.

Spotlight was released in 2005 as part of Mac OS X 10.4 Tiger. It is a selection-based search tool, which means the user can invoke a query using only the mouse. Spotlight lets the user search the Internet for more information about any keyword or phrase contained within a document or webpage, and uses a built-in calculator and the Oxford American Dictionary to offer quick access to small calculations and word definitions.[17] Spotlight is slow at first, while it performs its initial indexing of the hard disk, but becomes faster as indexing progresses. As the user adds files, the index is constantly updated in the background using minimal CPU and RAM resources.
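Keeping an index current without rereading everything can be sketched by remembering each file's modification time, reprocessing only files whose time changed, and dropping files that disappeared. `IncrementalIndex` is a hypothetical class that polls with `refresh`; production indexers typically rely on file-system change notifications instead of polling, which this sketch does not attempt.

```python
import re
from pathlib import Path

WORD = re.compile(r"\w+")


class IncrementalIndex:
    """Inverted index that rereads only files whose modification time changed."""

    def __init__(self) -> None:
        self.postings: dict[str, set[Path]] = {}  # word -> files containing it
        self.mtimes: dict[Path, float] = {}       # file -> mtime when last indexed
        self.words: dict[Path, set[str]] = {}     # file -> words it contributed

    def _forget(self, path: Path) -> None:
        for w in self.words.pop(path, set()):
            self.postings[w].discard(path)
            if not self.postings[w]:
                del self.postings[w]
        self.mtimes.pop(path, None)

    def refresh(self, root: Path) -> int:
        """Bring the index up to date with root; return how many files were (re)read."""
        current = {p: p.stat().st_mtime for p in root.rglob("*.txt")}
        for gone in set(self.mtimes) - set(current):
            self._forget(gone)                     # deleted since the last pass
        reread = 0
        for path, mtime in current.items():
            if self.mtimes.get(path) == mtime:
                continue                           # unchanged: skip the expensive read
            self._forget(path)
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            words = set(WORD.findall(text))
            for w in words:
                self.postings.setdefault(w, set()).add(path)
            self.words[path] = words
            self.mtimes[path] = mtime
            reread += 1
        return reread

    def search(self, word: str) -> set[Path]:
        return set(self.postings.get(word.lower(), set()))
```

The first `refresh` is the slow full pass; later passes only stat files and reread the few that changed, which is why background upkeep can stay cheap.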

Linux

There is a wide range of desktop search options for Linux users, depending on the user's skill level and on whether they prefer desktop tools that integrate tightly into their desktop environment, command-shell tools (often with advanced scripting options), or browser-based user interfaces to locally running software. In addition, many users assemble their own indexing from a variety of packages (e.g. one that extracts and indexes PDF/DOC/DOCX/ODT documents well, and another search engine that works with vCard, LDAP and other directory/contact databases), alongside the conventional find and locate commands.
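The find/locate pair mirrors the indexed versus index-free trade-off: `locate` queries a prebuilt database of file names (refreshed by `updatedb`), while `find` walks the disk on every call. The hypothetical `update_db` and `locate` functions below sketch a filename-only database; they glob case-insensitively against base names, which is a simplification of the real `locate` matching rules.

```python
import fnmatch
from pathlib import Path
from typing import Iterable


def update_db(roots: Iterable[Path]) -> list[str]:
    """Snapshot every path under the roots, playing the role of updatedb."""
    return sorted(str(p) for root in roots for p in Path(root).rglob("*"))


def locate(db: list[str], pattern: str) -> list[str]:
    """Case-insensitive glob over base names in the snapshot; the disk is not touched."""
    pattern = pattern.lower()
    return [p for p in db if fnmatch.fnmatch(Path(p).name.lower(), pattern)]
```

Files created after `update_db` runs stay invisible to `locate` until the database is rebuilt.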

Ubuntu

The Ubuntu distribution is a popular version of Linux. Ubuntu did not ship desktop search until Feisty Fawn 7.04, which used Tracker[18] desktop search. Its features were very similar to those of AppleSearch and Sherlock on Mac OS. Tracker, released in late 2007, was built to have a relatively low impact on system resources, but it occasionally used resources erratically. Beyond the basic features of file-type sorting and metadata matching, it added support for searching through emails and instant messages. Years later, in 2014, Recoll[19] was added to Linux distributions; it works alongside other search programs such as Tracker and Beagle to provide efficient full-text search. This greatly increased the range of queries and file types that Linux desktop search could handle. A major advantage of Recoll is that it allows greater customization of what is indexed. For example, Recoll indexes the user's home directory by default, but it can instead index just a few selected directories, avoiding time wasted on directories the user knows they will never need to search. It also offers more search options, letting the user narrow down the kind of query, for example searching only by file type or only by content.[20]
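Restricting indexing to chosen directories, and then narrowing results by file type or by content, can be sketched as below. The function names and the default skip list are hypothetical; the idea loosely echoes Recoll's `topdirs` and `skippedNames` configuration settings, but this is not Recoll's code or query language.

```python
import fnmatch
from pathlib import Path
from typing import Iterable, Iterator, Optional


def files_to_index(topdirs: Iterable[Path],
                   skipped_names: tuple[str, ...] = (".git", "node_modules", "*.tmp")) -> Iterator[Path]:
    """Walk only the chosen top directories, pruning entries whose names match a skip pattern."""
    for top in topdirs:
        stack = [Path(top)]
        while stack:
            current = stack.pop()
            for entry in current.iterdir():
                if any(fnmatch.fnmatch(entry.name, pat) for pat in skipped_names):
                    continue  # pruned: a skipped directory's children are never visited
                if entry.is_dir():
                    stack.append(entry)
                elif entry.is_file():
                    yield entry


def query(files: Iterable[Path], ext: Optional[str] = None,
          contains: Optional[str] = None) -> Iterator[Path]:
    """Narrow results by file type, by content, or by both."""
    for f in files:
        if ext and f.suffix.lower() != "." + ext.lower().lstrip("."):
            continue
        if contains and contains.lower() not in f.read_text(encoding="utf-8", errors="ignore").lower():
            continue
        yield f
```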

openSUSE[21]

Starting with KDE 4, the NEPOMUK framework was introduced. It could index a wide range of desktop content and email, and used semantic web technologies (e.g. RDF) to annotate the database. Its introduction faced a few glitches, many of which seemed to stem from the triplestore. Performance improved (at least for queries) after the backend was switched to a stripped-down version of the Virtuoso Open Source Edition; however, indexing remained a common user complaint. Based on user feedback, Nepomuk indexing and search were replaced by the Baloo framework,[22] which is based on Xapian.
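Annotation in the RDF style stores facts as subject, predicate, object triples and queries them by pattern. The toy `TripleStore` below illustrates that model with made-up predicates such as `tag` and `author`; it omits everything that makes RDF practical (URIs for predicates, typed literals, SPARQL, persistence) and is not NEPOMUK's API.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) annotations."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(t for t in self.triples
                      if all(want is None or want == have for want, have in zip(pattern, t)))
```

New kinds of annotation need no schema change: tagging a file with a rating, say, is just another `add` call, and `match(predicate="tag", obj="budget")` finds every file tagged "budget".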


References

  1. "Security special report: Who sees your data?", Computer Weekly, 2006-04-25.
  2. "Everything Search Engine". voidtools. Retrieved 27 December 2013.
  3. "Vegnos". Vegnos. Retrieved 27 December 2013.
  4. Niall Kennedy (17 October 2006). "The current state of video search". Niall Kennedy. Retrieved 24 June 2015.
  5. Niall Kennedy (15 October 2006). "The current state of audio search". Niall Kennedy. Retrieved 24 June 2015.
  6. "BBC NEWS - Technology - Search wars hit desktop computers". bbc.co.uk. Retrieved 24 June 2015.
  7. "SearchMax". goebelgroup.com. Retrieved 24 June 2015.
  8. "Google Desktop Update". September 2011.
  9. Brian Madden. "What do you do for desktop search in VDI and RDSH?". brianmadden.com. Retrieved 25 March 2015.
  10. Anthony Ha (2 June 2008). "Lookeen offers a new way for Outlook users to search". VentureBeat. Retrieved 8 March 2016.
  11. Robert L. Mitchell (8 May 2013). "X1 rises again with Desktop Search 8, Virtual Edition". Computerworld. Retrieved 24 June 2015.
  12. "Indexing Service". microsoft.com. Microsoft. Retrieved 24 June 2015.
  13. "Indexing with Microsoft Index Server". microsoft.com. Microsoft. Retrieved 24 June 2015.
  14. "Windows Search: Technical FAQ". microsoft.com. Microsoft. Archived from the original on 24 September 2011. Retrieved 24 June 2015.
  15. "AppleSearch". infomotions.com. Retrieved 24 June 2015.
  16. eduardo casais. "Converter of current to real US dollars - using the GDP deflator". areppim.com. Retrieved 24 June 2015.
  17. "Apple - Press Info - Apple to Ship Mac OS X "Tiger" on April 29". apple.com. Retrieved 24 June 2015.
  18. "A first look at Tracker 0.6.0". Ars Technica. Retrieved 24 June 2015.
  19. "Recoll user manual". lesbonscomptes.com. Retrieved 24 June 2015.
  20. "Linux.com". linux.com. Retrieved 24 June 2015.
  21. http://www.opensuse.org/
  22. https://community.kde.org/Baloo
This article is issued from Wikipedia - version of the 11/10/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.