Placement of information resources.

Information resource search tools.

Internet information resources are currently growing at a rapid pace. The World Wide Web resembles a library reading room that stores gigantic volumes of text, graphics, multimedia, archives and other files. It is impossible to walk through this entire room: everything in it changes hourly, and the body of documents grows every second. Finding the information you need is becoming increasingly difficult, and printed reference books become outdated even before they are published. The only reliable way to find information is to use special search engines that constantly monitor changes on the Internet.

Resources used on the Internet are most often located on the pages of WWW servers (or Web servers), in file archives (FTP archives) and in the Gopher information and reference system.

WWW (World Wide Web) is a global hypertext system that uses HTTP (HyperText Transfer Protocol) to transport information on the Internet. Hypertext is a way of representing all types of information as a set of nodes connected to each other by associative (rather than sequential) connections implemented as hyperlinks. A hyperlink is a highlighted sequence of characters in hypertext that responds to a mouse click and sends the user to another fragment of hypertext. Most documents stored on a Web server are written in HTML (HyperText Markup Language).

A Gopher server is a server that contains programs that allow you to find files, programs, or other resources on a user-specified topic. The URL of such a server looks like this (if the server, for example, belongs to Microsoft): gopher://gopher.microsoft.com.

There are two groups of search tools: 1) search engines and 2) search services.

An IRS (information retrieval system) is a system that searches for and selects the necessary data in a special database of descriptions of information sources (an index), based on an information retrieval language and corresponding search rules.

On the Internet, the following search tools for the WWW can be distinguished: search engines, metasearch engines (search services) and accelerated search programs (search agents).


Fig. 13. WWW search tools

Depending on who creates the databases in which the user's information is searched, search engines are divided into those of the first and second kind. In search engines of the first kind, the databases are created by people; in search engines of the second kind, this process is carried out by a computer.


Search engines of the first kind are usually called catalogs (subject or thematic catalogs). Such catalogs are typically built by people as hierarchical trees whose top level holds the most general concepts: business, politics, education, sports, culture, etc. The lowest-level elements of these trees are links to specific Web pages and servers. Searches in subject catalogs are usually carried out using keywords. The search runs not over the content of the Web servers themselves, but over their brief descriptions stored in the catalog. A search request is formed either as a list of keywords (“information technology”, “computer linguistics”, etc.) or by specifying the URLs of the documents to be searched. Search results are presented as hypertext containing the names or URLs of the found documents as hyperlinks.
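As an illustration of keyword search over a hand-built subject catalog, here is a minimal Python sketch. The `Category` and `Entry` classes, the sample tree and the example.org URLs are all hypothetical; the point is only that matching is done against the stored descriptions, not against the pages themselves.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    url: str
    description: str  # brief description written by a catalog editor


@dataclass
class Category:
    name: str
    entries: list = field(default_factory=list)
    children: list = field(default_factory=list)


def search_catalog(node, keywords, path=()):
    """Yield (category path, url) for entries whose description contains every keyword."""
    here = path + (node.name,)
    wanted = [k.lower() for k in keywords]
    for entry in node.entries:
        text = entry.description.lower()
        if all(k in text for k in wanted):
            yield "/".join(here), entry.url
    for child in node.children:
        yield from search_catalog(child, keywords, here)


# Hypothetical catalog tree: general concepts at the top, links at the bottom.
catalog = Category("Top", children=[
    Category("Education", entries=[
        Entry("http://example.org/cl", "Course notes on computer linguistics"),
        Entry("http://example.org/it", "Introduction to information technology"),
    ]),
    Category("Sports", entries=[
        Entry("http://example.org/chess", "Chess club news"),
    ]),
])

results = list(search_catalog(catalog, ["information technology"]))
print(results)
```

Because only the short descriptions are searched, the quality of the result depends entirely on how well the editors described each resource.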

You can search for the following information using keywords:

1) some text or part of it;

2) factual data (for example, the mass of the sun or the name of the president of the country);

3) paintings, drawings, films, etc. by their names;

4) technical information (for example, information about the speed of a certain car);

5) biographies of people (writers, artists, etc.).

Examples of thematic directories are Yahoo, Galaxy, WWW Virtual Library, WebCrawler, HotBot, etc. A similar Russian-language system is called “Pathfinder”.

Search engines of the second kind are sometimes called automatic indexes, "spiders" or "worms" (spiders, crawlers). They constantly scan the Internet, find new documents, and extract from each document all the hyperlinks it contains, which they use to supplement their databases of URLs. To perform these functions, an automatic index consists of three parts: a robot program that constantly crawls the Internet; a database (a set of URLs) collected by the robot; and a user interface for searching this database. There are many automatic indexes. The most popular are:

Foreign search engines:

- Altavista (http://www.altavista.com);

- Go (Infoseek) (http://www.go.com);

- Google (http://www.google.com);

- Excite (http://www.excite.com);

- HotBot (http://www.hotbot.com);

- Northern Light (http://www.northernlight.com).

Russian search engines:

- Yandex (http://www.yandex.ru);

- Rambler (http://www.rambler.ru);

- Aport (http://www.aport.ru).

Popular Belarusian search engines:

- System ALL.BY (http://all.by);

- System *.BY (http://search.promedia.minsk.by);

- Register of Belarusian WWW resources Zubr (http://www.zubr.com);

- Belarusian Internet catalog Akavita (http://akavita.kryvia.net);

- Belarusian resources catalog (http://www.belresource.com.by).

Most search engines are one of the components of multifunctional Internet Web sites - the so-called portals.

A portal is a multifunctional Internet Web site offering a variety of services: information search, free e-mail, etc.

Recently, systems have begun to appear on the World Wide Web that automatically search in two indexes at once (the catalog index and the search engine index). Such systems allow you to take advantage of both types of search servers and are called catalog machines.

Information can be searched for with these tools using simple and complex queries. A simple query is a word or phrase, sometimes enclosed in quotation marks. A complex query is formed from words or phrases connected by operators such as AND, OR, NOT, NEAR or by mathematical symbols, for example "*", "+", "-", "~". Sometimes special terms such as domain, host, link, title, etc. are used for the same purpose.
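To show how the AND, OR and NOT operators of a complex query can narrow or widen a result set, here is a minimal Python sketch over a tiny inverted index. The three sample documents and the helper names `build_index` and `term` are hypothetical, and real search engines parse and evaluate queries in far more elaborate ways.

```python
# Hypothetical mini-collection: document id -> text.
docs = {
    1: "search engines index web pages",
    2: "catalogs are compiled by editors",
    3: "search catalogs and search engines differ",
}


def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


index = build_index(docs)


def term(word):
    """Documents containing the word (empty set if the word is unknown)."""
    return index.get(word.lower(), set())


and_result = term("search") & term("engines")   # search AND engines
or_result = term("catalogs") | term("editors")  # catalogs OR editors
not_result = term("search") - term("catalogs")  # search AND NOT catalogs

print(and_result, or_result, not_result)
```

Under this model AND is set intersection, OR is set union, and NOT is set difference, which is why AND and NOT shrink the list of documents while OR enlarges it.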

Today there are so many services for organizing teamwork that you could not get through them all in a month. Testing every popular and suitable tool takes a lot of time, which is already in short supply, especially when launching a startup.

Trello is a brilliantly simple tool designed for task management. It takes very little time to start using it. Our entire team mastered it without any problems.

The best part is that it's free!

For organization we use the scrum methodology:

  • we have weekly sprints;
  • every Saturday we sum up and plan the next stage;
  • releases are launched when ready.

A few of our boards:

  • “HADI” board (Hypothesis, Action, Data, Insights). This is an interesting methodology. At the beginning of the week, we set hypotheses that relate to certain metrics. Over the course of the week, these hypotheses are tested and analyzed. As a result, we conclude whether each hypothesis is true or not. To start working on a task, we move it to another board (product, promotion, etc.).
  • “Product” board. We divide the board into lists: tasks for the week, in progress, done this week, bugs, done this month, etc.
  • “Promotion” board. Here is a visual plan for promoting the project: by time, channels, goals, etc.
  • And so on.

On Zuckerberg Will Call, we published an article “How to organize work on a SaaS project in Trello.” Be sure to read it: we describe in detail our approach to managing tasks and metrics and how we implemented it in Trello.

Below is the “HADI” board. As you can see, each metric that is affected by a task has its own color. This is done so that when transferring the task to other boards, we retain an understanding of what metric we want to improve.

And this is what the “Product” board looks like. Tasks come here from the “HADI” board, and each of them is highlighted in a certain color. We see what metric this task will affect. Accordingly, testing hypotheses and analyzing the effect of changes becomes much easier.

It’s great that as soon as something changes in our process - we come up with new management features, or realize that something is not “perfect” - we immediately change it in Trello and start using it. The cost of changes is 3.5 seconds.

2. RealtimeBoard - managing changes in the interface
In online services, the interface is one of the main components of the product. Work on design is constantly in full swing. Any change or planning for change must be discussed by the entire team. After all, in a startup, the opinion of every player on the team is worth its weight in gold.

The designer sees the task from the point of view of design (how best to highlight key elements, place accents, etc.), the product owner from the point of view of the client (what is important for the user, which elements are forgotten or what is superfluous), the developer from the point of view of technology (not all the designer’s fantasies can be realized in a short period of time).

If thinking about ideas and tasks is easy, then how to discuss the design itself? As it usually happens: “That thing on the bottom right, you need to make it a little further to the right and make the stroke color greener.” What thing, what stroke, what does greener mean? This didn't suit us. The design needs to be discussed visually – that is, drawing, scribbling, seeing previous iterations.

Decisions must be made quickly, so there is no time for special meetings and discussions.

We use the RealtimeBoard service and discuss the entire design in it. The service is ideal for this: in one place you can keep versions of all pages, concepts and comments.

Here's an example of how we discussed the process of developing a user card:

Each comment has its own color:

  • yellow - just a discussion, question, explanation;
  • red - a change is needed in this place;
  • green - a resolved issue (usually red turns into green).

Here is an example of describing user life scenarios.

We jointly built the user’s life cycle and determined when to send which letters and messages.

3. Carrot Quest - understand users and communicate with them
It may be immodest to talk about our own service, but it is ideal for us. In it we organize all analytics and communicate with users (provide support, do marketing, win users back).

So, the user has registered. We immediately and automatically send a letter thanking them for registering, with instructions (how and where to install the code on the site, etc.).

If we know how much time has passed since registration, we can assume what information is currently relevant for the user and how to engage him further.

We divide registered users into segments based on the time of their registration in the service in order to help them at all stages of the trial (test period) and involve them in further work.

Example:

  • 2 days of the trial have passed - we have time to analyze the user’s site and determine how we can help;
  • 3-7 days have passed - we offer a number of instructions and cases that tell in detail about each tool (how to set up and use effectively);
  • 7-12 days have passed - every day we send statistics that we collected using Carrot Quest (an example is shown below);
  • 12 days have passed - we remind you that there are 2 days left until the end of the trial period and in order to continue working, you need to pay. We transfer the client to the page with tariff plans.

In the service, we look at detailed mailing statistics (how many letters were sent during the period, % read, % replied, etc.). It is also important that we know which users performed the actions suggested in the letter. So we select those who read the letter but did not respond and ask them: “What went wrong?”

Here's an example of the automated welcome email we send immediately after registration.

Here is an example of a letter with statistics about users of a connected site:

We will write about the process of analyzing user actions, support and activation in the following articles.

4. Slack - communicate as a team

Standard messengers (Skype, VK) usually distract from work and disorganize the team. We cannot give them up, but we believe that work communication needs a dedicated service with no outside distractions (friends, acquaintances, relatives). That's why we use Slack. It is great for communication within a team and has everything you need.

In Slack, we divide conversations into channels because the flow of information in the team is very high. Keeping everything in one place would not lead to anything good. Here are our channels:

  • General (we discuss all the main points on the project);
  • Design (discussing design);
  • Read-me (we share useful content: articles, videos, presentations);
  • Bugs (here we discuss bugs);
  • Ideas (we collect and discuss all kinds of ideas, both ours and those of our clients);
  • And several other channels. I will tell you a little more about two of them.

Channel "Task"
We've set up Trello integration with Slack. Now we can see in real time what changes have occurred in Trello. This helps us react quickly and understand what stage of work we are currently at.

Integration of various services with Slack is a very cool thing; it allows you to control processes in one place without sacrificing context and time.

Channel "Notifications"

We have integrated Carrot Quest with Slack. Now we receive a notification when a user performs a series of actions. You can choose which actions matter to you and be notified about them promptly.

For example, one of the actions that we ourselves monitor is the registration steps. As soon as a user starts registering, we instantly receive a notification about which site they connected. And if there is time, we go from the notification in Slack to the user’s card and help the user set up the service via chat.

These 4 services help us effectively organize our work and improve our product. We are constantly trying something new and will try to share our experience.

It will be interesting to know how you work with such services and what solutions you use for tasks of this type.

Work efficiently!

Relevance is the correspondence of search results to the formulated query.

Pertinence (in information retrieval) is the correspondence of the information received to the user's information needs.

Pertinence is measured by the degree of correspondence between user expectations and search results (compare with relevance). It is defined as the ratio of the amount of information useful to the user to the total amount of information retrieved by the search engine.

Achieving a high degree of pertinence is the main field of competition for modern search engines. To satisfy users' information needs as fully as possible, the theories and methods of semantic networks, content analysis and in-depth text analysis (text mining) are used.
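The ratio described above (useful information divided by everything the search engine retrieved) can be computed as in the following minimal Python sketch. The function name `useful_share` and the document identifiers are hypothetical, and counting whole documents is an assumption: the text speaks of "amount of information" without saying how it is measured.

```python
def useful_share(retrieved, useful):
    """Share of retrieved documents that the user actually found useful."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0  # nothing retrieved, so nothing useful was delivered
    useful = set(useful)
    hits = sum(1 for doc in retrieved if doc in useful)
    return hits / len(retrieved)


# Four documents were returned; the user considers d2 and d4 useful
# (d9 is useful too, but the engine did not return it).
share = useful_share(["d1", "d2", "d3", "d4"], {"d2", "d4", "d9"})
print(share)
```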

To find the necessary information on the Internet, a resource address (Uniform Resource Locator, URL) is used. It contains the name of the protocol used to access the information, the server address and the name of the file on that server (Fig. 2).

Fig. 2. Example of a resource address
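The three parts of a resource address named above (protocol, server address, file name) can be pulled apart with Python's standard `urllib.parse` module. This is a minimal sketch; the helper `describe_url` and the example.com address are illustrative only.

```python
import posixpath
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the protocol, the server address and the file name."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "server": parts.hostname,
        "file": posixpath.basename(parts.path),
    }


info = describe_url("http://www.example.com/docs/index.html")
print(info)
```

The same call works for other schemes, for example a `gopher://` address, where the file part is simply empty if the URL names only the server.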

A search system is a software and hardware complex with a web interface that makes it possible to search for information on the Internet. The term usually refers to the website that hosts the system's interface. The software part of a search system is the search engine: a set of programs that provides the system's functionality and is usually a trade secret of the company that develops it.

Searching for information on the Internet is carried out by special programs that process requests: information retrieval systems (IRS). Search engines are based on several models, but historically two have become the most popular: search directories and search indexes.

Search catalogs are organized on the same principle as the subject catalogs of large libraries. They are usually hierarchical hypertext menus with items and sub-items that define the topics of sites whose addresses are contained in this directory, with a gradual clarification of the topic from level to level. Search directories are created manually. Highly qualified editors personally review the WWW information space, select what they consider to be of public interest, and enter it into the catalogue.

The main problem of search directories is the extremely low coverage rate of WWW resources. In order to significantly increase the coverage rate of Web resources, the human factor must be eliminated from the process of filling the search engine database - the work must be automated.

Automatic cataloging of Web resources and handling of user requests is performed by search indexes. The work of a search index can be divided into three stages:

    collecting the primary database. To scan the WWW information space, special agent programs (worms) are used; their task is to find unknown resources and register them in the database;

    indexing the database: primary processing for the purpose of search optimization. At the indexing stage, specialized documents are created, which are the search indexes proper;

    refining the resulting list. At this stage, the list of links that will be passed to the user as the result is created. Refining the resulting list involves filtering and ranking the search results.

Filtering means removing links that should not be shown to the user (for example, duplicates). Ranking means ordering the resulting list in a particular way (by the number of keywords, related words, etc.).
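The indexing and refining stages can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the collected pages are given as a ready-made dictionary (stage one is taken as already done), duplicates are detected by identical text, and ranking is by the total number of keyword occurrences. Real search engines use much richer criteria.

```python
from collections import Counter

# Stage 1 result (assumed): pages already collected by an agent program.
pages = {
    "http://a.example/": "search engines rank search results",
    "http://b.example/": "search tips",
    "http://c.example/": "search engines rank search results",  # duplicate of a
    "http://d.example/": "cooking recipes",
}


def build_index(collected):
    """Stage 2: map each word to {url: number of occurrences}."""
    index = {}
    for url, text in collected.items():
        for word, count in Counter(text.lower().split()).items():
            index.setdefault(word, {})[url] = count
    return index


def refine(index, collected, query):
    """Stage 3: drop duplicate pages, then rank by total keyword count."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    seen_texts = set()
    unique = []
    for url in sorted(scores):  # fixed order so the same duplicate is always kept
        text = collected[url]
        if text not in seen_texts:
            seen_texts.add(text)
            unique.append(url)
    return sorted(unique, key=lambda u: (-scores[u], u))


index = build_index(pages)
ranked = refine(index, pages, "search engines")
print(ranked)
```

Page `c` is filtered out as a duplicate of `a`, page `d` never matches, and `a` is ranked above `b` because it contains more occurrences of the query words.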

The main task of any information system is to search for information relevant to the user’s information needs. It is very important not to lose anything as a result of the search, that is, to find all the documents related to the request and not find anything superfluous. Therefore, a qualitative characteristic of the search procedure is introduced - relevance.

Relevance is the correspondence of search results to the formulated query.

1 Search tools

Search tools are special software whose main purpose is to provide Internet users with the most effective, high-quality information search. Search tools are hosted on special web servers, each of which performs a specific function:

Web search engines are servers with a huge database of URLs that automatically visit the WWW pages at all these addresses, examine their contents, and extract keywords from the pages into their database (that is, they index the pages).

Moreover, search engine robots follow the links found on pages and index the linked pages in turn. Since almost any WWW page links to many other pages, a search engine working this way can in theory eventually crawl every site on the Internet.
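The link-following behaviour of a robot can be illustrated with a breadth-first traversal. This Python sketch makes no network requests: the `LINKS` dictionary is a hypothetical stand-in for "the hyperlinks found on a page", and `crawl` and its `limit` parameter are illustrative names rather than any real crawler's API.

```python
from collections import deque

# Hypothetical link graph: page URL -> URLs of the hyperlinks found on it.
LINKS = {
    "http://site.example/": ["http://site.example/a", "http://site.example/b"],
    "http://site.example/a": ["http://site.example/b", "http://other.example/"],
    "http://site.example/b": ["http://site.example/"],
    "http://other.example/": [],
}


def crawl(start, get_links, limit=100):
    """Visit pages breadth-first, following each newly discovered link once."""
    queue = deque([start])
    seen = {start}
    order = []
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)  # a real robot would index the page here
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("http://site.example/", lambda u: LINKS.get(u, []))
print(visited)
```

The `seen` set is what keeps the robot from looping forever on pages that link back to each other, and the `limit` reflects the fact that in practice a crawl is always bounded.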

This type of search tool is the best known and most popular among Internet users. Everyone has heard the names of well-known web search engines: Yandex, Rambler, Aport.

The way web search engines work is as follows:

    Analysis of web pages and recording of analysis results at one or another level of the search server database.

    Searching for information based on user request.

    Providing a convenient interface for searching information and viewing search results by the user.

The working techniques used when working with one or another search tool are almost the same. When describing them, the following concepts are used:

    The search tool interface is presented in the form of a page with hyperlinks, a query line (search line) and query activation tools.

    A search engine index is an information base containing the result of an analysis of web pages, compiled according to certain rules.

    A query is a keyword or phrase that a user enters into the search bar. To form various queries, special characters ("", ~), and mathematical symbols (*, +, ?) are used.

The information search scheme is simple. The user types a key phrase and activates the search, thereby receiving a selection of documents based on the formulated (specified) request. This list of documents is ranked according to certain criteria so that at the top of the list are those documents that most closely match the user's request. Each of the search tools uses different criteria for ranking documents, both when analyzing search results and when creating an index (populating an index database of web pages).

In Russia, the largest and most popular search indexes are:

    "Yandex" (www.yandex.ru)

    "Rambler" (www.rambler.ru)

    "Google" (www.google.ru)

    "Aport2000" (www.aport.ru)

2 Search mechanisms

The generalized search technology consists of the following stages:

    The user formulates a request

    The system searches for documents (or their search images)

    The user receives the result (information about documents)

    The user refines or reformulates the request

    A new search is run...

Typically, search engines support two modes: simple search mode and advanced search mode. Let's consider the generalized possibilities.

Forming a query in simple search mode. You can simply enter one or more words separated by spaces; a search for a word with all possible endings is expressed by the symbol * at the end of the word. Many systems allow you to search for exact phrases; to do this, enclose the phrase in quotation marks. You can also require that certain words be included or excluded.
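The elements of a simple query can be separated mechanically, as in the following Python sketch. It assumes one common convention, not the syntax of any particular search engine: quoted text is a phrase, a leading "+" marks a required word, a leading "-" marks an excluded word, and a trailing "*" marks a word searched with any ending. The function name `parse_simple_query` is hypothetical.

```python
import re


def parse_simple_query(query):
    """Sort the parts of a simple query into phrases, required, excluded, prefixes, words."""
    parsed = {"phrases": [], "required": [], "excluded": [], "prefixes": [], "words": []}
    # Each match is either a quoted phrase (group 1) or a run of non-space characters (group 2).
    for phrase, token in re.findall(r'"([^"]*)"|(\S+)', query):
        if phrase:
            parsed["phrases"].append(phrase)
        elif not token:
            continue  # empty pair of quotes
        elif token.startswith("+") and len(token) > 1:
            parsed["required"].append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            parsed["excluded"].append(token[1:])
        elif token.endswith("*") and len(token) > 1:
            parsed["prefixes"].append(token[:-1])
        else:
            parsed["words"].append(token)
    return parsed


parsed = parse_simple_query('"information retrieval" +index -catalog govern* search')
print(parsed)
```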

The main problem of searching using a primitively composed query (in the form of listing keywords) is that the search engine will find all pages on which the specified words appear in any part of the document. Typically, the number of pages found will be too large.

To improve the quality of search in simple search mode, it is permissible to use logical operators and operators that allow you to limit the search area, as well as select a specific category of documents from the presented list.

Many search engines include special operators in their query language that allow you to search in certain areas of a document, for example, in its title, or search for a document by a known part of its address.

Advanced (detailed) query mode is implemented differently in each system, but most often it is a form in which the operators and key elements mentioned above are applied simply by ticking the appropriate boxes or selecting parameters from a list.

Below, as an example, is information from the Help section of the Yandex search engine: the advanced search window, the query language, and searching within results.

Search within results. If a Yandex query returns many documents, but on a broader topic than you want, you can narrow the list by refining your query. Another option is to tick the "search within results" checkbox in the search form and enter additional keywords; the next search will then be run only over the documents selected by the previous search.

Cheat Sheet on Using the Query Language

Example                                      Meaning
"Come to us for morning pickle"              Words appear consecutively, in exact form
"The *ambassador has arrived"                A word is missing from the quotation
half a slice & corn                          Words within one sentence
equip && get                                 Words within one document
capercaillie | partridge | someone           Search for any of the words
you can't << blame                           Non-ranking "and": the expression after the operator does not affect the position of the document in the search results
I must /2 execute                            Distance within two words in any direction (that is, one word can occur between the given words)
something I ~~ understand                    Exclusion of the word "understand" from the search
with my /+2 intelligence                     Distance within two words in direct order
tea ~ laptem                                 Search for a sentence where the word "tea" occurs without the word "laptem" (bast shoe)
cabbage soup /(-1 +2) slurping               Distance from one word in reverse order to two words in forward order
I figure out what! what                      Words in exact form with specified case
it turns out && (+on | !me)                  Parentheses form groups in complex queries
Policy                                       Dictionary form of the word
title:(in country)                           Search by document titles
url:ptici.narod.ru/ptici/kuropatka.htm       Search by URL
certainly inurl:vojne                        Search based on URL fragment
                                             Search by host
                                             Search by host in reverse entry
site:http://www.lib.ru/PXESY/FILATOW         Search across all subdomains and pages of a given site
                                             Search by one file type
                                             Search limited by language
                                             Domain-limited search
                                             Search with date restrictions
state business && /3 you catch the thread    Distance of 3 sentences in any direction
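
The word-distance rows of the cheat sheet (for example "/2" for any direction and "/+2" for direct order) can be modelled with a short Python sketch. This is only an illustration of the idea of a proximity check on a plain word list; it is not Yandex's implementation, and the function name `within_distance` and the sample sentences are hypothetical.

```python
def within_distance(text, first, second, max_distance, ordered=False):
    """True if `second` occurs within `max_distance` words of `first`.

    With ordered=True, `second` must come after `first` (direct order);
    otherwise either direction is accepted.
    """
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    for i in pos_first:
        for j in pos_second:
            gap = j - i
            if ordered and 0 < gap <= max_distance:
                return True
            if not ordered and 0 < abs(gap) <= max_distance:
                return True
    return False


# One word between "must" and "execute": distance 2, so "/2" matches.
near = within_distance("I must always execute orders", "must", "execute", 2)
# Two words between them: distance 3, so "/2" does not match.
far = within_distance("I must not ever execute", "must", "execute", 2)
# "execute" comes before "must": accepted in any direction, rejected in direct order.
reverse_any = within_distance("execute I must", "must", "execute", 2)
reverse_ordered = within_distance("execute I must", "must", "execute", 2, ordered=True)
print(near, far, reverse_any, reverse_ordered)
```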

An interesting option is to search for documents on the web that link to a page with a URL you specify. This way, you can find pages on the web that have links to your Web site. Some systems will allow you to limit your search within a specified domain.

Additional special operators include:

    Operators for searching documents with a specific graphic file;

    Operators limiting the date of the pages being searched;

    Proximity operators between words;

    Word form accounting operators;

    Operators for sorting results (by relevance, freshness, age).

Unfortunately, there is currently no standard for the number and syntax of operators supported by different search engines. Efforts are underway to develop such a standard, so it is hoped that search engine developers will take the user experience into account. For now, a user turning to a particular search engine should first become familiar with its rules for composing queries. As a rule, the home page has a Help link that leads to reference information.

Different search engines describe different numbers of information sources on the Internet. Therefore, you cannot limit your search to just one search engine.

Let's consider how search results are presented in search engines.

Most often, the number of documents found exceeds several dozen, and in some cases can reach hundreds of thousands. Results are therefore presented as a list of 5, 10 or 15 documents per page, with a link at the bottom of the page to the next portion. The title and URL (address) of each found document are always given; sometimes the system also indicates the degree of relevance of the document as a percentage.

The description of a document most often contains the first few sentences or excerpts from the text of the document with keywords highlighted. As a rule, the date of update (verification) of the document is indicated, its size in kilobytes; some systems determine the language of the document and its encoding (for Russian-language documents).

What can you do with the results obtained? If the title and description of the document meets your requirements, you can immediately go to its original source using the link. It is more convenient to do this in a new window in order to be able to further analyze the search results. Many search engines allow you to search the documents found, and you can refine your query by introducing additional terms.

If the intelligence of the system is high, you may be offered the service of searching for similar documents. To do this, you select a document you particularly like and point it to the system as a model to follow.

However, automating similarity determination is a very non-trivial task, and often this function does not work as expected. Some search engines allow you to re-sort the results. To save you time, you can save your search results as a file on your local drive for later offline study.

Search technologies

Laws of friction and heat and mass transfer in a turbulent boundary layer

There are several types of representation of the “law of friction” (for the reference case), leading to almost identical results. In accordance with the concept of a “logarithmic” boundary layer (at the value of the first turbulence constant χ = 0.4) the friction law for extremely developed turbulence with “vanishing viscosity” is well approximated by the simple Karman formula:

For a power-law representation of the velocity profile, the following formula should be proposed:

Where: ; n– power exponent of the velocity profile;

– semi-empirical coefficient;

A– empirical coefficient;

δ – thickness of the boundary layer.

Using relations for Reynolds numbers built on different linear quantities:

It is important to note that for the case of development of a turbulent boundary layer from the leading edge ( x cr = 0) the law of friction should also be presented in the form:

The values of the parameters in the formulas above for various velocity profiles are summarized in the table:

Parameter   n = 1/7   n = 1/8   n = 1/9   n = 1/10
A           8.74      9.71      10.6      11.5
            0.0975    0.089     0.0818    0.0757
            1.28      1.25      1.22      1.20
m           0.250     0.222     0.200     0.182
B           0.0252    0.0206    0.0190    0.0148
m_1         0.200     0.182     0.167     0.154
B_1         0.0576    0.0450    0.0362    0.0308

Other forms of representing the law of friction are also known and used, leading to almost the same results. So V.M. Ievlev proposed an approximation:

Formulas for the laws of heat and mass transfer are obtained from the “laws of friction” for standard conditions (reference case) using the well-known principle of Reynolds' triple analogy.

Where: S– correction factor – Reynolds analogy factor for non-compliance with the conditions of the standard (and), factor S as a first approximation, it is satisfactorily approximated by the relation:

It is important to note that in the case of using integral parameters, the “laws” of heat and mass transfer are well described by the dependencies:

World Wide Web (WWW) technology is a special technology for preparing and publishing documents on the Internet. The WWW includes web pages, electronic libraries, catalogs, and even virtual museums. With such an abundance of information, the question arises: “How does one navigate such a huge information space?” Search tools come to the rescue.

Search tools are special software whose main purpose is to provide the most optimal and high-quality search for information for Internet users. Search tools are hosted on special web servers, each of which performs a specific function:

1. Analysis of web pages and entering the analysis results to one or another level of the search server database.

2. Search for information based on the user's request.

3. Providing a convenient interface for searching information and viewing search results by the user.

The working techniques used when working with one or another search tool are almost the same. Before we discuss them, let's consider the following concepts:

1. The search tool interface is presented in the form of a page with hyperlinks, a query line (search line) and query activation tools.

2. Search engine index - an information base containing the result of analysis of web pages, compiled according to certain rules.

3. Query - a keyword or phrase that the user enters into the search bar. Special characters ("", ~) and mathematical symbols (*, +, -) are used to build more complex queries.

The information search scheme is simple. The user types a key phrase and activates the search, thereby receiving a selection of documents based on the formulated (specified) request. This list of documents is ranked according to certain criteria so that at the top of the list are those documents that most closely match the user's request. Each of the search tools uses different criteria for ranking documents, both when analyzing search results and when creating an index (populating an index database of web pages).

As a result, the same query entered into different search tools can return different results. What matters most to the user is which documents appear among the first two or three dozen results and how well they match the user's expectations.

Most search tools offer two search methods: simple search and advanced search, with or without a special request form. Let's consider both types of search using the example of an English-language search engine.

For example, AltaVista is convenient for free-form queries such as “Something about online degrees in information technology”, while Yahoo's search tool lets you get world news, exchange rates, or weather forecasts.

Mastering query refinement and advanced search techniques makes searching more efficient and helps you find the information you need quickly. First of all, you can make a search more efficient by using the logical operators OR, AND, NEAR, and NOT, together with mathematical and special symbols, in your queries. With operators and symbols, the user links keywords in the required way to get the result that best fits the query.

A simple query returns a large number of links, because the list includes every document that contains any one of the words entered, or the simple phrase (see Table 1). The AND operator requires all keywords to be present in the document. Even so, the number of documents may still be large, and reviewing them takes considerable time. For this reason, in some cases it is much more convenient to use the context operator NEAR, which requires the words to be located close to each other in the document. Using NEAR significantly reduces the number of documents found.

The "*" character in the query string means that the word is searched for by its mask. For example, writing “gov*” in the query string returns a list of documents containing words that start with “gov”, such as government and governor.
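The narrowing effect of these operators can be illustrated with a small sketch. It is not how AltaVista or any real engine is implemented; the three sample documents, the function names, and the NEAR window of three words are all assumptions made for the example.

```python
import re
from fnmatch import fnmatchcase

def words(text):
    """Split text into lowercase English words."""
    return re.findall(r"[a-z]+", text.lower())

def match_or(doc, terms):
    """Simple query: the document contains at least one of the terms."""
    ws = set(words(doc))
    return any(t in ws for t in terms)

def match_and(doc, terms):
    """AND: the document contains every term."""
    ws = set(words(doc))
    return all(t in ws for t in terms)

def match_near(doc, a, b, window=3):
    """NEAR: both terms occur within `window` words of each other (assumed window size)."""
    ws = words(doc)
    pos_a = [i for i, w in enumerate(ws) if w == a]
    pos_b = [i for i, w in enumerate(ws) if w == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def match_mask(doc, mask):
    """Mask search: return the distinct words that fit a pattern such as 'gov*'."""
    return sorted({w for w in words(doc) if fnmatchcase(w, mask)})

# Hypothetical sample documents, invented for this illustration.
DOCS = [
    "Online degrees in information technology are popular.",
    "Information about the governor: he praised, after a long and careful review, new technology.",
    "Government offices publish information.",
]

if __name__ == "__main__":
    query = ["information", "technology"]
    print("OR:  ", [i for i, d in enumerate(DOCS) if match_or(d, query)])
    print("AND: ", [i for i, d in enumerate(DOCS) if match_and(d, query)])
    print("NEAR:", [i for i, d in enumerate(DOCS) if match_near(d, *query)])
    print("gov*:", match_mask(" ".join(DOCS), "gov*"))
```

On this sample, the simple query matches all three documents, AND matches two, and NEAR matches only one, which mirrors the narrowing described above.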

The most developed search service for Russian-language information is provided by the Yandex search server. In Yandex, you can simply write a phrase in Russian that describes what you want to find, and the system will analyze and process your request and then try to find everything related to the topic. Using special operators, you can build a query string that tells the search engine exactly what requirements the information must meet. Some of the Yandex query language operators can be viewed here: http://help.yandex.ru/search/ -id=481939

The equally popular search engine Rambler keeps statistics on link traffic from its own database. It supports the same logical operators AND, OR, and NOT, the metasymbol * (like the * character in AltaVista, it widens the query range), and the coefficient symbols + and -, which increase or reduce the significance of the words entered in the query.

Let's look at the most popular technologies for searching information on the Internet.

Topic 3 Working with Internet search engines


After studying this topic, you will learn and review:

- what search servers are for;
- the purpose of the main parts of search servers;
- what types of information search exist on the Internet;
- the basic rules for forming a query in the Yandex search engine.

Search by URL

The fastest and most reliable way to find information on the Internet is to go directly to a known URL. Many URLs are published in print media and special reference books, and are mentioned on popular radio stations and on TV.

♦ Fans of the Zenit football club know the address www.fc-zenit.ru by heart.
♦ Fans of the group “The King and the Jester” are well aware of the official website of this group www.korol.spb.ru.
♦ Fans of the NTV channel can easily find its website at www.ntv.ru.

To quickly access the above resources, simply launch a browser program, such as Internet Explorer, and type the familiar URL in the address bar.

Search engines

There is a huge number of documents on the Internet. To make it easier to find the necessary information, special search engines have been created.

Search engines are automatic systems that poll servers connected to the global network and store in their databases information about the data available on those servers. In response to a specially formulated query, a search engine reports where the necessary data can be found.

Typically, search engines consist of three parts: robot, index and query processing program.

Robot (also called a spider or bot) is a program that visits web pages and reads their content, in whole or in part. Each search engine's robot has its own scheme for analyzing the content of a web page.

Search engine index is a repository of search images of the pages visited by robots. A search image of a document (including a web page) is a description of the document's content in a special information retrieval language. This description contains codes of the document's keywords, which reflect its meaning and content. The indexes of different search engines differ in volume and in the way the stored information is organized. The databases of leading search engines store information about tens of millions of documents, and their indexes amount to hundreds of gigabytes. Indexes are periodically updated and supplemented, so the same query sent to the same search engine may give different results at different times.

Query processing program is a program that, in response to the user's query, looks through the index for the necessary information and returns links to the documents found. The program orders the resulting links by decreasing relevance, that is, from the link that best matches the query to the one that matches it least.
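The interplay of the three parts can be sketched in a few lines. This is a toy model under stated assumptions: the "robot" reads from an in-memory dictionary instead of fetching pages, the URLs and page texts are invented, and relevance is simply the number of times the query words occur on a page. Real engines use far more elaborate criteria.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for the Web.
PAGES = {
    "http://example.com/fish": "aquarium fish care: feeding aquarium fish",
    "http://example.com/band": "the band aquarium released an album",
    "http://example.com/cars": "used cars for sale",
}

def crawl(pages):
    """Robot stand-in: yields (url, text) pairs instead of downloading them."""
    yield from pages.items()

def build_index(documents):
    """Index: maps each word to the pages containing it and the occurrence count."""
    index = defaultdict(dict)
    for url, text in documents:
        for word in re.findall(r"\w+", text.lower()):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Query processor: scores pages by summed word counts, most relevant first."""
    scores = defaultdict(int)
    for word in re.findall(r"\w+", query.lower()):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda u: (-scores[u], u))

if __name__ == "__main__":
    index = build_index(crawl(PAGES))
    for url in search(index, "aquarium fish"):
        print(url)
```

Because the query is answered from the index rather than from the live pages, results depend on when the index was last rebuilt, which is why the same query can give different results at different times.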

Currently, the most popular for Russian Internet users are three large index-type search engines:

These systems take into account the grammatical features of the Russian language, so their search results in Russian-language resources are of higher quality than those of Western systems.

Search engines differ in their coverage of information resources:

♦ general-purpose search engines index all areas of knowledge and are distinguished by an extensive index and a large volume of accumulated information;
♦ special-purpose search engines look only at sites on a specific topic, such as music or museums.

The main characteristics of search engines are:

♦ volume of documents in the index;
♦ frequency of information update;
♦ the information space that the search engine robot covers and the variety of types of documents about which information is collected;
♦ request processing speed;
♦ criterion for determining relevance (compliance of the found document with the search query);
♦ the ability to detail and clarify the request.

Search using a search engine's directory (rubricator)

Search directories are a systematic collection (selection) of links to other Internet resources. The links are organized in the form of a thematic rubricator, which is a hierarchical structure through which you can navigate to find the information you need.

Let us take as an example the structure of the Yandex Internet search catalogue. This is a general-purpose directory, as it contains links to Internet resources in almost all possible areas. The following topics are highlighted in this catalogue:

♦ Business and economics;
♦ Directories and links;
♦ Society and politics;
♦ Home and family;
♦ Science and education;
♦ Entertainment and relaxation;
♦ Computers and communications;
♦ Culture and art.

Each topic includes many subsections, and these, in turn, contain headings, etc.

Suppose you are preparing an event for Victory Day and want to find the words of Bulat Okudzhava’s famous military song “You hear the boots rattling” on the Internet. The search can be organized as follows: Yandex Catalog → Culture and art → Music → Author's song.

This search method is quite fast and effective. At the end you are offered only 5 links, among which there are links to sites with songs of famous bards. All that remains is to find the archive with the lyrics of B. Okudzhava’s songs on the website and select the desired text from it.

Another example. Suppose you are going to buy a mobile phone and want to compare the characteristics of devices from different companies. The search could follow these catalog headings: Yandex Catalog → Computers and communications → Mobile communications → Mobile phones.

Having received a limited number of links, you can quickly view them and choose a phone by examining the manufacturers and the characteristics of the different models.
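Walking down a rubricator amounts to following a path through a tree of headings. The sketch below models that with nested dictionaries; the heading names are taken from the two examples above, while the links at the leaves are placeholders invented for the illustration, not real catalog entries.

```python
# A fragment of a directory tree; leaf lists hold hypothetical links.
CATALOG = {
    "Culture and art": {
        "Music": {
            "Author's song": [
                "http://example.com/bards",
                "http://example.com/okudzhava",
            ],
        },
    },
    "Computers and communications": {
        "Mobile communications": {
            "Mobile phones": ["http://example.com/phones"],
        },
    },
}

def browse(catalog, path):
    """Follow a list of headings; return subheadings, or links at a leaf."""
    node = catalog
    for heading in path:
        if not isinstance(node, dict) or heading not in node:
            raise KeyError(f"no heading {heading!r} at this level")
        node = node[heading]
    return sorted(node) if isinstance(node, dict) else list(node)

if __name__ == "__main__":
    print(browse(CATALOG, []))
    print(browse(CATALOG, ["Culture and art", "Music", "Author's song"]))
```

Each step discards every branch except one, which is why a few clicks are enough to reach a short list of links.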

Search by keywords

Most search engines support keyword search. This is one of the most common types of search. To search by keywords, enter the word or words you are looking for in the search box and click the Search button. The search engine will find and display the documents in its database that contain these words. There may be a great many such documents, but in this case more does not necessarily mean better.

Let's conduct several experiments with any of the search engines. Let's assume that we decide to start an aquarium and we are interested in any information on this topic.

At first glance, the simplest thing is to search for the word “aquarium”. Let's check this, for example, in the Yandex search engine. The search result will be more than 460,000 pages on 3,500 sites - a huge number of links. Moreover, if you look more closely, among them there will be sites that mention B. Grebenshchikov’s group “Aquarium”, shopping centers and informal associations with the same name, and much more that has nothing to do with aquarium fish.

It is not difficult to guess that such a search cannot satisfy even the most unassuming user. Too much time will have to be spent on selecting among all the proposed documents those that relate to the subject we need, and even more so on getting acquainted with their contents.

We can immediately conclude that searching by one word is, as a rule, impractical, because using one word it is very difficult to determine the topic that a document, web page or site is dedicated to. The exception is rare words and terms that are almost never used outside their thematic area.

Let's try to refine the search conditions and enter the phrase “aquarium fish”. The search result will be a little more than 20,000 pages on about 650 sites. As you can see, the number of pages has dropped by a factor of more than 20. This result suits us better, but the proposed links may still include, for example, Russian souvenir sets of match labels with images of fish, collections of screensavers for the computer desktop, catalogs of aquarium fish with photographs, and aquarium accessories stores.

It is obvious that we should continue to move towards clarifying the search conditions.

To make the search more productive, all search engines have a special query language with its own syntax. These languages are similar in many ways. It is quite difficult to study them all, but every search engine has a help system that will allow you to master the language you need.

Here are ten simple rules for forming a query in the Yandex search engine.

1. Keywords in the query should be written in lowercase (small) letters. This will ensure that all keywords are searched, not just those that start with a capital letter.

2. When searching, all forms of the word are taken into account according to the rules of the Russian language, regardless of the form of the word in the query. For example, if the word “know” was specified in the query, then the words “we know”, “you know”, etc. will also satisfy the search condition.

3. To find a set phrase, enclose the words in quotation marks, for example, “porcelain dishes”.

4. To search by exact word form, you need to put an exclamation mark in front of the word. For example, to search for the word “September” in the genitive case, you would write “!September”. 

5. To search within a single sentence, words in the query are separated by a space or an & sign: “adventure novel” or “adventure&novel”. Several words typed in a query, separated by spaces, mean that they all must be included in one sentence of the document being searched.

6. If you want only documents that contain every word specified in the query, put a plus sign “+” in front of each word. If, on the contrary, you want to exclude a word from the search results, put a minus sign “-” in front of it. The “+” and “-” signs must be separated by a space from the preceding word and written directly before the word they apply to. For example, the query “Volga -car” will find documents that contain the word “Volga” but not the word “car”.

7. When searching for synonyms or words with similar meanings, you can put a vertical bar “|” between the words. For example, the query “child | baby | infant” will find documents containing any of these words.

8. Instead of one word in a query, you can substitute an entire expression. To do this, put it in brackets, for example, “(child | baby | children | infant) + (care | education)”.

9. The “~” (tilde) sign allows you to find documents with a sentence that contains the first word but not the second. For example, the query “books ~ store” will find all documents containing the word “books” in a sentence that does not also contain the word “store”.

10. If an operator is written once (for example, & or ~), the search is performed within a sentence. A doubled operator (&&, ~~) specifies a search within the whole document. For example, the query “cancer ~~ astrology” will find documents that contain the word “cancer” but do not mention astrology.
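The difference between the single and doubled operators in rules 5, 9, and 10 is easiest to see on a small example. The sketch below is an illustration of those rules only, not Yandex's implementation: it handles just two-word queries, splits sentences naively on ".", "!" and "?", and ignores word forms. The function names and sample texts are assumptions.

```python
import re

def sentences(doc):
    """Split a document into sentences, each as a set of lowercase words."""
    parts = re.split(r"[.!?]+", doc)
    return [set(re.findall(r"\w+", p.lower())) for p in parts if p.strip()]

def matches(doc, left, op, right):
    """Evaluate a two-word query `left op right` against one document."""
    sents = sentences(doc)
    all_words = set().union(*sents)
    if op == "&":    # both words inside the same sentence
        return any(left in s and right in s for s in sents)
    if op == "&&":   # both words anywhere in the document
        return left in all_words and right in all_words
    if op == "~":    # some sentence has `left` but not `right`
        return any(left in s and right not in s for s in sents)
    if op == "~~":   # `left` is in the document, `right` is nowhere in it
        return left in all_words and right not in all_words
    raise ValueError(f"unsupported operator: {op}")

if __name__ == "__main__":
    doc = "Cancer research advances. Astrology is unrelated."
    for op in ("&", "&&", "~", "~~"):
        print(op, matches(doc, "cancer", op, "astrology"))
```

For the sample document, the two words never share a sentence, so “&” fails while “&&” succeeds; likewise “~” succeeds while “~~” fails, because the document as a whole does mention astrology.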

Once you have a set of the most common terms in the area of interest, you can use advanced search. Fig. 3.3 shows the advanced search window in the Yandex search engine. In this mode, the capabilities of the query language are presented as a form. A similar service, including dictionary filters, is offered by almost all search engines.

Fig. 3.3. An example of an advanced search in the Yandex system

Provided that the desired and required words are chosen correctly and undesirable terms are excluded, such a search can yield good results.

Let's return to the example with aquarium fish. After reading several documents offered by the search engine, it becomes clear that searching for information on the Internet should not begin with choosing aquarium fish. An aquarium is a complex biological system, the creation and maintenance of which requires special knowledge, time and serious investment.

Based on the information received, a person searching the Internet can radically change the strategy for further search by deciding to study specialized literature related to the issue under study.

To search for literature or full-text documents, the following query is possible:

“+(aquarium | aquarist | aquarium hobby) + for beginners + (advice | literature) + (article | thesis | full text) - (price | store | delivery | catalog).”
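One possible reading of this query is: a document must contain at least one term from each "+" group and none of the terms from the "-" group. The sketch below encodes that reading as data and checks a text against it. It is an assumption-laden illustration: terms are matched as plain substrings, word forms are ignored, and the grouping of “for beginners” as a single required phrase is a guess about the intent of the query.

```python
# Each inner list is one "+" group: any one of its terms satisfies the group.
REQUIRED_GROUPS = [
    ["aquarium", "aquarist", "aquarium hobby"],
    ["for beginners"],
    ["advice", "literature"],
    ["article", "thesis", "full text"],
]
# Terms from the "-" group: none of them may appear.
EXCLUDED = ["price", "store", "delivery", "catalog"]

def satisfies(text, required_groups=REQUIRED_GROUPS, excluded=EXCLUDED):
    """True if every required group is represented and no excluded term occurs."""
    t = text.lower()
    has_all = all(any(term in t for term in group) for group in required_groups)
    has_excluded = any(term in t for term in excluded)
    return has_all and not has_excluded

if __name__ == "__main__":
    print(satisfies("Advice for beginners: an article on setting up an aquarium"))
    print(satisfies("Aquarium store: price catalog for beginners, advice and article"))
```

The exclusion group does most of the narrowing here: it removes shop and price-list pages that would otherwise match every required group.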

After the search engine processed the query, the following result was obtained: 195 pages on at least 43 sites.

As can be seen from the search statistics, the result was very successful. Already the first links lead to the required documents:

Placing an Aquarium > Tips for the Beginner Aquarist > Articles > Aquascope.ru
http://aquascope.ru/modules/wfsection/article.php?page=l&articleid=49 (32 KB) - strict match.
ADVICE FOR BEGINNING AQUARISTS. How to choose and install an aquarium, how...
http://www.aquariums.ru/sovna.htm (2 KB) 07/23/2002 - non-strict match.

Now you can summarize the search results, draw certain conclusions and decide on possible actions:

♦ Stop further search, since for various reasons you are unable to maintain an aquarium.
♦ Read the suggested articles and start setting up an aquarium.
♦ Look for materials about hamsters or budgies.

Professional search

Researchers and specialists will have to take a more thoughtful approach to organizing the search. When searching for information on the Internet professionally, the following requirements must be met:

♦ high search speed;
♦ reliability of the information received;
♦ complete coverage of resources when searching.

Speed. The speed of a search depends mainly on two factors: competent search planning (selection of search services and tools) and skills in working with an already selected resource (the ability to quickly understand its structure and navigation methods). Search indexes are not enough to ensure search speed. In addition to them, there are a number of search resources on the Internet, the use of which ensures the performance of a professional search.

Reliability. The reliability of information obtained from the Internet is a pressing issue, since anyone can post any information there without any check on whether it is true. This, in turn, leads to a large number of unreliable sources, such as the essays and term papers that flood the Internet.

There are special search services that allow you to assess the reliability of an information source on the Internet.

Completeness. A necessary condition for successful full-scale collection of information is knowledge of the main types of resources existing today and the use of various search services. No search engine can cover all Internet resources.

As a rule, to achieve a positive result, the user must resort to the services of several search engines. You can do this yourself, moving from system to system, or you can entrust this work to one of the metasearch systems (meta is the first component of complex words, denoting systems for describing and researching other systems).

Fig. 3.4. Metasearch engine windows

Metasearch engines do not have their own search databases and use the resources of many other search engines when searching. Because of this, the probability of finding the necessary information is very high. Working with metasearch systems follows the same rules as working with search engines, because metasearch engines are a kind of add-on to search engines and rely on their index databases. Metasearch engines also look much like well-known search engines. Fig. 3.4 shows the windows of the metasearch engines myweb.ru and metabot.ru.
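The idea of a metasearch engine can be sketched as a function that forwards one query to several engines and merges their ranked lists. Everything concrete below is an assumption: the two "engines" are stubs returning fixed, invented URLs, and the merge rule (summing reciprocal ranks) is just one simple choice, not the method used by myweb.ru, metabot.ru, or any other real service.

```python
from collections import defaultdict

def engine_a(query):
    """Stub for an underlying search engine; returns a fixed ranked list."""
    return ["http://example.com/1", "http://example.com/2", "http://example.com/3"]

def engine_b(query):
    """Second stub engine with a different index and ranking."""
    return ["http://example.com/2", "http://example.com/4"]

def metasearch(query, engines):
    """Query every engine and merge results; a link ranked high by several
    engines gets a larger score (sum of 1/rank over the engines)."""
    scores = defaultdict(float)
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / rank
    return sorted(scores, key=lambda u: (-scores[u], u))

if __name__ == "__main__":
    for url in metasearch("aquarium for beginners", [engine_a, engine_b]):
        print(url)
```

The merged list contains links that neither engine returns on its own, which is the coverage benefit described above; the quality of the final ordering, however, depends entirely on the merge rule.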

Experience shows that in most cases, better results are achieved by using several independent search indexes than by using a single metasearch engine.

Test questions and assignments

1. What is the purpose of a browser program?

2. What browser programs do you know?

3. Where can a web searcher find URLs?

4. What is the technology for searching using the search engine's rubricator?

5. What is the technology for searching by keywords?

6. What requirements must be met when searching for information on the Internet professionally?

7. When should “+” or “-” signs be specified in the search criteria?

8. What search criteria in Yandex are specified by the following phrase:

(nanny | teacher | governess) ++ (care | education | supervision).

9. What does doubling the sign (~~ or ++) mean when forming a complex query?

10. What is search relevance?

11. What is the purpose of metasearch engines?
