Technologies


Any decision in the field of cyber security, in particular when building systems for monitoring the information space, needs a scientific justification for the technologies and solutions applied. Systems that monitor open sources, including websites and social networks, are no exception. Attack Index (attackindex.com), a service for monitoring, forecasting and automated identification of information threats, was created by a team of specialists in technical and computer sciences, applied mathematics, media, sociology, IT and information security. That is why modern scientific achievements, primarily our own, are quickly implemented in the Attack Index service.

Artificial intelligence is a stack (set) of technologies that includes machine learning, neural networks and pattern recognition. Elements of these technologies are used in Attack Index, in particular:
• Machine learning – tonality (sentiment) of messages, rating of sources, forecasting of information dynamics;
• Cluster analysis – automated grouping of text messages, detection of stories, formation of story chains;
• Computational linguistics – identification of stable phrases and narratives;
• Formation, clustering and visualization of semantic networks – determination of connections and nodes, development of cognitive maps;
• Correlation and wavelet analysis – detection of information operations.

Thanks to these technologies, Attack Index records and monitors information processes, builds a system of analytical indicators, assesses the stability of information situations and predicts their development. The automated Attack Index report saves analysts time and makes their work more efficient, in particular when building stories on the research topics in order to understand context and trends over long periods. The Attack Index databases can process any query over long time spans and return its information dynamics, that is, the number of mentions of the query keywords found for each date in the query period.

Statistical series, in particular information dynamics, can be processed with the standard capabilities of office packages to find relationships between them. Interrelationships are studied by calculating correlation coefficients over the same time intervals, and also with one series shifted in time relative to the other.
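The correlation with a shift can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pearson` and `lagged_correlation` and the two sample series are hypothetical and are not part of the service.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)

def lagged_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        result[lag] = pearson(a, b)
    return result

# Hypothetical daily mention counts: series_b repeats series_a two days later.
series_a = [3, 5, 9, 20, 12, 6, 4, 3, 8, 15, 7, 4]
series_b = [2, 2, 3, 5, 9, 20, 12, 6, 4, 3, 8, 15]

correlations = lagged_correlation(series_a, series_b, max_lag=4)
best_lag = max(correlations, key=correlations.get)
```

A clear maximum at a non-zero lag suggests that one information flow follows the other with that delay. Both series are assumed to have the same length and the same sampling step.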

Big data

In 2014 alone, Google indexed 60 trillion documents on the Internet. IDC predicted a tenfold increase in the amount of data between 2016 and 2025, up to 163 zettabytes. Cisco predicted that in 2021 more than 100,000 GB of data would be transferred per second, compared with almost 27,000 GB per second in 2016.

When the overall information flow is made up of many separate thematic flows, the dynamics of each one must be taken into account separately. When the general flow is studied, publications often “flow” from topics that are losing relevance to other topics.

The general direction of change in a studied series of events is called a trend. This now very popular word is also a proper term in the study of publication flows. Such flows are organized by sets of network information resources and often accompany information operations. The system examines the typical trends of publication flows in network information resources that accompany information operations.

Information theory

With appropriate tools, the modern information space offers a unique opportunity to obtain varied information on a selected issue. Such tools make it possible to analyze how possible or ongoing events relate to the information activity of a selected circle of sources.

Examples of information dissemination networks that have the characteristics of information operations are shown in the figure below. Such templates can be used in pattern recognition, which is applied to time series and corresponding volumes of publications.

 

[Figure: examples of information dissemination networks with the characteristics of information operations (energy.png)]

These schemes can be described in terms of the theory of energy distribution. Each new post initially appears with zero energy. After that, events like those seen in social networks may happen to it: like, dislike, repost, share link. By convention, these events affect the energy of the publication as follows:

  • like increases energy by 1;
  • dislike decreases energy by 1;
  • repost increases energy by 2;
  • share link increases energy by 1.

The probability that any of these events will occur depends on the relevance of the message and on interest in the information it contains. In terms of this theory, all of that is expressed by the amount of energy.

In one unit of time, one of these events may occur, two may occur at once, or none at all. The net changes that follow imply that a publication also loses 1 unit of energy in every unit of time: energy grows by 2 if a like and a repost happen at the same time, grows by 1 if only a repost happens, stays unchanged if there is only a like, and falls by 1 if none of the events occurs.
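The energy rules can be sketched as a small simulation. The loss of 1 unit of energy per unit of time is an assumption inferred from the net changes described above. The formula that links event probability to energy, and the names `net_change` and `simulate`, are hypothetical and serve only as an illustration.

```python
import random

# Energy contribution of each event, as listed in the text.
EVENT_ENERGY = {"like": 1, "dislike": -1, "repost": 2, "share_link": 1}

# Assumption: a publication loses 1 unit of energy per unit of time.
DECAY_PER_STEP = 1

def net_change(events):
    """Net energy change in one unit of time for the given events."""
    return sum(EVENT_ENERGY[e] for e in events) - DECAY_PER_STEP

def simulate(steps, start_energy=0, seed=0):
    """Toy run in which likes and reposts become more likely as energy grows."""
    rng = random.Random(seed)
    energy = start_energy
    history = [energy]
    for _ in range(steps):
        # Hypothetical link between energy and interest, not from the source.
        p = min(0.9, 0.2 + 0.05 * max(energy, 0))
        events = [e for e in ("like", "repost") if rng.random() < p]
        energy += net_change(events)
        history.append(energy)
    return history

# Net changes for the four cases described in the text.
both = net_change(["like", "repost"])
repost_only = net_change(["repost"])
like_only = net_change(["like"])
nothing = net_change([])
```

Calling `simulate(30, start_energy=10)` gives one possible trajectory for a publication that starts with some initial energy instead of zero.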

Thus, publications and their sources gain weight in the media space. This weight affects how widely specific information is shared, because users are guided by the significance of a publication, which is determined by the scheme described above.

A publication can gain its starting “energy” not only from a “hot” topic or relevance. Artificial agents of influence can also be responsible for it. Once the publication reaches a certain critical mass (three-digit counters of comments and reposts, for example), society begins to spread the information embedded in the message organically.

Managed information

An information operation is an informational influence on mass consciousness (hostile or friendly), on the information that is available to the target and needed for decision-making, and on a competitor’s information and analytical systems. Any information operation passes through the following stages:

1 – background; 2 – lull; 3 – “artillery preparation”; 4 – calm; 5 – attack / growth trigger; 6 – peak of inflated expectations; 7 – loss of illusions; 8 – public awareness; 9 – performance / background.

At the same time, gathering and analyzing information becomes difficult when large amounts of data are involved and when searching and navigating in ever-changing information flows. The multilingualism of sites adds to the problem. All of this complicates the use of the mentioned methods in information and analytical work.

The information space is a dynamic system of content-related elements (documents) that, as they evolve, form information flows.

The dynamics of document publication in the information space, including documents directly related to information operations, form time series.

Methods of analysis

Formal methods of analysis can be applied to exactly these time series: statistical, fractal, Fourier and wavelet. Analyzing these flows over time reveals trends, cycles, anomalies and correlations.
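As an illustration of the simplest statistical method, the sketch below flags anomalous bursts in a series of daily publication counts by comparing each value with the mean and standard deviation of the preceding days. The window size, the threshold, the name `find_bursts` and the sample data are all assumptions made for this example and do not describe the algorithm used by the service.

```python
from statistics import mean, pstdev

def find_bursts(counts, window=7, threshold=3.0):
    """Return indices whose value exceeds the trailing background by `threshold` sigmas."""
    bursts = []
    for i in range(window, len(counts)):
        background = counts[i - window:i]  # the `window` values before index i
        mu = mean(background)
        sigma = pstdev(background)
        if sigma == 0:
            # Flat background: treat any rise above it as a deviation.
            if counts[i] > mu:
                bursts.append(i)
            continue
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts

# Hypothetical daily mention counts with one sharp spike on day 9.
daily_mentions = [10, 12, 11, 9, 10, 13, 11, 10, 12, 48, 15, 11]
burst_days = find_bursts(daily_mentions)
```

A trailing window is used so that a burst is judged only against what came before it. Note that the spike then raises the background for the following days, which makes the detector less sensitive right after a burst.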

When defining information operations, three approaches can be distinguished:

  • Basic approaches, focused on tonality analysis. They can be used only at the stage of operational detection;
  • Approaches focused on pattern analysis, which can be used in strategic analysis and planning. Here it is important to take into account deviations from ordinary information bursts and natural patterns;
  • Network approaches, which fit well with modern recognition technologies and neural networks but cannot be effective without “learning” from the analysis of information flows over long periods of time.

In practice, hybrid approaches should be used that combine machine learning, a set of templates and the participation of subject-matter experts. To solve these problems, our system therefore uses Big Data methods, machine learning, neural networks and text mining, and also involves experts in the information fields under study.

Implementation of methods

Attack Index is an integral indicator of the level of information danger that takes many factors into account: the presence of information activity, the activity of possible competitors, deviation from the average background, the presence of information operations and the stages of their development, the retrospective and dynamics of the negative tonality of publications, and the degree of chaos in the processes. In addition, a tool for forecasting information events is under development.

Components of our solution:

  • Search for messages on topics of interest in global networks;
  • Tracking of information flows (stories), relevant topics, events and processes;
  • Determination of the dynamics of information flows;
  • Construction of the dynamics of the tonality of publications;
  • Identification of abnormal and critical moments in the dynamics of thematic information flows;
  • Identification of the main events and objects of a thematic information flow;
  • Visualization of the relations between monitoring objects;
  • Forecast of the development of the situation.

Study of tonality

The implemented tonality detection system combines a statistical approach with neural network training. The statistics are built by detecting the words most frequently used in texts with a positive or neutral tone.

It should be remembered that the information space always reacts more actively to problems and negative events, so negativity is statistically more frequent in information flows. Even experts cannot always agree on what counts as negative and what as positive, so the task of the system is to process the found text arrays correctly and present the estimated values for consideration.

Attack Index takes into account the statistics of negative messages and the dynamics of growth in negative tonality, since such trends indicate a potentially dangerous situation for the object of the query.
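The statistical part of such an approach can be illustrated with a word-frequency classifier. The sketch below is a simplified naive Bayes style scorer without class priors, not the service's actual model. The labels, the training sentences and the names `tokenize`, `train` and `classify` are hypothetical.

```python
from collections import Counter
from math import log

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase words with surrounding punctuation removed."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def train(labeled_texts):
    """Count word frequencies separately for each tone label."""
    counts = {}
    for text, label in labeled_texts:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts

def classify(text, counts):
    """Pick the label whose word statistics best explain the text."""
    vocabulary = set().union(*counts.values())
    scores = {}
    for label, words in counts.items():
        # Add-one smoothing keeps unseen words from zeroing out a label.
        total = sum(words.values()) + len(vocabulary)
        scores[label] = sum(log((words[w] + 1) / total) for w in tokenize(text))
    return max(scores, key=scores.get)

# Hypothetical labeled examples; a real system needs a large annotated corpus.
samples = [
    ("The company reports record growth and success", "positive"),
    ("Customers praise the reliable new service", "positive"),
    ("The scandal caused losses and a crisis", "negative"),
    ("Fraud accusations and losses hit the company", "negative"),
]
model = train(samples)
tone_a = classify("New scandal and fraud losses", model)
tone_b = classify("Record growth and success", model)
```

The same functions work with any set of labels, for example positive and neutral, because the classifier only compares word frequencies between the classes it was given.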

Participation in distribution

The list of sources includes leading news sites, regional media, blogs and forums. Sites with a dubious reputation are also an important component, because information waves start from them. Social networks are likewise the source of most such disturbances: “hot” information is published from a profile, is then relayed by sites that are far from authoritative, and eventually reaches news services and even TV channels. Such manifestations are also taken into account when assessing the situation.

Methods of extracting data from texts make it possible to form networks of interrelated concepts. Their nodes are keywords, names of people, companies and so on. Analysis of these networks reveals explicit and implicit connections between individual concepts, estimates the weight of particular concepts, clarifies the criteria by which the information flow is formed, and shows the interdependencies in the networks under study.

It is important to understand the presence and strength of connections between influence agents and sources. Our service implements a technology for the automated formation of cognitive maps based on models of subject areas. A cognitive map is a directed graph whose edges can carry weights (energy, as in the example described earlier).

Cognitive maps can be used to create scenarios of informational support. The vertices of a cognitive map correspond to concepts, and its edges to causal relationships. When cognitive maps are analyzed, nodes and links are evaluated in relation to a selected concept, after which coherent chains are formed between these nodes.

Nodes can be connected to each other if the corresponding words stand next to each other in the text, belong to the same sentence, or are syntactically or semantically connected.
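One of these linking rules, words that belong to the same sentence, can be sketched as follows. The function names `build_network` and `node_weights`, the naive sentence splitting and the sample text are assumptions made for this example. Syntactic and semantic links would need a proper language parser and are not covered.

```python
import re
from collections import Counter
from itertools import combinations

def build_network(text, stop_words=frozenset()):
    """Return weighted edges between terms that share a sentence."""
    edges = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"\w+", sentence.lower())
        terms = sorted({w for w in words if w not in stop_words})
        # Every unordered pair of distinct terms in a sentence gets one link.
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges

def node_weights(edges):
    """Weighted degree of each node, a simple estimate of a concept's weight."""
    weights = Counter()
    for (a, b), weight in edges.items():
        weights[a] += weight
        weights[b] += weight
    return weights

# Hypothetical text fragment.
sample = "Bank raises rates. Bank faces protest. Protest grows."
network = build_network(sample)
weights = node_weights(network)
top_concept = weights.most_common(1)[0][0]
```

The edges here are undirected. Turning such a network into a cognitive map would additionally require a direction and a causal interpretation for each link.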

Scientific literature describing the theory and practices used in creating the Attack Index service:

1. Lande D.V., Shnurko-Tabakova E.V. “PROSPECTS OF AUTOMATION OF ANALYTICAL ACTIVITIES IN THE SPHERE OF NATIONAL DEFENSE AND SECURITY” // Scientific and practical conference “Ensuring information security of the state in the military sphere: problems and ways to solve them”, 2020/11/26. National University of Defense of Ukraine named after Ivan Chernyakhovsky. – pp. 89-90.

2. Lande D.V., Shnurko-Tabakova E.V. “METHODS AND MEANS OF ANALYTICAL SUPPORT FOR COUNTERING HYBRID STATE THREATS” // Conference “PROBLEMS OF THE THEORY AND PRACTICE OF INFORMATION WARFARE IN THE CONDITIONS OF HYBRID WARFARE”, 2019/10/24. Ministry of Defense of Ukraine, Zhytomyr Military Institute named after S. P. Korolev. – pp. 13-15.

3. Dmytro Lande, Ellina Shnurko-Tabakova. OSINT as a part of cyber defense system // Theoretical and Applied Cybersecurity, 2019. – N. 1. – pp. 103-108.

4. Gorbulin V.P., Dodonov O.G., Lande D.V. “Information operations and public security: threats, countermeasures, modeling: monograph”. Kyiv: Intertehnologiya, 2009. 164 p.

5. Dmytro Lande, Minglei Fu, Wen Guo, Iryna Balagura, Ivan Gorbov, Hongbo Yang. Link prediction of scientific collaboration networks based on information retrieval // World Wide Web: Internet and Web Information Systems, 2020. – N 23. – pp. 2239-2257. DOI: doi.org/10.1007/s11280-019-00768-9. ISSN: 1573-1413, 1386-145X.

6. Dmytro Lande, Oleh Dmytrenko, Oksana Radziievska. Determining the Directions of Links in Undirected Networks of Terms // Selected Papers of the XIX International Scientific and Practical Conference “Information Technologies and Security” (ITS 2019). CEUR Workshop Proceedings (ceur-ws.org). – Vol-2577. – pp. 132-145. ISSN 1613-0073.

7. Minglei Fu, Jun Feng, Dmytro Lande, Oleh Dmytrenko, Dmytro Manko, Ryhor Prakapovich. Dynamic model with super spreaders and lurker users for preferential information propagation analysis // Physica A: Statistical Mechanics and its Applications. – Volume 561, 1 January 2021, 125266. DOI: doi.org/10.1016/j.physa.2020.125266.

