Publication type: journal article

Year of publication: 2019

DOI: 10.31772/2587-6066-2019-20-2-153-159

Keywords: small samples, supervised classification, ridge regression, quantile transformation, meta-classifier, feature significance, genetic algorithm

Abstract: The rapid development of technical devices and technology makes it possible to monitor the properties of objects of different physical nature at very fine sampling intervals. As a result, large amounts of data can be accumulated and used to advantage in managing an object, a multiply connected system, or a technological enterprise. However, regardless of the field of activity, tasks involving small amounts of data remain. In such cases the rate of data accumulation depends on objective limitations of the external world and the environment. This research concerns high-dimensional data with small sample sizes. This raises the task of selecting informative features, which makes it possible to improve the quality of the solution by eliminating “junk” features, to speed up decision making (since algorithms usually depend on the dimension of the feature space), and to simplify data collection (uninformative data need not be collected). Because the number of features can be large, an exhaustive search over all feature subsets is infeasible. Instead, to select informative features, we propose a two-stage random search algorithm based on a genetic algorithm: in the first stage, the search limits the number of features in a subset in order to reduce the feature space by eliminating “junk” features; in the second stage, the search runs without this limit but over the reduced feature set. The original problem is formulated as a supervised classification task in which the class of an object is determined by an expert. The attribute values of an object vary with its state, which determines the class it belongs to; that is, the statistics exhibit a class-dependent shift. Without loss of generality, a two-class formulation of the supervised classification task was used for the simulation study. Data from medical diagnostics of disease severity were used to generate the training samples.
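
The two-stage search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the fitness criterion, the genetic operators, or how the first stage produces the reduced feature set. The sketch assumes a leave-one-out nearest-centroid accuracy as a stand-in criterion, uniform crossover with bit-flip mutation, and a reduced set formed as the union of the best size-limited subsets over a few restarts. All function names and parameters (`loo_accuracy`, `repair`, `ga_select`, `two_stage_select`, `max_on`, `restarts`) are hypothetical.

```python
import random

def loo_accuracy(features, X, y):
    """Leave-one-out accuracy of a nearest-centroid rule on the chosen features.
    Stand-in quality criterion (assumption); needs at least 2 samples per class."""
    if not features:
        return 0.0
    n = {c: y.count(c) for c in (0, 1)}
    total = {c: [sum(row[j] for row, label in zip(X, y) if label == c)
                 for j in features] for c in (0, 1)}
    hits = 0
    for row, label in zip(X, y):
        dist = {}
        for c in (0, 1):
            held = 1 if c == label else 0  # drop the held-out sample from its own centroid
            dist[c] = sum((row[j] - (total[c][t] - held * row[j]) / (n[c] - held)) ** 2
                          for t, j in enumerate(features))
        hits += (0 if dist[0] <= dist[1] else 1) == label
    return hits / len(X)

def repair(mask, max_on, rng):
    """Switch off random genes until at most max_on features stay selected."""
    on = [j for j, bit in enumerate(mask) if bit]
    if max_on is not None and len(on) > max_on:
        for j in rng.sample(on, len(on) - max_on):
            mask[j] = 0
    return mask

def ga_select(X, y, candidates, max_on, rng, pop_size=16, generations=10, p_mut=0.1):
    """Genetic search over subsets of `candidates`; max_on=None means no size limit."""
    n = len(candidates)
    decode = lambda mask: [candidates[j] for j in range(n) if mask[j]]
    score = lambda mask: loo_accuracy(decode(mask), X, y)
    pop = [repair([rng.randint(0, 1) for _ in range(n)], max_on, rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[:pop_size // 2]          # truncation selection
        children = [ranked[0][:]]                 # elitism: keep the best subset
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]               # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(repair(child, max_on, rng))
        pop = children
    return decode(max(pop, key=score))

def two_stage_select(X, y, max_on=3, restarts=3, seed=0):
    """Stage 1: size-limited searches build a reduced set; stage 2: unlimited search on it."""
    rng = random.Random(seed)
    all_features = list(range(len(X[0])))
    reduced = set()
    for _ in range(restarts):
        reduced.update(ga_select(X, y, all_features, max_on, rng))
    reduced = sorted(reduced)
    final = ga_select(X, y, reduced, None, rng)
    return reduced, final
```

Calling `two_stage_select(X, y)` on a list of feature rows `X` and 0/1 labels `y` returns the reduced feature set from the first stage and the final subset from the second. The size limit in the first stage keeps each candidate subset small, which matters when the sample is too small to evaluate large subsets reliably.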

Journal: Сибирский журнал науки и технологий (Siberian Journal of Science and Technology)

Journal issue: Vol. 20, No. 2

Pages: 153-159

Journal ISSN: 2587-6066

Place of publication: Krasnoyarsk

Publisher: Federal State Budgetary Educational Institution of Higher Education Reshetnev Siberian State University of Science and Technology


  • Kononova N.V. (Siberian Federal University)
  • Mangalova E.S. (LLC “RD Science”)
  • Stroev A.V. (Krasnoyarsk State Medical University named after Prof. V. F. Voino-Yasenetsky)
  • Cherdantsev D.V. (Krasnoyarsk State Medical University named after Prof. V. F. Voino-Yasenetsky)
  • Chubarova O.V. (Siberian State University of Science and Technology)
