2022 was a breakthrough year for bringing neural networks to the mass consumer. In addition to voice assistants, which were used mainly as toys, remarkable products entered the market and were highly appreciated by professionals: artists, programmers, writers and scientists. Whole sets of high-tech tools have become available for mass use. The vast majority of them build on the perceptron, the basic building block of neural networks, which was first implemented as a physical device and has been known to engineers for several decades.
Although the theoretical foundations are taught at every technical university, developers were in no hurry to build neural networks into their applications. The reason is that several stages have to be completed before you can interact with the finished network: designing a neural network is laborious, and training it is extremely resource-intensive, so every change to its basic properties is costly.
Some technology giants (Google, IBM) opened access to their platforms for creating elements of artificial intelligence, but as a rule these had the same drawback as the perceptron known before them: a very high entry threshold, even for people with sufficient theoretical knowledge. Besides the fact that all operations were performed in particular programming languages (usually Python), you had to be comfortable with the console and have DevOps skills. Apple did not stand aside either, but for a time it offered the same approaches.
Today Apple offers a set of tools that deliver the power of neural networks “out of the box”. Neural networks have become publicly available and can easily be exported and imported as model files. There is no need to design and train them: it is enough to import the desired model into the source code of an application written in any C-related language and then use it like any other class in the application. And for enthusiasts who would like to create their own models from scratch, there is an application with a graphical interface that lets you not only go through creation, training and validation, but even start using the model right inside that application without writing a single line of code (not even in the console).
Every developer gets the “Create ML” app for free along with Xcode. Its possibilities are limited only by the developer's imagination.
As of the beginning of 2023, 13 types of neural networks are available when you create a new project in the “Create ML” application. Some of them are platform-oriented, but most can be used far beyond the Apple ecosystem.
From an anthropomorphic point of view, the difference between the types is easy to understand: each network corresponds to a certain sensory modality of perception, whether visual, auditory or kinesthetic. There is currently no type for the “olfactory” modality, but in its place there is a “mental” modality: the ability to perceive series of numbers and text. The types of networks also have to be distinguished because each modality has not only its own training procedure but also its own stimulus material in a specific format.
It is also worth recalling here that in higher nervous activity (HNA) sensation and perception are separate from each other. Sensation is a reaction caused directly by a stimulus, while perception is a complex emotional and intellectual process derived (and not always as a first-order derivative) from sensations. In ML terms, the closest equivalents would be classification and segmentation.
The list below shows which type of neural network is best suited to each kind of common task:
Image Classification – associates stimulus images with predefined classes and lets you assign a previously unseen image to one of them. The result is expressed as the probability that the image belongs to each of the classes known to the network.
Object Detection – segments the image, separating the object from the background. This is the most resource-intensive type of neural network: training requires a large number of images in which objects of each class are outlined, with the outline information stored in a JSON file. One of the ready-made neural networks used in mobile devices was created from 12.5 thousand images, and its training took days of machine time.
Style Transfer – converts an image to the style of the image that was used as stimulus material. For example, you can redraw the Mona Lisa the way Van Gogh would have painted it.
Hand Pose Classification – recognizes the hand gestures shown in a photo.
Action Classification – determines the type of activity people are performing in a video.
Hand Action Classification – identifies hand movements in a video. Magic is entering our daily lives.
Activity Classification – uses sensor data (for example, from an Apple Watch) to match movement with preset classes of activities.
Sound Classification – matches sounds with predefined classes, which in effect lets you receive and execute a voice command or determine the source of a sound.
Text Classification – assigns a text article (for example, from an RSS feed) to a specific category. The trained network can be used in any text chat across an unlimited number of scenarios.
Word Tagging – like regular expressions, breaks a text article into its components, focusing not on the content but on the structure of the text.
Tabular Classification – uses textual and numerical statistical data to make predictions. For example, knowing the age, gender and social status of a passenger, you can predict the probability that the passenger would survive the sinking of the Titanic.
class | gender | age | survived | probability (S = survived, D = died) |
3 | Female | 28 | | [S: 0.18854220044949774, D: 0.8114577995505022] |
3 | Female | 29 | | [S: 0.18854220044949774, D: 0.8114577995505022] |
3 | Female | 30 | | [S: 0.3560112453493577, D: 0.6439887546506423] |
3 | Female | 31 | | [S: 0.4925866900218921, D: 0.507413309978108] |
3 | Female | 32 | + | [S: 0.5323354483770434, D: 0.4676645516229566] |
3 | Female | 33 | + | [S: 0.6486106272582948, D: 0.35138937274170523] |
3 | Female | 34 | + | [S: 0.6486106272582948, D: 0.35138937274170523] |
3 | Female | 35 | + | [S: 0.6486106272582948, D: 0.35138937274170523] |
3 | Female | 36 | + | [S: 0.6486106272582948, D: 0.35138937274170523] |
3 | Female | 37 | | [S: 0.440002275163433, D: 0.5599977248365671] |
3 | Female | 38 | | [S: 0.440002275163433, D: 0.5599977248365671] |
3 | Female | 39 | | [S: 0.24795963572716476, D: 0.7520403642728353] |
3 | Female | 40 | | [S: 0.24795963572716476, D: 0.7520403642728353] |
3 | Female | 41 | | [S: 0.29917478480012605, D: 0.700825215199874] |
3 | Female | 42 | | [S: 0.29917478480012605, D: 0.700825215199874] |
Tabular Regression – predicts the value of a quantity at an arbitrary point on a graph, even when no other values are known in the vicinity of that point. In mathematics, this corresponds to the concepts of interpolation, extrapolation and approximation.
Recommendation – produces recommendations from non-intersecting series of numbers. In practice, it is most often used to put together a basket of goods for a buyer based on earlier purchases.
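The probability pairs in the Tabular Classification table above can be read as a prediction by taking the class with the higher probability. Below is a minimal Python sketch of that step, assuming the S/D pairs are available as plain dictionaries. The function name `predict_label` and the `rows` sample (three rows copied from the table) are illustrative and are not part of Create ML, which performs this step internally.

```python
# Illustrative sketch only: not Create ML code.
# Probabilities copied from three rows of the table (ages 31, 32 and 39).
rows = {
    31: {"S": 0.4925866900218921, "D": 0.507413309978108},
    32: {"S": 0.5323354483770434, "D": 0.4676645516229566},
    39: {"S": 0.24795963572716476, "D": 0.7520403642728353},
}


def predict_label(probabilities):
    """Return the class name with the highest probability."""
    return max(probabilities, key=probabilities.get)


for age, probabilities in rows.items():
    label = predict_label(probabilities)
    # "+" mirrors the mark used in the table's "survived" column.
    marker = "+" if label == "S" else ""
    print(age, label, marker)
```

Only the rows where S exceeds D receive the "+" mark, which matches the "survived" column of the table.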
Instead of a conclusion. Once you have tried integrating grains of artificial intelligence into your applications, it is impossible to stop. And the implementation is simple enough to be done in the time it takes to prepare a morning espresso.