Cloud Computing, Big Data, and Artificial Intelligence: Understanding How the Three Relate

March 26, 2023

Today I want to talk about cloud computing, big data, and artificial intelligence. Why these three? Because they are all very hot right now, and they seem to be related to one another: when people talk about cloud computing, big data comes up; when they talk about artificial intelligence, big data comes up; when they talk about artificial intelligence, cloud computing gets mentioned... the three seem to complement one another and be inseparable. For a non-technical reader, though, the relationship among the three can be hard to grasp, so it is worth explaining.

First, the original goal of cloud computing

Let's start with cloud computing. The original goal of cloud computing was to manage resources, chiefly three kinds: computing resources, network resources, and storage resources.

1 Managing a data center is like managing a computer

What are computing, network, and storage resources?

Take buying a laptop. You care about what CPU the machine has and how much memory, right? Those two are the computing resources.

For the laptop to reach the Internet, it needs a network port for a cable, or a wireless card that can connect to a router. Your home also needs a subscription from a carrier such as China Unicom, China Mobile, or China Telecom, say 100M of bandwidth. A technician then runs a cable to your home and may help configure the connection between your router and the carrier's network, so that all your computers, phones, and tablets can get online through the router. This is the network resource.

You may also ask how big the hard drive is. Hard disks used to be tiny, around 10G; later 500G, 1T, even 2T drives became unremarkable (1T is 1000G). This is the storage resource.

What holds for one computer holds for a data center. Imagine an enormous machine room packed with servers, which likewise have CPUs, memory, and hard disks, and reach the Internet through router-like devices. The question then becomes: how do the people operating the data center manage all these devices in a unified way?

2 Flexibility: whenever you want it, however much you want

The goal of this management is flexibility in two respects. Which two?

An example makes it concrete. Suppose someone needs a tiny computer: one CPU, 1G of memory, a 10G hard disk, and 1M of bandwidth. Can you give him one? A machine that small — any laptop today beats that configuration, and a home broadband line is already 100M. Yet on a cloud computing platform, the moment he asks for that resource, it is there.

In this case the platform achieves flexibility in two respects:

Time flexibility: you get it whenever you want it; the moment you need it, it appears.

Space flexibility: you get however much you want. Need a tiny computer? Satisfied. Need enormous space, like a cloud drive? The drive allocates each user a very large quota; there is always room for another upload, and it seems inexhaustible.

Space flexibility and time flexibility together are what we call the elasticity of cloud computing. Solving this elasticity problem took a long period of development.

3 Physical devices are not flexible

The first stage was the era of physical devices. When a customer needed a computer in this period, we bought one and put it in the data center.

Physical devices have of course grown ever more powerful: servers with hundreds of gigabytes of memory; network devices whose single port carries tens or even hundreds of gigabits of bandwidth; storage that reaches at least the petabyte level in a data center (one P is 1000 T, one T is 1000 G).

However, physical devices fall badly short on flexibility:

First, they lack time flexibility: you cannot have one whenever you want it. Buying a server, or even a PC, takes procurement time. If a user suddenly tells a cloud vendor he wants a machine backed by a physical server, purchasing on the spot is hard. With a good supplier relationship it may take a week; with an ordinary one, a month. Only after a long wait is the computer in place, and the user must then log in and slowly begin deploying his application. Time flexibility is very poor.

Second, they lack space flexibility. Say a user needs the tiny computer above: where can you find such a small model nowadays? You cannot, just because the user wants only 1G of memory and an 80G hard disk, go out and buy such a mini machine. But if you buy a big one, you must charge the user more because the machine is big — while the user only needs a little, so paying more feels very unfair.

4 Virtualization is much more flexible

So people found a way: virtualization. The user wants only a small computer, right? The physical devices in the data center are very powerful, so I can carve a small virtual slice out of the physical CPU, memory, and disk for this customer, and carve other small slices for other customers. Each customer sees only his own slice, while in reality each is using a small piece of one large device.

Virtualization makes different customers' computers appear isolated: it looks to me as if this disk is mine, and to you as if that disk is yours, when in fact my 10G and your 10G may sit on the same huge storage device. Moreover, if the physical hardware is provisioned in advance, virtualization software can conjure up a computer very quickly, in a matter of minutes. That is why, on any cloud, a new machine comes up within minutes of your asking.

With that, space flexibility and time flexibility are basically solved.

5 Making money and idealism in the virtual world

In the virtualization era, the most formidable company was VMware, an early implementer of virtualization that could virtualize computing, networking, and storage. The company did very well: strong results, virtualization software that sold extremely well, lots of money earned — and it was eventually acquired by EMC (a Fortune 500 company and the leading storage vendor).

But the world never lacks idealists, especially among programmers. And what do idealists love to do? Open source.

Much of the world's software is closed source — "source" meaning source code. That is, a piece of software is good and everyone loves using it, but its code is sealed inside my company; only we know it, nobody else does, and anyone who wants to use it must pay me. That is closed source.

Yet there are always masters who cannot stand watching one company take all the money. They figure: whatever technology you have mastered, I can build too. And I build it not to charge money, but to publish the code and share it with everyone, so that anyone in the world can use it and everyone enjoys the benefit. That is open source.

Tim Berners-Lee, for example, is such an idealist. In 2017 he received the 2016 Turing Award for "inventing the World Wide Web, the first browser, and the fundamental protocols and algorithms allowing the Web to scale." The Turing Award is the Nobel Prize of computing. What is most admirable, though, is that he contributed the World Wide Web — our everyday WWW technology — to the world for free. Everything we do online today owes him credit; had he charged for the technology, he would probably be about as rich as Bill Gates.

Open source versus closed source has many examples:

In the closed-source world there is Windows, and everyone using Windows pays Microsoft; so in the open-source world Linux appeared. Bill Gates made a fortune on closed-source software like Windows and Office, becoming the world's richest man, and then masters built another operating system, Linux. Many people may never have heard of Linux, yet many back-end server programs run on it. For instance, everyone enjoys Double Eleven (the November 11 shopping festival); whether on Taobao, Jingdong, or Kaola, the systems supporting the shopping frenzy all run on Linux.

Where there is Apple, there is Android. Apple's market value is immense, but we cannot see the code of Apple's system. So experts wrote the Android mobile operating system, and that is why nearly all other phone manufacturers ship Android: Apple's system is not open source, while Android is available to everyone.

The same happened with virtualization software. With VMware so expensive, masters wrote two open-source virtualization packages: one called Xen, the other KVM. If you are not technical you can ignore the two names, but they will come up again later.

6 Semi-automatic virtualization and fully automatic cloud computing

To say virtualization solved the flexibility problem is not entirely right. When virtualization software creates a virtual computer, someone generally has to specify manually which physical machine it should be placed on, possibly with fairly involved manual configuration. Hence VMware's virtualization software calls for a demanding certification, and the people who hold it command high salaries — a measure of the complexity.

So the cluster of physical machines that virtualization software alone can manage is not particularly large: generally a dozen, a few dozen, at most around a hundred machines.

This limits time flexibility: although virtualizing one computer is fast, as the cluster grows the manual configuration becomes ever more complex and time-consuming. It also limits space flexibility: with many users, the cluster's capacity falls far short of "however much you want"; the resources are likely to run out quickly, forcing another round of purchasing.

Meanwhile clusters keep growing: thousands of machines to start with, tens of thousands, or even more. Look at BAT (Baidu, Alibaba, Tencent), or NetEase, Google, Amazon — their server counts are staggering. With so many machines, it is hopeless to rely on humans to pick a spot for each virtual computer and configure it. A machine has to do this job.

So people invented algorithms for the task, known as schedulers. Roughly speaking, there is a dispatch center: thousands of machines form one pool, and however many CPUs, however much memory, and however much disk a user asks for, the dispatch center automatically finds a place in the big pool that satisfies the request, starts the virtual computer, and configures it; the user can use it directly. This stage is called pooling, or cloudification. Only at this stage does it deserve the name cloud computing; before it, it is merely virtualization.
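To make the dispatch-center idea concrete, here is a minimal sketch in Python of first-fit scheduling: pick the first pooled host with enough free CPU, memory, and disk. Real schedulers (OpenStack's nova-scheduler, for instance) filter and weigh hosts far more elaborately; the host names and sizes below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    free_cpus: int
    free_mem_gb: int
    free_disk_gb: int

def schedule(pool, cpus, mem_gb, disk_gb):
    """Place a VM on the first host with enough free resources."""
    for host in pool:
        if (host.free_cpus >= cpus and host.free_mem_gb >= mem_gb
                and host.free_disk_gb >= disk_gb):
            # Reserve the resources and place the VM here.
            host.free_cpus -= cpus
            host.free_mem_gb -= mem_gb
            host.free_disk_gb -= disk_gb
            return host.name
    return None  # pool exhausted: time to buy more servers

pool = [Host("node-1", 2, 4, 100), Host("node-2", 32, 128, 2000)]
print(schedule(pool, cpus=1, mem_gb=1, disk_gb=10))  # -> node-1
```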

7 Cloud computing, private and public

Cloud computing comes in two broad kinds: private clouds and public clouds, and some people connect a private cloud and a public cloud into a hybrid cloud.

Private cloud: the virtualization and cloud-management software is deployed in someone else's data center — the customer's own. Private-cloud users tend to be well-funded: they buy land, build their own machine room, buy their own servers, and then have a cloud vendor deploy the software for them. Beyond virtualization, VMware later launched cloud computing products as well, and made a great deal of money in the private-cloud market.

Public cloud: the virtualization and cloud software is deployed in the cloud vendor's own data center. Users need no big investment; they register an account and can create a virtual computer with a few clicks on a web page. AWS, Amazon's public cloud, is one example; in China, Alibaba Cloud, Tencent Cloud, and NetEase Cloud are others.

Why did Amazon build a public cloud? Amazon began as a large overseas e-commerce company, and e-commerce inevitably meets scenarios like Double Eleven: at a given moment everyone rushes to buy, and those moments demand exactly the cloud's time and space flexibility. Amazon cannot keep the peak resources provisioned at all times — too wasteful — but neither can it provision nothing and watch all those eager shoppers turned away. So when the rush approaches, it creates a large number of virtual computers to carry the e-commerce application, and after the rush it releases those resources for other uses. Amazon therefore needed a cloud platform.

Commercial virtualization software, however, was too expensive; Amazon could hardly hand over all its e-commerce earnings to virtualization vendors. So Amazon developed its own cloud software on top of the open-source virtualization technologies mentioned above, Xen and KVM. Unexpectedly, its e-commerce business grew stronger and stronger — and so did its cloud platform.

Because its cloud platform had to support its own e-commerce application, while traditional cloud vendors were mostly IT vendors with almost no applications of their own, Amazon's platform was friendlier to applications, quickly grew into the top brand in cloud computing, and made a great deal of money.

Before Amazon published its cloud platform's financials, people speculated: Amazon's e-commerce makes money, but does the cloud? When the report came out, the answer was that it makes extraordinary money: last year alone, Amazon AWS had revenue of $12.2 billion and operating profit of $3.1 billion.

8 Making money and idealism in cloud computing

The number one in public cloud, Amazon, lived very comfortably; the number two, Rackspace, merely got by. Such is the cruelty of the Internet industry: the winner mostly takes all. If the second place were not in cloud computing, many people might never have heard of it at all.

The runner-up reasoned: if I cannot beat the leader, what can I do? Open source. As noted above, although Amazon built on open-source virtualization, its cloud-management code is closed; many companies that wanted to build cloud platforms but couldn't had to watch Amazon rake in the money. If Rackspace opened its source code, the whole industry could join in and make the platform better and better: the brothers band together and take on the leader.

So Rackspace and NASA co-founded the open-source project OpenStack. If you look at OpenStack's architecture diagram, you need not understand it unless you are in the cloud business, but you can pick out three keywords: Compute, Networking, Storage. It, too, is a cloud management platform for computing, network, and storage.

Of course, the runner-up's technology was excellent, and with OpenStack things went just as Rackspace hoped: every big company that wanted to do cloud went crazy for it — IBM, Hewlett-Packard, Dell, Huawei, Lenovo, and so on.

Everyone wanted a cloud platform, having watched Amazon and VMware earn so much, but building one from scratch looked dauntingly hard. Now, with the open-source platform OpenStack, all the IT vendors joined the community, contributed to the platform, packaged it into their own products, and sold it together with their hardware. Some built private clouds, some public clouds, and OpenStack became the de facto standard for open-source cloud platforms.

9 IaaS: flexibility at the resource level

As OpenStack matured, the scale it could manage grew, and multiple OpenStack clusters could be deployed — say one in Beijing, two in Hangzhou, one in Guangzhou — under unified management, making the overall scale bigger still.

At this scale, from the ordinary user's perspective, the cloud basically delivers "whenever you want it, however much you want." Take the cloud drive again: each user is allocated 5T or even more; with a hundred million users, imagine how much space that adds up to.

The actual mechanism behind it is this: most of the space allocated to you is not really yours yet. You may be granted 5T, but that figure is only what you see; if you have actually stored only 50G, then only 50G is truly given to you. As you keep uploading files, more and more space is genuinely allocated to you.
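A toy sketch of that allocate-on-write idea (often called thin provisioning), with made-up numbers: the user sees the full quota, but real space is consumed only as files are written.

```python
class ThinDisk:
    """What the user sees vs. what is physically allocated."""
    def __init__(self, quota_gb):
        self.quota_gb = quota_gb   # advertised size, e.g. 5 TB
        self.used_gb = 0           # space actually backed by hardware

    def write(self, size_gb):
        if self.used_gb + size_gb > self.quota_gb:
            raise IOError("quota exceeded")
        self.used_gb += size_gb    # real allocation happens only now

disk = ThinDisk(quota_gb=5000)     # the user is "given" 5 TB
disk.write(50)                     # but only 50 GB is really in use
print(disk.used_gb, "GB used of", disk.quota_gb, "GB advertised")
```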

When users collectively fill the platform to some threshold (say 70%), the vendor purchases more servers and expands the back-end resources. All this is transparent to users, who see none of it; from their perspective, the elasticity of cloud computing simply holds. It is rather like a bank: depositors feel they can withdraw money whenever they wish, and so long as they do not all run on the bank at once, the bank is never embarrassed.

10 Summary

At this stage, cloud computing basically achieves time flexibility and space flexibility — the elasticity of computing, network, and storage resources. Computing, networking, and storage are often called infrastructure, so flexibility at this stage is called resource-level elasticity, and the cloud platform that manages resources is called infrastructure as a service: the IaaS (Infrastructure as a Service) we so often hear about.

Second, cloud computing manages not only resources but also applications

With IaaS, is flexibility at the resource level enough? Obviously not: there is also flexibility at the application level.

An example: an e-commerce application ordinarily runs fine on ten machines, but Double Eleven needs a hundred. You might think that is easy: with IaaS, create ninety new machines. But the ninety machines come up empty, without the e-commerce application on them, and the company's operations staff would have to install it machine by machine, which takes a long time.

So although the resource level is flexible, without flexibility at the application layer, overall flexibility is still lacking. Is there a way to solve this?

People added a layer above the IaaS platform to manage the flexibility of applications on top of resources. This layer is usually called PaaS (Platform as a Service). It is often hard to pin down, but it roughly divides in two: what I would call "automatic installation of your own applications" and "general-purpose applications that need no installation."

Automatic installation of your own application: your e-commerce application was developed by you, and nobody else knows how to install it. When installing, you may need to configure your Alipay or WeChat account, so that when someone buys on your site, the payment lands in your account — details only you know. So the platform cannot install it for you, but it can help you automate: you do some work up front, folding your configuration information into an automated installation process. In the example above, the ninety machines created for Double Eleven are empty; if a tool can automatically install the e-commerce application onto all ninety, real application-level flexibility is achieved. Tools such as Puppet, Chef, Ansible, and Cloud Foundry can do this, and the container technology Docker does it even better; a sketch of the idea follows below.
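As a rough illustration of "automatic installation of your own application," here is a Python sketch that runs the same install steps on every newly created machine over SSH. It assumes key-based SSH access as a hypothetical user "ops," and the hosts and steps are placeholders; real tools such as Ansible or Chef are declarative and far more robust.

```python
import subprocess

NEW_HOSTS = [f"10.0.0.{i}" for i in range(10, 100)]  # the 90 new machines
INSTALL_STEPS = [
    "sudo apt-get update -y",
    "sudo apt-get install -y nginx",     # stand-in for the shop's stack
    "sudo systemctl start nginx",        # stand-in for "start the app"
]

def deploy(host):
    # Run each step remotely; check=True aborts on the first failure.
    for step in INSTALL_STEPS:
        subprocess.run(["ssh", f"ops@{host}", step], check=True)

for host in NEW_HOSTS:
    deploy(host)
```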

General-purpose applications that need no installation: "general-purpose" here means something complex that nonetheless everyone uses, such as a database. Almost every application uses a database, and database software is standard; installation and maintenance are complicated, but they are the same for everyone. Such applications can become standard PaaS-layer offerings on the cloud platform's interface: when a user needs a database, one click and it appears, ready to use. People ask: since installation is the same for everyone, why not do it myself instead of paying the cloud? Because databases are genuinely hard — Oracle, the company, earns its fortune on databases alone, and buying Oracle costs serious money.

Most cloud platforms, however, offer an open-source database such as MySQL, and being open source, it does not cost nearly as much. But maintaining such a database yourself would require a sizable team, and tuning it until it could support Double Eleven is not something achievable in a year or two.

If you are, say, a bike-sharing startup, there is no reason to recruit a large database team for this; the cost is far too high. Hand it to the cloud platform: professional matters are for professionals. The platform dedicates hundreds of people to maintaining the system; you concentrate on your bike-sharing application.

Whether your application is deployed automatically or needs no deployment at all, the point is that you worry less about the application layer. That is the great value of the PaaS layer.

Scripts can solve the deployment of your own application, but environments differ greatly: a script that runs correctly in one environment often fails in another.

Containers solve this problem much better.

A container is a Container — and "container" also means shipping container. The idea of containers really is to be the shipping container of software delivery. Containers have two key characteristics: packaging and standardization.

Before shipping containers, suppose goods traveled from A to B, passing through three ports and changing ships three times. Each time, the goods were unloaded from the ship in loose disarray, then carried onto the next ship and rearranged; whenever the ship changed, the crew had to stay ashore for several days.

With shipping containers, all the goods are packed together, and every container has the same dimensions, so at each change of ship the whole box is moved across at once — done within hours, and the crew no longer idles ashore for days.

That is how the container's two traits, packaging and standardization, play out in everyday life.

So how does a container package an application? Learn from the shipping container: first there must be a closed environment that seals in the goods, so they do not interfere with one another and stay isolated, making loading and unloading easy. Happily, the LXC technology in Ubuntu could do this long ago.

The closed environment relies mainly on two technologies. One makes things look isolated, and is called namespaces: applications in different namespaces see different IP addresses, user spaces, process numbers, and so on. The other makes things act isolated, and is called cgroups: the whole machine may have plenty of CPU and memory, but a given application may use only a portion of it.
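A hedged sketch of the two mechanisms, driven from Python on Linux (root required). The `unshare` tool from util-linux starts a command in fresh namespaces, and writing under /sys/fs/cgroup imposes a CPU cap; the paths follow cgroup v2 and may differ on your distribution.

```python
import os
import subprocess

# "Looks isolated": run ps in its own PID and mount namespaces.
# Inside, it sees only its own little process tree, not the host's.
subprocess.run(["unshare", "--fork", "--pid", "--mount-proc", "ps", "aux"])

# "Acts isolated": cap a cgroup at 20% of one CPU. In cgroup v2,
# "20000 100000" grants 20 ms of CPU time per 100 ms period.
os.makedirs("/sys/fs/cgroup/demo", exist_ok=True)
with open("/sys/fs/cgroup/demo/cpu.max", "w") as f:
    f.write("20000 100000")
```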

A so-called image is the moment you weld the container shut: you save the container's state, like Sun Wukong's "Freeze!" spell — the container is fixed at that instant, and that instant's state is saved into a set of files. The file format is standard, and anyone who has the files can restore the frozen moment. Restoring an image to runtime (reading the image files and reproducing that moment) is exactly the process of running a container.

With containers, the PaaS layer's automatic deployment of users' own applications becomes fast and elegant.

Third, big data embraces cloud computing

A complex yet common application at the PaaS layer is the big data platform. How did big data come to embrace cloud computing, step by step?

1 Even small data contains wisdom

In the beginning, big data wasn't big at all. How much data was there? Today everyone reads e-books and browses news online; for those of us born in the 1980s, the volume of information in childhood was nowhere near this. We read books and newspapers — how many words does a week of newspapers hold? Outside a big city, an ordinary school's whole library amounted to a few bookshelves. Only later, as information technology arrived, did information begin to pile up.

First look at the data inside big data. There are three types: structured data, unstructured data, and semi-structured data.

Structured data: data with a fixed format and bounded length. A filled-in form is structured data — nationality: People's Republic of China; ethnicity: Han; gender: male. That is structured data.

Unstructured data: increasingly plentiful now — data of variable length with no fixed format, such as web pages (sometimes very long, sometimes just a few words), or voice and video. These are unstructured data.

Semi-structured data: data in forms like XML or HTML. Those outside technology may not know these; it does not matter here.

Data by itself is not useful; it must be processed. The wristband you wear while running every day collects data; the countless web pages on the Internet are data too. We call this Data. Data itself is of no use, but it contains something very important, called Information.

Raw data is very messy; only after combing and cleaning is it called information. Information contains many regularities; the regularities summarized out of information are called knowledge, and knowledge changes fate. Information was plentiful, but some people looked at it and came away empty, while others saw in it the future of e-commerce or the future of live streaming — and so they prospered. If you do not extract knowledge from information, you are only scrolling through WeChat Moments on the Internet every day.

With knowledge applied to practice, some people then do very well; this is called intelligence, or wisdom. Having knowledge does not guarantee wisdom: many scholars are deeply knowledgeable and can analyze what has happened from every angle, yet cannot convert it into action. Many entrepreneurs are great precisely because they apply acquired knowledge to practice and end up building big businesses.

So the use of data passes through four stages: data, information, knowledge, wisdom.

That final stage is what many businesses want. Look, they say: I have gathered so much data — can it drive my next decision and improve my product? For example, while a user watches a video, an ad pops up beside it showing exactly what he wants to buy; while a user listens to music, the service recommends other music he genuinely wants to hear.

Users may click around my application or website casually, and the text they type is data to me. I want to extract from it, guide practice, and form wisdom — so that users get hooked inside my application, never willing to leave once they arrive, hands busy buying and buying.

Many people say: on Double Eleven I want to cut off the internet, because my wife keeps buying and buying on it. She buys A, it recommends B, and she cries, "Oh, B is just what I like — dear, buy it!" How is this program so impressive, so clever that it knows my wife better than I do? How on earth is that done?

2 How data is refined into wisdom

Data processing proceeds in several steps, at the end of which stands wisdom.

Step one is data collection. First there must be data, and data is collected in two ways:

The first way is to take it; the technical terms are grabbing, or crawling. Search engines work this way: they download all the information on the Internet into their data centers, and then you can search it. When you search, the result is a list. Why does this list live at the search-engine company? Because it has taken the data down. If you click through, though, the website is no longer inside the search engine. Say Sina publishes a news item and you find it through Baidu: before you click, that result page is in Baidu's data center; once you click through, the page you reach is in Sina's data center.

The second way is to push: many terminals help collect data for me. The Xiaomi bracelet, for example, uploads your daily running data, heart-rate data, and sleep data to the data center.
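A minimal sketch of the "take" approach in Python: a single-threaded crawler that downloads pages and follows links. The seed URL is a placeholder, and real systems spread the URL frontier across many machines.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

seen, frontier = set(), ["https://example.com/"]  # placeholder seed

while frontier:
    url = frontier.pop()
    if url in seen:
        continue
    seen.add(url)
    try:
        html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
    except OSError:
        continue  # unreachable page: skip it
    # A real crawler would store `html` in the data center here.
    for link in re.findall(r'href="([^"]+)"', html):
        frontier.append(urljoin(url, link))
    if len(seen) >= 10:  # politeness cap for this sketch
        break

print("fetched", len(seen), "pages")
```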

Step two is data transfer. This is usually done with queues, because the volume is simply too large, and the data must be processed before it is useful. Since no system can process it all at once, the data has to line up and be handled gradually.
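A toy, in-memory version of the transfer step: a producer enqueues events faster than the consumer handles them, and the queue absorbs the burst. Production systems use distributed, disk-backed queues (Kafka is a well-known example); everything here is illustrative.

```python
import queue
import threading
import time

q = queue.Queue()

def producer():
    for i in range(100):
        q.put({"event": i})     # e.g. one click or one heartbeat sample

def consumer():
    while True:
        event = q.get()
        time.sleep(0.01)        # pretend processing is slow
        q.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer, daemon=True).start()
q.join()                        # wait until the backlog is drained
print("all events processed")
```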

Step three is data storage. Data is money now; holding data amounts to holding money. How else does a site know what you want to buy? Precisely because it holds your historical transaction data. This information cannot be handed to others — it is extremely valuable — so it must be stored.

Step four is data processing and analysis. What is stored is raw data, which is mostly chaotic and laced with junk, so it must be cleaned and filtered into high-quality data. High-quality data can then be analyzed — classified, or mined for relationships among the data — to obtain knowledge.

The legendary Walmart beer-and-diapers story, for example, came from analysis of purchase data: men buying diapers tended to buy beer at the same time. Thus the relation between beer and diapers was discovered — knowledge — and then applied in practice by placing the beer and diaper shelves close together — wisdom.
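How might such a relation be found? A tiny Python illustration: compute support and confidence for the rule "diapers → beer" over shopping baskets. The transactions are invented for the example.

```python
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
]

both = sum(1 for b in baskets if {"diapers", "beer"} <= b)
diapers = sum(1 for b in baskets if "diapers" in b)

# support: how common the pair is; confidence: how often diapers imply beer
print("support(diapers & beer)     =", both / len(baskets))  # 0.5
print("confidence(diapers -> beer) =", both / diapers)       # ~0.67
```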

Step five is data retrieval and mining. Retrieval is search: as the saying goes, for matters abroad ask Google, for matters at home ask Baidu. Both search engines put the analyzed data into a search index, so that when people want information, one search finds it.

The other part is mining. Merely searching things out no longer satisfies people; we also need to dig relationships out of the information. In financial search, for instance, when someone searches a company's stock, shouldn't the company's executives be dug out too? Suppose you only search the stock, find it looks splendid, and buy — unaware that an executive has just issued a statement very unfavorable to the stock, which falls the next day. Wouldn't that harm masses of investors? So mining the relationships in data with various algorithms to build a knowledge base is very important.

3 In the big data era, many hands make the flames rise high

When data volumes were small, a few machines sufficed. But slowly, as data grows and even the mightiest single server cannot cope, what then? You must pool the strength of many machines and get the job done together.

For data collection: in IoT, thousands of sensing devices deployed in the field gather huge volumes of temperature, humidity, surveillance, and power data; for a web search engine, the entire Internet's pages must be downloaded, which no single machine can do — it takes a web-crawler system of many machines, each downloading a part, all working simultaneously, to fetch the vast number of pages in bounded time.

For data transfer: an in-memory queue would be burst by that much data, so distributed, disk-backed queues were created; then the queue can be carried by many machines at once. However big your data, as long as my queues are numerous enough and the pipe is thick enough, it can be carried.

For data storage: one machine's file system certainly cannot hold it, so a large distributed file system is needed, assembling the hard disks of many machines into one big file system.

For data analysis: large volumes may need to be decomposed, counted, and summarized, beyond what one machine can handle. So distributed computing arose: split the mass of data into small pieces, let each machine process one piece, many machines in parallel, and the job finishes fast. The famous Terasort benchmark sorts 1 TB of data — about 1000G; processed on a single machine it would take hours, but processed in parallel it completes in 209 seconds.
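A single-machine imitation of that divide-and-conquer idea: split the data, count the chunks in parallel, then merge — the essence of the map-reduce style used by distributed frameworks. Here the "workers" are just local processes and the text is synthetic.

```python
from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    return Counter(chunk)  # "map": each worker counts its own share

if __name__ == "__main__":
    words = ("beer diapers beer milk " * 100_000).split()
    chunks = [words[i::4] for i in range(4)]  # deal words into 4 piles
    with Pool(4) as pool:
        partials = pool.map(count_words, chunks)
    total = sum(partials, Counter())          # "reduce": merge the parts
    print(total.most_common(2))
```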

So what is big data? Bluntly: one machine cannot finish the job, so many machines do it together. But as data keeps growing, even many small companies need to process quite a lot of data — and what can a small company without so many machines do?

4 Big data needs cloud computing; cloud computing needs big data

At this point, think back to cloud computing. When you want to do these things, you need a great many machines — and you want them exactly "whenever you want, however much you want."

For example, a big data analysis of a company's finances might run once a week. Keeping a hundred or a thousand machines idle the rest of the week is hugely wasteful. Could we take out the thousand machines when it is time to compute, and let them do other things the rest of the time?

Who can do that? Only cloud computing, which provides resource-level elasticity for big data jobs. And cloud vendors in turn deploy big data on their PaaS platforms as a very, very important general-purpose application. A platform that makes many machines work as one is not something ordinary people can develop, or even operate — how could a typical company hire the dozens or hundreds of people needed to run it?

So, just as with databases, this is best left to a team of professionals. Public clouds now carry big data solutions as standard. When a small company needs a big data platform, it need not purchase a thousand machines: it goes to the public cloud, the thousand machines appear, with the big data platform already deployed on top; just pour the data in and compute.

Cloud computing needs big data, and big data needs cloud computing: thus the two came together.

Fourth, artificial intelligence embraces big data

1 When will machines understand the human heart?

Even with big data, human desires cannot be fully satisfied. A big data platform does include a search engine: whatever you want, one search brings it up. But there are also cases where what I want cannot be searched for — I cannot express it, and what the search returns is not what I want.

For example, a music app recommends a song I have never heard. I don't know its name, so I could not have searched for it; yet the app recommends it, and I genuinely like it. That is something search cannot do. When people use such an application, they feel the machine knows what I want, rather than me going to the machine to search when I want something. This machine understands me like a friend — and that already carries a hint of artificial intelligence.

People have pondered this for a very long time. In the earliest imaginings: suppose there is a wall with a machine behind it; I speak to it, and it responds. If I cannot tell whether the other side is a human or a machine, then it truly is an artificial intelligence.

2 Teaching machines to reason

How can this be achieved? People thought: first I must give the computer humanity's ability to reason. What matters most in a person? What distinguishes humans from animals? The ability to reason. If I could hand my reasoning ability to the machine and let it infer the right answer to a question, how good would that be?

In fact, people have gradually enabled machines to do some reasoning, such as proving mathematical theorems. That machines could prove theorems was a delightful surprise — until people realized the result was not so surprising after all. The problem: mathematical formulas are extremely rigorous, the reasoning is rigorous, and formulas are easy for a machine to represent and for a program to express.

Human language is not so simple. Suppose you have a date with your girlfriend tonight and she says: "If you come early and I'm not there, you wait; if I come early and you're not there — you just wait!" A machine has real trouble with that, but any human understands. That is why you dare not be late for a date.

3 Teaching machines knowledge

Therefore, merely giving machines strict rules of inference is not enough; we must also give them knowledge. But teaching machines knowledge is something ordinary people probably cannot do. Perhaps experts can — experts in linguistics, say, or in finance.

Can knowledge in the language or finance domains be expressed a bit more rigorously, like mathematical formulas? A linguist might summarize grammar rules — subject, predicate, object, attributive, adverbial, complement; the subject is always followed by the predicate, the predicate by the object. Wouldn't summarizing these rules and expressing them strictly do the trick?

It turned out this doesn't work: language is too hard to summarize, and expression varies endlessly. Take subject-predicate-object: spoken language often drops the predicate. Someone asks, "Who are you?" I answer, "Me, Liu Chao." But you cannot demand that people speak standard written language to a machine for speech recognition to work — that is still not intelligent. As Luo Yonghao said in a speech, it is awkward to address your phone in written style every time: "Please place a call to so-and-so."

This stage of artificial intelligence was called expert systems. Expert systems are hard to make succeed: on the one hand, knowledge is hard to summarize; on the other, the summarized knowledge is hard to teach to a computer. If you yourself are fuzzy about it — sensing there is a pattern but unable to state it — how could you teach it to a computer by programming?

4 Forget it — if we can't teach you, learn by yourself

So people thought: machines are a completely different species from humans; let the machines learn by themselves.

How does a machine learn? Since machines are so good at statistics, statistical learning should be able to uncover patterns from large quantities of numbers.

There is actually a fine example from the entertainment world that gives a glimpse of the idea:

A netizen tallied the lyrics of the 117 songs on the 9 albums a well-known singer released on the mainland, counting each word at most once per song, and ranked the top ten adjectives, nouns, and verbs by how often each appeared.

What if we write down an arbitrary string of digits and, digit by digit, draw one word in turn from the adjective, noun, and verb lists, then string the words together?

Take pi, 3.1415926; the corresponding words are: strong, road, fly, freedom, rain, bury, lost. Connect and polish them a little:

A strong child,

still walking on the road,

spreads his wings and flies toward freedom,

letting the rain bury his confusion.

Starting to feel it? Of course, real statistics-based learning algorithms are far more complex than this simple tally.
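For fun, here is a sketch of the digit-to-word game in Python. The original top-ten tables are not reproduced above, so the word lists below are hypothetical English stand-ins, arranged so that pi roughly echoes the little poem; digit d picks the d-th word, cycling through adjectives, nouns, and verbs.

```python
ADJECTIVES = ["free", "lost", "strong", "quiet", "warm",
              "young", "brave", "wild", "lonely", "broken"]
NOUNS = ["road", "night", "dream", "wing", "rain",
         "city", "wind", "light", "sea", "heart"]
VERBS = ["wait", "run", "sing", "fly", "fall",
         "drift", "hold", "leave", "bury", "burn"]
LISTS = [ADJECTIVES, NOUNS, VERBS]

def digits_to_words(s):
    digits = [int(c) for c in s if c.isdigit()]
    # Digit d picks the d-th word (0 wraps to the 10th), lists cycling.
    return [LISTS[i % 3][(d - 1) % 10] for i, d in enumerate(digits)]

print(digits_to_words("3.1415926"))
# -> ['strong', 'road', 'fly', 'free', 'rain', 'bury', 'lost', 'city']
```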

However, statistical learning captures simple correlations easily — if one word and another always appear together, the two should be related — but cannot express complex correlations. Moreover, statistical formulas are often very complicated; to simplify computation, all sorts of independence assumptions are made to reduce the difficulty of the formulas, yet in real life truly independent events are relatively rare.

5 Imitating how the brain works

So people turned from the world of machines to reflect on how the human world works.

The human brain does not store masses of rules, nor does it record masses of statistics; it works through the firing of neurons. Each neuron has inputs from other neurons, and when it receives input it produces an output that stimulates other neurons. Huge numbers of neurons react to one another, and all kinds of results finally emerge.

For example, when people see a beautiful woman, their pupils dilate. This is certainly not the brain applying a rule about body proportions, nor running statistics over every beauty seen in one's life; rather, neurons fire from the retina to the brain and back to the pupil. In this process it is very hard to say what role each neuron played in the final result; they simply played one.

So people began to simulate the neuron with a mathematical unit.

This neuron has inputs and an output; a formula links the two, and each input influences the output according to its importance — its weight.

Then connect n such neurons together like a neural network. The number n can be very, very large; all the neurons can be arranged in many columns, with many in each column. Each neuron can weight its inputs differently, so each neuron's formula differs too. When something is fed into this network, the hope is that it outputs a result that is correct from a human point of view.
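A minimal numpy sketch of the network just described: each neuron forms a weighted sum of its inputs and squashes it with a simple formula, and columns of such neurons are stacked into a network. All sizes and the random weights are illustrative; untrained, the output is meaningless.

```python
import numpy as np

def layer(x, weights, bias):
    # Each neuron: weighted sum of inputs, squashed by a sigmoid.
    return 1.0 / (1.0 + np.exp(-(weights @ x + bias)))

rng = np.random.default_rng(0)
x = rng.random(784)                        # e.g. a 28x28 image, flattened
w1, b1 = rng.standard_normal((30, 784)), rng.standard_normal(30)
w2, b2 = rng.standard_normal((10, 30)), rng.standard_normal(10)

hidden = layer(x, w1, b1)                  # one column of neurons
output = layer(hidden, w2, b2)             # ten outputs, one per digit
print("largest output at index:", output.argmax())  # random until trained
```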

For example, feed in an image of a handwritten 2, and the second number in the output list comes out largest. From the machine's point of view, it neither knows the input image shows a 2 nor knows what the output numbers mean — and that's fine, as long as humans know the meaning. Just as the neurons neither know the retina saw a beauty nor that the pupil dilates in order to see clearly: on seeing the beauty, the pupil dilates, and that is enough.

For any given neural network, nobody can guarantee that an input of 2 yields the largest value in the second output. Guaranteeing that requires training and learning; after all, pupils dilating at the sight of beauty is itself the result of many years of human evolution. Learning means feeding in large numbers of images and adjusting whenever the result is not the desired one.

How to adjust? Nudge every weight of every neuron slightly toward the goal. Since there are so many neurons and weights, the output of the whole network rarely flips all-or-nothing; it inches toward the target, eventually reaching the desired result.
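A toy version of "nudge every weight slightly toward the goal": repeated gradient-descent steps on a single sigmoid neuron with a squared-error loss. The input and target are made up; real training repeats this over huge datasets and many layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
w, b = rng.standard_normal(3), 0.0
x, target = np.array([0.5, -1.0, 2.0]), 1.0   # one made-up example

for step in range(100):
    y = sigmoid(w @ x + b)
    grad = (y - target) * y * (1 - y)  # slope of the error at this output
    w -= 0.5 * grad * x                # a tiny nudge, not an abrupt flip
    b -= 0.5 * grad

print("output after training:", sigmoid(w @ x + b))  # creeps toward 1.0
```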

Of course, the adjustment strategies take real skill and need algorithm experts to tune carefully. Just as when a human sees a beauty and the pupil at first fails to dilate enough to see clearly, so the beauty runs off with someone else — the lesson learned for next time is to dilate the pupils a little more, not to flare the nostrils.

6 Not very principled, but it works

It doesn't sound all that principled, but it genuinely works — it's just that capricious!

The universality theorem of neural networks says roughly this: suppose someone gives you some complicated, exotic function f(x):

No matter what that function looks like, there is guaranteed to be a neural network that, for every possible input x, outputs the value f(x) (or some arbitrarily accurate approximation of it).

If a function represents a pattern, this also means that no matter how wondrous or incomprehensible a pattern is, it can be expressed by a large number of neurons with a large number of adjusted weights.

7 An economics interpretation of artificial intelligence

This reminded me of economics, which makes it easier to understand.

Think of each neuron as an individual engaging in economic activity in society. The neural network is then the whole economy. Each neuron adjusts its weights on inputs from society and produces a corresponding output: wages rose, vegetable prices rose, stocks fell — what should I do, how should I spend my money? Is there no pattern here? Surely there is; but what exactly is it? Hard to say.

Expert-system economics is the planned economy. There, the expression of economic laws is not left to the independent decisions of each economic individual but is expected to be summarized by the sweeping vision and foresight of experts. Yet an expert can never know which street in which city lacks a vendor of sweet tofu pudding.

So the expert decrees how much steel and how many steamed buns to produce, which often diverges widely from what people actually need; even a plan hundreds of pages long cannot express the small patterns hidden in people's lives.

Statistics-based macro regulation is much more reliable: every year the statistics bureau publishes the employment rate, the inflation rate, GDP, and other indicators for the whole society. These indicators reflect many underlying patterns; they cannot express them precisely, but they are relatively dependable.

However, patterns summarized from statistics are expressed relatively coarsely. Economists looking at such data can conclude whether housing prices or stocks will rise or fall in the long run — if the economy trends upward, both should rise — but they cannot derive the minute fluctuations of stocks and prices from statistical data.

Neural-network-style micro-economics is the most accurate expression of economic laws: each person adjusts independently to inputs from society, and those adjustments feed back into society as inputs in turn. Picture the minute fluctuations of a stock-market curve — precisely the result of countless independent individuals trading continuously, with no single unified rule to follow.

And as each person decides independently on society's inputs, certain factors, after repeated training, form statistically visible macro patterns — which is what macroeconomics can observe. For example, whenever money is printed in large quantities, housing prices eventually rise; after enough rounds of training, everyone learns the lesson.

8 Artificial intelligence needs big data

However, a neural network contains so many nodes, each with so many parameters, that the total parameter count is enormous and the required computation immense. No matter: we have big data platforms that pool the power of many machines to compute together and deliver the desired result within a bounded time.

Artificial intelligence can do many things, such as identifying spam and flagging pornographic or violent text and images. This, too, went through three stages:

The first stage relied on keyword blacklists, whitelists, and filtering: a text containing certain words was judged pornographic or violent. But as internet slang multiplied and the words kept changing, keeping the lexicon up to date became unmanageable.

The second stage was based on newer algorithms, such as Bayesian filtering. You needn't know how the Bayesian algorithm works, but you have probably heard the name: it is a probability-based algorithm.
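A bare-bones sketch of the probabilistic idea behind that second stage: count how often each word appears in spam versus normal mail, then compare the (smoothed) word probabilities of a new message under each class. The training messages are invented, and real filters add priors and much more.

```python
import math
from collections import Counter

spam = ["win free money now", "free prize click now"]
ham = ["let us meet for lunch", "the meeting is at noon"]

def word_counts(msgs):
    return Counter(w for m in msgs for w in m.split())

sc, hc = word_counts(spam), word_counts(ham)
vocab = set(sc) | set(hc)

def log_prob(msg, counts, total):
    # Laplace smoothing so unseen words don't zero out the product.
    return sum(math.log((counts[w] + 1) / (total + len(vocab)))
               for w in msg.split())

msg = "free money now"
spam_score = log_prob(msg, sc, sum(sc.values()))
ham_score = log_prob(msg, hc, sum(hc.values()))
print("spam" if spam_score > ham_score else "ham")  # -> spam
```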

The third stage is based on big data and artificial intelligence, enabling much more precise user profiling, text understanding, and image understanding.

Because AI algorithms mostly depend on large amounts of data — often accumulated over a long time in a specific domain (e-commerce, say, or email) — having the algorithm without the data is useless. So AI programs are seldom delivered like the IaaS and PaaS above, installed as a package at a customer's site for the customer to run: a single customer's standalone installation would lack the data to train on, and the results would usually be poor.

Cloud vendors, on the other hand, have accumulated vast amounts of data, so they install the AI inside their own clouds and expose a service interface. If you want to check whether a text involves pornography or violence, you simply call the online service. This form of service is called Software as a Service, SaaS.

And so artificial intelligence programs entered cloud computing as the SaaS layer.

Fifth, a better life built on the relationship among the three

At last the three brothers of cloud computing are assembled: IaaS, PaaS, and SaaS. On a typical cloud platform you can find all three — cloud, big data, and artificial intelligence. A big data company that has accumulated masses of data will use AI algorithms to provide services; and an AI company cannot do without the support of a big data platform.

So when cloud computing, big data, and artificial intelligence come together like this, they complete the journey of meeting, getting to know, and truly understanding one another.
