BigData

Published by 16_Julie K, 2023-04-14 08:28:07

Big Data

Big Data: Introduction
• Big data can be identified by its huge volume, diverse types and varying sizes, which range from a few terabytes to zettabytes.
• The integration of AI, ML and IoT has increased the complexity of data and led to new forms and sources of data.
• According to the National Institute of Standards and Technology (NIST), big data is data whose volume, acquisition speed or representation limits the capacity of conventional relational methods to analyse it effectively.

Big Data: Data processing
• Processing big data is important and quite complex because the data is mostly unstructured.
• Processing includes checking for null or empty data points, cleansing invalid records, and sorting unstructured data into logical categories.
• Processing is the most important stage, since it helps reveal the underlying patterns in the data.
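The three processing steps above can be sketched as a small pipeline over a handful of records. This is a minimal illustration under stated assumptions: the record fields (`id`, `temperature`, `note`), the plausible temperature range and the keyword rules used for categorising are all hypothetical, and real pipelines would use far richer validation and text classification.

```python
# Hypothetical raw sensor records; field names and values are illustrative only.
RAW_RECORDS = [
    {"id": 1, "temperature": 21.5, "note": "routine check"},
    {"id": 2, "temperature": None, "note": ""},
    {"id": 3, "temperature": -999.0, "note": "ERROR: sensor fault"},
    {"id": 4, "temperature": 22.1, "note": "customer complaint about noise"},
    {"id": 5, "temperature": 85.0, "note": "Fault code E12"},
]


def is_missing(value) -> bool:
    """Treat None and empty or whitespace-only strings as null data points."""
    return value is None or (isinstance(value, str) and not value.strip())


def is_valid(record: dict) -> bool:
    """Assumed validity rule: temperature must lie in a plausible range."""
    return -50.0 <= record["temperature"] <= 150.0


def categorise(note: str) -> str:
    """Toy keyword rules that sort free text into logical categories."""
    text = note.lower()
    if "error" in text or "fault" in text:
        return "fault"
    if "complaint" in text:
        return "feedback"
    return "general"


def process(records):
    cleaned = []
    for record in records:
        # Step 1: drop records that contain null or empty data points.
        if any(is_missing(v) for v in record.values()):
            continue
        # Step 2: cleanse invalid records.
        if not is_valid(record):
            continue
        # Step 3: categorise the unstructured text field.
        cleaned.append({**record, "category": categorise(record["note"])})
    return cleaned


for row in process(RAW_RECORDS):
    print(row["id"], row["category"])
```

Record 2 is dropped for its empty fields and record 3 for its out-of-range reading; the remaining records come out labelled `general`, `feedback` and `fault`.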

Big Data: Main categories
• Big data comes in three main categories:
• Structured data: Structured data is organized in a fixed format, is easy to analyze, and can be explored with basic algorithms. It is usually quantitative and is readily processed by machines. It is stored in relational database management systems (RDBMS) and managed with Structured Query Language (SQL). It can be generated by humans or by machines.
• Semi-structured data: Semi-structured data is more flexible than structured data but less flexible than unstructured data. It is not generally stored in relational databases, but it has organizational properties (for example, the tags in XML or the keys in JSON) that assist analysis. Semi-structured data is easier to scale than structured data.
• Unstructured data: Unstructured data cannot easily be explored with basic algorithms and is categorized as qualitative. Human-generated unstructured data includes emails, video files, audio files, social media posts and images. Machine-generated unstructured data includes satellite images, scientific data, digital surveillance data and sensor data.
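To make the three categories concrete, the sketch below handles one small example of each using only the Python standard library: an SQL table in an in-memory SQLite database for structured data, JSON records with differing keys for semi-structured data, and a plain email body for unstructured data. The table, field names and text are hypothetical.

```python
import json
import sqlite3

# Structured: a fixed schema, stored in a relational database and queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders (item, qty) VALUES (?, ?)",
    [("bolt", 100), ("nut", 250), ("bolt", 50)],
)
total_bolts = conn.execute(
    "SELECT SUM(qty) FROM orders WHERE item = ?", ("bolt",)
).fetchone()[0]

# Semi-structured: self-describing keys, but records need not share one schema.
events = [
    json.loads('{"device": "press-1", "temp": 71.2}'),
    json.loads('{"device": "press-2", "temp": 69.8, "alarm": {"code": "E12"}}'),
]
alarm_codes = [e["alarm"]["code"] for e in events if "alarm" in e]

# Unstructured: no schema at all; even a simple question needs text processing.
email_body = "Hi team, press-2 overheated again this morning. Please check it."
mentions_overheating = "overheat" in email_body.lower()

print(total_bolts, alarm_codes, mentions_overheating)
```

The structured query is a single SQL statement, the semi-structured records must be checked for optional keys, and the unstructured text only yields an answer through (here, very crude) text matching.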

Big Data: Main characteristics
• Data generated by industrial Internet of Things (IIoT) systems is colossal in amount and varies in type, nature and format. Its main characteristics are the following:
• Volume: As traditional industries transform into Industry 4.0 and IIoT-compliant operations, heterogeneous sensor nodes attached to machines and other devices have become common. These devices generate a huge amount of data, whose size varies over time and can range from a few kilobytes to zettabytes.
• Variety: Because the amount and type of generated data are unpredictable, organizing such a large and diverse volume of data is difficult. Variety refers to the format in which the input data is generated: structured, semi-structured or unstructured. Intelligent devices can generate data in any of these formats.

Big Data: Main characteristics Cont.
• Velocity: This characteristic describes the speed at which data is generated. Advances in technology have increased processor speeds. However, overall system performance can degrade significantly because of the huge volume of data handled at each instant, especially since big data often has to be handled in real time. The frequency at which data is generated and handled may vary.
• Visualisation: Big data can be visualized as pictures and graphics. Based on these visualizations, decision-makers can formulate new strategies and gain a complete picture of the industrial landscape they are interested in. New patterns in the data can be identified this way, so appropriate use of visualization reveals patterns, trends and correlations in the data.
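One simple way to quantify velocity is to count how many events arrived within a sliding time window. The sketch below is a minimal, single-threaded illustration; the window length and timestamps are assumptions, and it assumes events arrive in non-decreasing timestamp order (production stream processors also handle late and out-of-order events).

```python
from collections import deque


class RateMonitor:
    """Counts events that arrived within the last `window` seconds."""

    def __init__(self, window: float = 1.0):
        self.window = window
        self.timestamps = deque()  # assumed to be appended in non-decreasing order

    def record(self, timestamp: float) -> None:
        self.timestamps.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop timestamps that have slid out of the window (oldest are on the left).
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()

    def rate(self, now: float) -> float:
        """Events per second over the window ending at `now`."""
        self._evict(now)
        return len(self.timestamps) / self.window


monitor = RateMonitor(window=2.0)
for t in [0.0, 0.5, 1.0, 1.5, 2.5, 2.6]:
    monitor.record(t)
print(monitor.rate(now=2.6))  # 4 events in the last 2 s -> 2.0 events per second
```

Using a deque keeps each eviction O(1), so the monitor can keep up with a high-velocity stream without rescanning past events.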

Big Data: Main characteristics Cont.
• Value: The value of big data refers to the process of analysing the generated data and extracting meaningful information from it. The process of computing this value is simple. The extracted value helps in evaluating the investigation process and helps the personnel concerned make decisions.
• Variability: Variability refers to variation in the meaning of data, which is different from data variety. It arises from randomness in how data is generated, and the resulting inconsistency makes data more difficult to handle. Examples include natural language data, hashtags, geospatial data and multimedia sensor data.

Big Data: Main sources
• Big data can be generated by many devices across many industries. Data sources are categorised as either internal or external, compared below.
Internal: Data generated by sensors attached to interconnected IoT devices that serve industrial applications (manufacturing and non-manufacturing industries), agriculture and healthcare. The frequency of data generation varies with time.
External: Typically, huge amounts of data generated by social media platforms such as Twitter, Facebook and LinkedIn, and by weather forecasts. This data may be quantitative or qualitative.

Big Data: Acquisition and storage
• Data acquisition involves collecting, transmitting and preprocessing data. The acquired data is then processed, analysed and stored in data warehouses.
• Big data is often acquired through message-streaming protocols, which collect data from various sources so that it can be stored and analysed.
• In the early stages of acquisition, data is accumulated from multiple sources and stored temporarily in a database or a file. It is then categorised into intradata and interdata.

Big Data: Acquisition and storage Cont.
• Intradata and interdata are defined as follows:
Intradata: Data collected and interconnected internally within a specific data center.
Interdata: Data connecting multiple data centers.

Big Data: Acquisition and storage Cont.
• Big data acquisition has three basic components:
• Protocols, which gather data of different forms from various sources and convert it into a meaningful format. For example, the Advanced Message Queuing Protocol (AMQP).
• Frameworks, which collect data from the various sources and apply the protocols.
• Storage for the data accumulated by the frameworks.
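The three components can be illustrated with a toy, in-process pipeline. This is only a sketch of the roles, not AMQP: a standard-library `queue.Queue` stands in for a message broker, JSON bytes stand in for the agreed wire format that a real protocol would define, and an in-memory SQLite table stands in for storage. The function names `publish` and `collect` and the sensor IDs are hypothetical.

```python
import json
import queue
import sqlite3

# Stand-in for a message broker; a real deployment would use e.g. an AMQP broker.
broker = queue.Queue()


def publish(sensor_id: str, reading: float) -> None:
    """Protocol role: encode a reading into an agreed wire format (JSON bytes, assumed)."""
    broker.put(json.dumps({"sensor": sensor_id, "value": reading}).encode("utf-8"))


def collect(store: sqlite3.Connection) -> int:
    """Framework role: drain pending messages, decode them and hand them to storage."""
    count = 0
    while True:
        try:
            raw = broker.get_nowait()
        except queue.Empty:
            break
        message = json.loads(raw.decode("utf-8"))
        store.execute(
            "INSERT INTO readings VALUES (?, ?)", (message["sensor"], message["value"])
        )
        count += 1
    store.commit()
    return count


# Storage role: a table that keeps the accumulated data.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE readings (sensor TEXT, value REAL)")

publish("temp-01", 21.4)
publish("temp-02", 22.9)
publish("temp-01", 21.7)
stored = collect(store)
print(stored, "messages stored")
```

Keeping the three roles separate mirrors the slide: the wire format can change without touching storage, and the collector can be scaled out independently of the producers.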

Big Data: Acquisition and storage Cont.
• With the rapid growth of social media platforms, the amount and variety of data produced, from sensor readings to online transactions, have increased.
• The methods used to collect, process, analyse and extract meaningful information from data depend on the characteristics of the data.
• Distributed file systems that store such data are designed for consistency, fault tolerance and accessibility.
• Common tools include Hadoop (whose storage layer is the Hadoop Distributed File System, HDFS), the Google File System (GFS) and Apache Flume, which collects and moves large volumes of log data into such stores.
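Fault tolerance in distributed file systems such as HDFS and GFS comes largely from splitting files into blocks and storing several replicas of each block on different nodes (HDFS, for example, defaults to three replicas). The sketch below shows the idea with a simple round-robin placement; it is an assumption for illustration, not HDFS's actual rack-aware placement policy, and the node names and tiny block size are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int) -> dict:
    """Round-robin placement: block b goes to `replication` consecutive, distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement: dict, failed: set) -> bool:
    """The file stays readable if every block keeps at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas) for replicas in placement.values()
    )


data = b"sensor log " * 10                     # 110 bytes
blocks = split_into_blocks(data, block_size=32)  # 4 blocks
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(len(blocks), nodes, replication=3)

print(readable_after_failure(placement, {"node-a", "node-b"}))
print(readable_after_failure(placement, {"node-a", "node-b", "node-c"}))
```

With three replicas on four nodes, any two node failures leave every block readable, while losing three nodes can make some block unavailable.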

Big Data: Necessity of Analytics
• With the introduction of Industry 4.0 (IR 4.0) and IIoT, heterogeneous sensor nodes are being attached to machines and devices. These smart sensors transmit data to a server using wireless technologies.
• The collected data is categorised, stored and analysed, and redundant records are removed.
• Since the data comes from multiple sources, data analysis is required to process and classify it accordingly.

Big Data: Necessity of Analytics Cont.
• Regardless of domain or sector, data analysis will always be required to convert huge amounts of data into meaningful information.
• Data analysis is a crucial step for businesses deciding whether to alter existing products or introduce new ones.
• Data analysis can help anticipate faults before they occur, improving worker safety and the overall efficiency of manufacturing and process industries.
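A very simple form of fault early warning is to flag sensor readings that deviate strongly from a known-good baseline. The sketch below uses a z-score rule; the baseline length, the three-standard-deviation threshold and the vibration values are all assumptions, and real predictive-maintenance systems typically use rolling baselines and learned models rather than a single fixed window.

```python
import statistics


def flag_anomalies(readings, baseline_size=5, threshold=3.0):
    """Return indices of readings that deviate from the baseline mean
    by more than `threshold` baseline standard deviations."""
    baseline = readings[:baseline_size]
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # needs at least two baseline readings
    return [
        i
        for i, value in enumerate(readings[baseline_size:], start=baseline_size)
        if abs(value - mean) > threshold * stdev
    ]


# Hypothetical vibration amplitudes from a machine; the first five are the healthy baseline.
vibration = [0.50, 0.52, 0.49, 0.51, 0.48, 0.50, 0.53, 0.95, 0.51, 1.20]
print(flag_anomalies(vibration))  # [7, 9]
```

Flagged readings would be routed to maintenance staff for inspection before the deviation grows into a failure.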

Big Data: Examples of data analytics
• In the medical field, real-time patient data, combined with additional tools such as machine learning, can help predict health conditions.
• In transportation, data collected from sensors and other devices could help prevent accidents and reduce traffic congestion.
• Logistics companies such as DHL and FedEx use data analytics to find the best shipping routes and estimate delivery times, and they can track the real-time status of dispatched goods using GPS trackers.
• Data analytics has also made online shopping easier and more popular.
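Route planning of the kind logistics companies perform can be illustrated with Dijkstra's shortest-path algorithm on a weighted graph of hubs. This is a generic textbook sketch, not the method used by DHL or FedEx, whose routing systems are not described here; the hub names and travel times are hypothetical.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: returns (total_cost, path), or (inf, []) if unreachable.
    Edge weights must be non-negative."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(frontier, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical hub network; weights are travel times in hours.
HUBS = {
    "Depot": {"A": 4, "B": 1},
    "B": {"A": 2, "C": 5},
    "A": {"C": 1},
    "C": {},
}
print(shortest_route(HUBS, "Depot", "C"))  # (4.0, ['Depot', 'B', 'A', 'C'])
```

The direct-looking legs are not the cheapest: going through hub B first saves time, which is exactly the kind of result that route optimisation surfaces at scale.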

Big Data: Data warehouses
• Data warehouses act as central repositories for data collection, information retrieval and analysis.
• Data warehouses can collect and store data from one or many sources over a specific period. However, data warehousing is different from big data analytics.
• Because massive amounts of data flow in real time, big data analysis needs a scalable framework that processes data in parallel.
• Cloud-based analytics are increasingly used to handle large amounts of data.
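The parallel, scalable processing mentioned above is often organised as map and reduce steps: the input is partitioned, each partition is processed independently, and the partial results are merged. The word-count sketch below shows that pattern on one machine with a thread pool. It is an illustration only: CPython threads do not run Python code on several cores at once, and frameworks such as Hadoop MapReduce distribute the same pattern across a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one partition of the input, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, workers=4):
    # Partition the input so that each worker gets its own slice.
    chunk_size = max(1, len(lines) // workers)
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the partial counts into one result.
    return reduce(lambda a, b: a + b, partials, Counter())


log_lines = ["sensor ok", "sensor fault", "pump ok", "sensor ok"]
print(word_count(log_lines, workers=2))
```

Because each partition is processed without reference to the others, adding workers (or machines) scales the map step, and only the small partial results need to be combined.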

Big Data: Data warehouses Cont.
• The NIST definition of cloud computing lists essential characteristics that cloud-based analytics relies on. They are the following:
• On-demand self-service: Services are provided remotely to customers/end users on a payment basis, as they require them. Customers are unaware of the back-end processes through which services are delivered and need minimal interaction with the service provider.
• Network access: Services are available over the network and are easily accessed by consumers through standard mechanisms.
• Grouping (resource pooling): The provider's resources are pooled to serve multiple consumers and are dynamically assigned to end users on request, according to demand.

Big Data: Data warehouses Cont.
• Flexibility (rapid elasticity): Resources requested by consumers are provided in appropriate quantity and at the appropriate time, scaling up or down with demand. To consumers, the available capacity may appear unlimited.
• Measured services: The type of service and the amount of resources used at any level can be monitored, optimized and controlled, with full transparency for both the service provider and consumers.

Big Data: Types of analytics
• Data analytics combines processes such as information extraction, transformation and inference on the collected data to support decision-making.
• The main types of data analytics are the following:
• Descriptive analytics: Descriptive analytics analyses what happened. It uses historical data to summarize events and provide insights into them. For example, a company's past sales can be summarized in terms of customer preferences and the sales cycle.
• Diagnostic analytics: Diagnostic analytics infers from historical data why something happened by finding dependencies and patterns. For example, the probable causes of a patient's disease can be inferred from past data. The dependencies found this way also support later estimates, such as a company's future sales and profit based on previous sales data.
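The difference between the first two types can be shown on a tiny data set: descriptive analytics summarises what happened, while diagnostic analytics looks for factors that move with the outcome. The monthly figures below are hypothetical, and Pearson correlation is just one simple diagnostic tool; a strong correlation suggests, but does not prove, a causal dependency.

```python
import statistics

# Hypothetical monthly figures (in thousands) for illustration only.
ad_spend = [10, 12, 9, 15, 14, 18]
sales = [100, 115, 95, 140, 130, 165]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Descriptive: what happened?
summary = {
    "total": sum(sales),
    "mean": statistics.mean(sales),
    "best_month": sales.index(max(sales)) + 1,
}

# Diagnostic: which factor moves together with the outcome?
r = pearson(ad_spend, sales)

print(summary)
print(round(r, 3))
```

Here sales track advertising spend closely, which points the analyst towards advertising as a likely driver worth investigating further.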

Big Data: Types of analytics Cont.
• Predictive analytics: Predictive analytics estimates the probability of what will happen in the future. The results of descriptive and diagnostic analytics are used to predict future trends, find correlations between events and detect similar clusters. For example, analysing sales sources, lead times, the number and type of communications, and social media documents may help forecast and improve a company's sales.
• Prescriptive analytics: Prescriptive analytics determines the measures to take to eliminate predicted issues, finding correlations among them to prove or disprove a hypothesis. It combines historical data with other information. For example, patients with similar LDL cholesterol and high blood pressure levels can be identified and clustered.
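Two techniques behind these examples can be sketched briefly: a least-squares linear trend that extrapolates a series one step ahead (a very basic predictive model), and plain k-means clustering to group patients by LDL cholesterol and blood pressure, as in the prescriptive example. The data, the choice of initial centroids (simply the first and fourth patients) and the iteration count are assumptions; real analyses would standardise the features, use better initialisation and validate the models.

```python
def linear_forecast(values, steps_ahead=1):
    """Fit y = intercept + slope * t by least squares and extrapolate."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    slope = (
        sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
        / sum((t - mean_t) ** 2 for t in range(n))
    )
    intercept = mean_y - slope * mean_t
    return intercept + slope * (n - 1 + steps_ahead)


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Predictive: hypothetical quarterly sales with a steady upward trend.
print(linear_forecast([100, 110, 120, 130]))  # 140.0

# Clustering: hypothetical (LDL mg/dL, systolic blood pressure mmHg) per patient.
PATIENTS = [(100, 118), (110, 122), (105, 120), (190, 160), (180, 155), (200, 165)]
centroids, clusters = kmeans(PATIENTS, [PATIENTS[0], PATIENTS[3]])
print(centroids)
```

The clustering step only groups similar patients; the prescriptive part, deciding what intervention each group should receive, would build on these groups.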

Big Data: Real-life examples
• Netflix: Netflix has over 150 million subscribers and collects data on all of them. It tracks what people watch, when they watch it, the device being used, whether a show is paused, and how quickly a user finishes a series. It even takes screenshots of scenes that people watch twice. By feeding all this information into its algorithms, Netflix creates custom user profiles, which allow it to tailor the experience by recommending movies and TV shows with impressive accuracy. And while you might have seen articles about how Netflix likes to splash the cash on new shows, this isn't done blindly: the data it collects helps it decide what to commission next.
• Amazon: Amazon collects vast amounts of data on its users. It tracks what users buy, how often (and for how long) they stay online, and even product reviews (useful for sentiment analysis). Amazon can even guess people's income based on their billing address. By compiling this data across millions of users, Amazon creates highly specialized, segmented user profiles. Using predictive analytics, it then targets its marketing based on users' browsing habits. This is used to suggest what you might want to buy next, and also to group products together to streamline the shopping experience.
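A basic version of "customers who bought this also bought" can be built by counting how often items appear together in the same basket, which is a simple form of item-based collaborative filtering. The sketch below is not Netflix's or Amazon's actual system, whose details are not given here; the baskets are hypothetical, and ties are broken alphabetically so the output is deterministic.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_occurrence(baskets):
    """For every pair of distinct items bought together, count the co-purchase both ways."""
    co = defaultdict(Counter)
    for basket in baskets:
        # sorted(set(...)) removes duplicates and keeps the iteration order stable.
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, item, k=2):
    """Top-k items most often bought with `item` (ties broken alphabetically)."""
    ranked = sorted(co[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


BASKETS = [
    ["kettle", "tea", "mug"],
    ["tea", "mug"],
    ["kettle", "tea"],
    ["tea", "biscuits"],
]
co = build_co_occurrence(BASKETS)
print(recommend(co, "tea"))       # ['kettle', 'mug']
print(recommend(co, "biscuits"))  # ['tea']
```

At the scale described above, the same counts would be computed in parallel over millions of orders and combined with each user's profile.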

Big Data: Real-life examples Cont.
• McDonald's: McDonald's uses big data to shape key aspects of its offering, offline too, including its mobile app, drive-thru experience and digital menus. Through its own app, McDonald's collects vital information about user habits, which lets it offer tailored loyalty rewards that encourage repeat business. It also collects data from each restaurant's drive-thru, allowing it to ensure enough staff are on shift to cover demand. Finally, its digital menus offer different options depending on factors such as the time of day, whether any events are taking place nearby, and even the weather.
• Twitter: Unlike on other social platforms, almost every user's tweets are public and can be retrieved programmatically. This is a huge advantage if you need a large amount of data to run analytics on. Twitter data is also quite specific. Twitter's API supports complex queries, such as pulling every tweet about a certain topic from the last twenty minutes, or pulling a certain user's non-retweeted tweets.
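The two example queries can be illustrated as local filters over tweets that have already been collected. This sketch does not call the Twitter API and does not reproduce its query syntax or response schema; the record fields (`user`, `text`, `is_retweet`, `created_at`), the sample tweets and the fixed current time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

NOW = datetime(2023, 4, 14, 12, 0, tzinfo=timezone.utc)

# Hypothetical, already-collected records; field names are illustrative only.
TWEETS = [
    {"id": 1, "user": "alice", "text": "Big data at the factory",
     "is_retweet": False, "created_at": NOW - timedelta(minutes=5)},
    {"id": 2, "user": "bob", "text": "RT big data news",
     "is_retweet": True, "created_at": NOW - timedelta(minutes=10)},
    {"id": 3, "user": "alice", "text": "Lunch break",
     "is_retweet": False, "created_at": NOW - timedelta(minutes=3)},
    {"id": 4, "user": "carol", "text": "Big Data conference recap",
     "is_retweet": False, "created_at": NOW - timedelta(minutes=45)},
]


def recent_on_topic(tweets, topic, now, minutes=20):
    """Tweets mentioning `topic` (case-insensitive) posted within the last `minutes`."""
    cutoff = now - timedelta(minutes=minutes)
    return [
        t for t in tweets
        if t["created_at"] >= cutoff and topic.lower() in t["text"].lower()
    ]


def original_tweets(tweets, user):
    """A given user's tweets, excluding retweets."""
    return [t for t in tweets if t["user"] == user and not t["is_retweet"]]


print([t["id"] for t in recent_on_topic(TWEETS, "big data", NOW)])  # [1, 2]
print([t["id"] for t in original_tweets(TWEETS, "alice")])          # [1, 3]
```

Tweet 4 mentions the topic but falls outside the twenty-minute window, and bob's retweet is excluded from the original-tweets query.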

Big data applications

Big data tools

