Spark solves similar problems as Hadoop MapReduce does, but with a fast in-memory approach and a clean, functional-style API. With its ability to integrate with Hadoop and its built-in tools for interactive query analysis (Shark), large-scale graph processing and analysis (Bagel), and real-time analysis (Spark Streaming), it can be used interactively to quickly process and query big data sets.
Fast Data Processing with Spark covers how to write distributed MapReduce-style programs with Spark. The book will guide you through every step required to write effective distributed programs, from setting up your cluster and interactively exploring the API, to deploying your job to the cluster and tuning it for your purposes.
Fast Data Processing with Spark covers everything from setting up your Spark cluster in a variety of situations (stand-alone, EC2, and so on) to using the interactive shell to write distributed code. From there, we move on to cover how to write and deploy distributed jobs in Java, Scala, and Python. We then examine how to use the interactive shell to quickly prototype distributed programs and explore the Spark API.
What you will learn from this book:
- Prototype distributed applications with Spark's interactive shell
- Learn different ways to interact with Spark's distributed representation of data (RDDs)
- Load data from the various data sources
- Query Spark with a SQL-like query syntax
- Integrate Shark queries with Spark programs
- Effectively test your distributed software
- Tune a Spark installation
- Install and set up Spark on your cluster
- Work effectively with large data sets
Approach: This book is a basic, step-by-step tutorial that will help readers take advantage of all that Spark has to offer.
On the generality side, Spark is designed to cover a wide range of workloads that previously required separate distributed systems, including batch applications, iterative algorithms, interactive queries, and streaming.
By supporting these workloads in the same engine, Spark makes it easy and inexpensive to combine different processing types, which is often necessary in production data analysis pipelines.
In addition, it reduces the management burden of maintaining separate tools. It also integrates closely with other Big Data tools. In particular, Spark can run in Hadoop clusters and access any Hadoop data source, including Cassandra.
A Unified Stack
The Spark project contains multiple closely integrated components. Because the core engine of Spark is both fast and general-purpose, it powers multiple higher-level components specialized for various workloads, such as SQL or machine learning. These components are designed to interoperate closely, letting you combine them like libraries in a software project.
A philosophy of tight integration has several benefits. First, all libraries and higher-level components in the stack benefit from improvements at the lower layers. Second, the costs associated with running the stack are minimized, because instead of running 5 to 10 independent software systems, an organization needs to run only one.
These costs include deployment, maintenance, testing, support, and others. This also means that each time a new component is added to the Spark stack, every organization that uses Spark will immediately be able to try this new component. This changes the cost of trying out a new type of data analysis from downloading, deploying, and learning a new software project to upgrading Spark.
Finally, one of the largest advantages of tight integration is the ability to build applications that seamlessly combine different processing models. For example, in Spark you can write one application that uses machine learning to classify data in real time as it is ingested from streaming sources. Simultaneously, analysts can query the resulting data, also in real time, via SQL. In addition, more sophisticated data engineers and data scientists can access the same data via the Python shell for ad hoc analysis.
Others might access the data in standalone batch applications. All the while, the IT team has to maintain only one system.
Spark Core
Spark Core is home to the API that defines resilient distributed datasets (RDDs), Spark's main programming abstraction. RDDs represent a collection of items distributed across many compute nodes that can be manipulated in parallel. Spark Core provides many APIs for building and manipulating these collections.
Spark SQL
Spark SQL is Spark's package for working with structured data. Its tight integration with the rich computing environment provided by Spark makes Spark SQL unlike any other open source data warehouse tool.
Spark SQL was added to Spark in version 1.0.
Spark Streaming
Spark Streaming is a Spark component that enables processing of live streams of data. Examples of data streams include logfiles generated by production web servers, or queues of messages containing status updates posted by users of a web service. Underneath its API, Spark Streaming was designed to provide the same degree of fault tolerance, throughput, and scalability as Spark Core.
MLlib, Spark's machine learning library, provides multiple types of machine learning algorithms, including classification, regression, clustering, and collaborative filtering, as well as supporting functionality such as model evaluation and data import. It also provides some lower-level ML primitives, including a generic gradient descent optimization algorithm. All of these methods are designed to scale out across a cluster.
GraphX
GraphX is a library for manipulating graphs (such as a social network's friend graph) and performing graph-parallel computations. GraphX also provides various operators for manipulating graphs and a library of common graph algorithms.
Cluster Managers
Under the hood, Spark is designed to efficiently scale up from one to many thousands of compute nodes. To achieve this while maximizing flexibility, Spark can run over a variety of cluster managers, including Hadoop YARN, Apache Mesos, and a simple cluster manager included in Spark itself called the Standalone Scheduler.
Chapter 7 explores the different options and how to choose the correct cluster manager.
Who Uses Spark, and for What?
Because Spark is a general-purpose framework for cluster computing, it is used for a diverse range of applications.
In the Preface we outlined two groups of readers that this book targets: data scientists and engineers. Unsurprisingly, the typical use cases differ between the two, but we can roughly classify them into two categories: data science and data applications. Nonetheless, it can be illuminating to consider the two groups and their respective use cases separately.
Data Science Tasks
Data science, a discipline that has been emerging over the past few years, centers on analyzing data.
While there is no standard definition, for our purposes a data scientist is somebody whose main task is to analyze and model data. Data scientists may have experience with SQL, statistics, predictive modeling (machine learning), and programming, usually in Python, Matlab, or R. Data scientists also have experience with techniques necessary to transform data into formats that can be analyzed for insights (sometimes referred to as data wrangling). Data scientists use their skills to analyze data with the goal of answering a question or discovering insights.
Oftentimes, their workflow involves ad hoc analysis, so they use interactive shells (rather than building complex applications) that let them see the results of queries and snippets of code in the least amount of time. Spark supports the different tasks of data science with a number of components.
The Spark shell makes it easy to do interactive data analysis using Python or Scala. Machine learning and data analysis are supported through the MLlib library. In addition, there is support for calling out to external programs in Matlab or R.
Spark enables data scientists to tackle problems with larger data sizes than they could before with tools like R or Pandas. Sometimes the initial exploration is later productized, that is, extended, hardened, and turned into a production data processing application. For example, the initial investigation of a data scientist might lead to the creation of a production recommender system that is integrated into a web application and used to generate product suggestions to users.
Often it is a different person or team that leads the process of productizing the work of the data scientists, and that person is often an engineer.
Data Processing Applications
The other main use case of Spark can be described in the context of the engineer persona.
For our purposes here, we think of engineers as a large class of software developers who use Spark to build production data processing applications. These developers usually have an understanding of the principles of software engineering, such as encapsulation, interface design, and object-oriented programming. They frequently have a degree in computer science. They use their engineering skills to design and build software systems that implement a business use case.
For engineers, Spark provides a simple way to parallelize these applications across clusters, and hides the complexity of distributed systems programming, network communication, and fault tolerance. The system gives them enough control to monitor, inspect, and tune applications while allowing them to implement common tasks quickly.
The modular nature of the API (based on passing distributed collections of objects) makes it easy to factor work into reusable libraries and test it locally.
A Brief History of Spark
Spark is an open source project that has been built and is maintained by a thriving and diverse community of developers.
If you or your organization are trying Spark for the first time, you might be interested in the history of the project. Spark started in 2009 as a research project at UC Berkeley's AMPLab. The researchers in the lab had previously been working on Hadoop MapReduce, and observed that MapReduce was inefficient for iterative and interactive computing jobs. Thus, from the beginning, Spark was designed to be fast for interactive queries and iterative algorithms, bringing in ideas like support for in-memory storage and efficient fault recovery.
In a very short time, however, many external organizations began using Spark, and today, over 50 organizations list themselves on the Spark PoweredBy page, and dozens speak about their use cases at Spark community events such as Spark Meetups and the Spark Summit.
Spark was first open sourced in March 2010, and was transferred to the Apache Software Foundation in June 2013, where it is now a top-level project.
Spark Versions and Releases
Since its creation, Spark has been a very active project and community, with the number of contributors growing with each release.
Spark 1.0, for example, involved more than 100 individual contributors. Though the level of activity has rapidly grown, the community continues to release updated versions of Spark on a regular schedule. This book focuses primarily on the Spark 1.x releases. Spark can read from and write to a variety of storage systems; we will look at interacting with these data sources in Chapter 5.
Chapter 2. Downloading Spark and Getting Started
In this chapter we will walk through the process of downloading and running Spark in local mode on a single computer. This chapter was written for anybody who is new to Spark, including both data scientists and engineers.
Spark can be used from Python, Java, or Scala. We will include examples in all languages wherever possible. To run Spark on either your laptop or a cluster, all you need is an installation of Java 6 or newer.
If you want to use the Python API, you will also need a Python interpreter; note that Spark does not yet work with Python 3.
Downloading Spark
The first step to using Spark is to download and unpack it. Tip: Windows users may run into issues installing Spark into a directory with a space in the name. Instead, install Spark in a directory with no space in its path (for example, C:\spark). If your operating system does not have the tar command installed, try searching the Internet for a free TAR extractor; on Windows, for example, you may wish to try 7-Zip.
To do that, open a terminal, change to the directory where you downloaded Spark, and untar the file. This will create a new directory with the same name but without the final .tgz extension. The ls command lists the contents of the Spark directory. We will start by running some of the examples that come with Spark. Then we will write, compile, and run a simple Spark job of our own. All of the work we will do in this chapter will be with Spark running in local mode; that is, nondistributed mode, which uses only a single machine.
Spark can run in a variety of different modes, or environments. We will cover the various deployment modes in detail in Chapter 7.
Introduction to Spark's Python and Scala Shells
Spark comes with interactive shells that enable ad hoc data analysis. Unlike most other shells, which let you manipulate data using only the disk and memory of a single machine, Spark's shells let you interact with data that is distributed on disk or in memory across many machines. Because Spark can load data into memory on the worker nodes, many distributed computations, even ones that process terabytes of data across dozens of machines, can run in a few seconds.
This makes the sort of iterative, ad hoc, and exploratory analysis commonly done in shells a good fit for Spark. Spark provides both Python and Scala shells that have been augmented to support connecting to a cluster. Because a shell is very useful for learning the API, we recommend using one of these languages for these examples even if you are a Java developer.
The API is similar in every language. To open the Python version of the Spark shell, which we also refer to as the PySpark shell, go into your Spark directory and type bin/pyspark (or bin\pyspark on Windows); for the Scala shell, type bin/spark-shell. When the shell starts, you will notice a lot of log messages. You may need to press Enter once to clear the log output and get to a shell prompt. You may find the logging statements that get printed in the shell distracting.
You can control the verbosity of the logging. To do this, you can create a file in the conf directory called log4j.properties. The Spark developers already include a template for this file called log4j.properties.template; to make the logging less verbose, copy it to conf/log4j.properties and lower the log level of the root logger (for example, from INFO to WARN). In Spark, we express our computation through operations on distributed collections that are automatically parallelized across the cluster. These collections are called resilient distributed datasets, or RDDs. The Spark UI, available by default on port 4040 of the driver machine, lets you see all sorts of information about your tasks and cluster.
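As a point of reference, a simple line count in the PySpark shell might look like the following sketch; the README.md path is an assumption about what file sits in the working directory:

```python
>>> lines = sc.textFile("README.md")  # create an RDD of the file's lines (path assumed)
>>> lines.count()                     # count the elements (lines) in this RDD
>>> lines.first()                     # return the first element, i.e., the first line
```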
In the sketch above, the variable called lines is an RDD, created here from a text file on our local machine. We can run various parallel operations on the RDD, such as counting the number of elements in the dataset (here, lines of text in the file) or printing the first one. At a high level, every Spark application consists of a driver program that launches various parallel operations on a cluster.
In the preceding example, the driver program was the Spark shell itself, and you could just type in the operations you wanted to run. Driver programs access Spark through a SparkContext object, which represents a connection to a computing cluster. In the shell, a SparkContext is automatically created for you as the variable called sc. Try printing out sc to see its type. Earlier, we called sc.textFile() to create an RDD representing the lines of text in a file.
We can then run various operations on these lines, such as count. To run these operations, driver programs typically manage a number of nodes called executors. For example, if we were running the count operation on a cluster, different machines might count lines in different ranges of the file. Because we just ran the Spark shell locally, it executed all its work on a single machine — but you can connect the same shell to a cluster to analyze data in parallel.
Much of Spark's API revolves around passing functions to its operators to run them on the cluster; in Python and Scala, these can be lambda expressions or anonymous functions. When using Spark in these languages, you can also define a function separately and then pass its name to Spark.
While we will cover the Spark API in more detail later, a lot of its magic is that function-based operations like filter also parallelize across the cluster. That is, Spark automatically takes your function (for example, the predicate passed to filter) and ships it to executor nodes.
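Continuing the shell sketch above, such a function-based operation might look like this (the lambda predicate is illustrative):

```python
>>> pythonLines = lines.filter(lambda line: "Python" in line)  # the predicate ships to executors
>>> pythonLines.first()                                        # return the first matching line
```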
Thus, you can write code in a single driver program and automatically have parts of it run on multiple nodes.
Standalone Applications
The final piece missing in this quick tour of Spark is how to use it in standalone programs. Apart from running interactively, Spark can be linked into standalone applications in Java, Scala, or Python. The main difference from using it in the shell is that you need to initialize your own SparkContext.
After that, the API is the same. The process of linking to Spark varies by language. In Java and Scala, you give your application a Maven dependency on the spark-core artifact, matching the version of Spark you are running; as of the time of writing, the current releases are in the 1.x line.
Popular integrated development environments like Eclipse also allow you to directly add a Maven dependency to a project. In Python, there is no build step: the spark-submit script includes the Spark dependencies for us, so you simply run your script with spark-submit.
Initializing a SparkContext
Once you have linked an application to Spark, you need to import the Spark packages in your program and create a SparkContext.
You do so by first creating a SparkConf object to configure your application, and then building a SparkContext for it. The pattern is the same in Scala, Java, and Python: import the SparkConf and SparkContext (or JavaSparkContext) classes, then construct the context. You pass two key parameters: a cluster URL (local in our examples, which tells Spark to run on one thread on the local machine) and an application name (My App in these examples), which identifies your application on a cluster manager's UI. A Python sketch of this initialization follows.
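A minimal sketch in Python; the master URL local and the name My App are illustrative values, not requirements:

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("My App")  # cluster URL and application name
sc = SparkContext(conf=conf)                                # build the context from the config
```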
Additional parameters exist for configuring how your application executes or adding code to be shipped to the cluster, but we will cover these in later chapters of the book.
After you have initialized a SparkContext, you can use all the methods we showed before to create RDDs (for example, from a text file) and manipulate them. Finally, to shut down Spark, you can either call the stop method on your SparkContext, or simply exit the application (for example, with System.exit(0) or sys.exit()). This quick overview should be enough to let you run a standalone Spark application on your laptop.
For more advanced configuration, Chapter 7 will cover how to connect your application to a cluster, including packaging your application so that its code is automatically shipped to worker nodes. For now, please refer to the Quick Start Guide in the official Spark documentation. To close the chapter, we will build a standalone word count application. On a single machine, implementing word count is simple, but in distributed frameworks it is a common example because it involves reading and combining data from many worker nodes. We will look at building and packaging a simple word count example with both sbt and Maven; a Python sketch of the same logic appears below.
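As a rough reference point, the word count logic might look like the following Python sketch (the script name and file paths are assumptions); the Scala and Java versions follow the same structure:

```python
# wordcount.py - run with: bin/spark-submit wordcount.py
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("WordCount")
sc = SparkContext(conf=conf)

lines = sc.textFile("README.md")                     # load the input data (path assumed)
words = lines.flatMap(lambda line: line.split(" "))  # split each line into words
counts = words.map(lambda w: (w, 1)) \
              .reduceByKey(lambda a, b: a + b)       # sum the counts for each word
counts.saveAsTextFile("wordcounts")                  # write the results (output path assumed)
sc.stop()
```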
This is covered in more detail in Chapter 7. The spark-submit script sets up a number of environment variables used by Spark. From the mini-complete-example directory we can build and run the word count application in both Scala and Java; Chapter 7 covers packaging Spark applications in more detail.
Conclusion
In this chapter, we have covered downloading Spark, running it locally on your laptop, and using it either interactively or from a standalone application.
We gave a quick overview of the core concepts involved in programming with Spark: a driver program creates a SparkContext and RDDs, and then runs parallel operations on them. In the next chapter, we will dive more deeply into how RDDs operate.
Chapter 3. Programming with RDDs
This chapter introduces Spark's core abstraction for working with data: the resilient distributed dataset (RDD). An RDD is simply a distributed collection of elements.
Under the hood, Spark automatically distributes the data contained in RDDs across your cluster and parallelizes the operations you perform on them. Both data scientists and engineers should read this chapter, as RDDs are the core concept in Spark. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster.
Users create RDDs in two ways: by loading an external dataset, or by distributing a collection of objects (for example, a list or set) in their driver program. Once created, RDDs offer two types of operations: transformations and actions. Transformations construct a new RDD from a previous one.
For example, one common transformation is filtering data that matches a predicate. In our text file example, we can use this to create a new RDD holding just the strings that contain the word Python. One example of an action we called earlier is first(), which returns the first element in an RDD. Although you can define new RDDs any time, Spark computes them only in a lazy fashion, that is, the first time they are used in an action.
This approach might seem unusual at first, but makes a lot of sense when you are working with Big Data. For instance, consider the earlier example, where we defined a text file and then filtered the lines that include Python. If Spark were to load and store all the lines in the file as soon as we defined it, it would waste a lot of storage space, given that we then immediately filter out many of them. Instead, once Spark sees the whole chain of transformations, it can compute just the data needed for its result. We can ask Spark to persist our data in a number of different places using RDD.persist() (the available storage levels are covered later in this chapter). After computing it the first time, Spark will store the RDD contents in memory (partitioned across the machines in your cluster), and reuse them in future actions.
Persisting RDDs on disk instead of memory is also possible. For example, if we knew that we wanted to compute multiple results about the README lines that contain Python, we could write a short script to do so (see the sketch after this list). To summarize, every Spark program and shell session will work as follows:
1. Create some input RDDs from external data.
2. Transform them to define new RDDs using transformations like filter().
3. Ask Spark to persist() any intermediate RDDs that will need to be reused.
4. Launch actions such as count() and first() to kick off a parallel computation, which is then optimized and executed by Spark.
Tip: cache() is the same as calling persist() with the default storage level.
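A minimal Python sketch of this workflow, assuming a README.md file as input:

```python
lines = sc.textFile("README.md")                           # 1. create an input RDD
pythonLines = lines.filter(lambda line: "Python" in line)  # 2. transform it
pythonLines.persist()                                      # 3. persist the intermediate RDD
pythonLines.count()                                        # 4. launch actions on it
pythonLines.first()
```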
Creating RDDs
Spark provides two ways to create RDDs: loading an external dataset and parallelizing a collection in your driver program. The simplest way to create an RDD is to take an existing collection in your program and pass it to SparkContext's parallelize() method, as sketched below. This approach is very useful when you are learning Spark, since you can quickly create your own RDDs in the shell and perform operations on them.
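For example, in Python (the strings are arbitrary sample data):

```python
lines = sc.parallelize(["pandas", "i like pandas"])  # distribute a local list as an RDD
```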
Keep in mind, however, that outside of prototyping and testing, this is not widely used, since it requires that you have your entire dataset in memory on one machine. Loading external datasets is covered in detail in Chapter 5.
RDD Operations
As we have discussed, RDDs support two types of operations: transformations and actions. Transformations are operations on RDDs that return a new RDD, such as map() and filter(). Actions are operations that return a result to the driver program or write it to storage, and kick off a computation, such as count() and first(). Spark treats transformations and actions very differently, so understanding which type of operation you are performing will be important.
If you are ever confused whether a given function is a transformation or an action, you can look at its return type: transformations return RDDs, whereas actions return some other data type. Many transformations are element-wise; that is, they work on one element at a time; but this is not true for all transformations.
As an example, suppose that we have a logfile, log.txt, with a number of messages, and we want to select only the error messages. We can use the filter() transformation seen before. Note that filter() does not mutate the existing inputRDD; instead, it returns a pointer to an entirely new RDD, and inputRDD can still be reused later in the program, for instance to search for warning lines. We show Python below, but the union() function is identical in all three languages.
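A sketch of this in Python, assuming the logfile is named log.txt:

```python
inputRDD = sc.textFile("log.txt")                        # load the logfile (path assumed)
errorsRDD = inputRDD.filter(lambda x: "error" in x)      # keep only error lines
warningsRDD = inputRDD.filter(lambda x: "warning" in x)  # keep only warning lines
badLinesRDD = errorsRDD.union(warningsRDD)               # combine the two filtered RDDs
```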
Transformations can actually operate on any number of input RDDs. Tip: A better way to accomplish the same result would be to simply filter the inputRDD once, looking for either error or warning. Finally, as you derive new RDDs from each other using transformations, Spark keeps track of the set of dependencies between different RDDs, called the lineage graph. Spark uses this information to compute each RDD on demand and to recover lost data if part of a persistent RDD is lost. Actions are the second type of RDD operation.
They are the operations that return a final value to the driver program or write data to an external storage system. Actions force the evaluation of the transformations required for the RDD they were called on, since they need to actually produce output. Continuing the log example from the previous section, we might want to print out some information about the badLinesRDD. To do that, we can use two actions: count(), which returns the count as a number, and take(), which collects a number of elements from the RDD. We then iterate over them locally to print out information at the driver, as in the sketch below.
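A Python sketch of these actions, continuing the badLinesRDD example above:

```python
print("Input had " + str(badLinesRDD.count()) + " concerning lines")
print("Here are 10 examples:")
for line in badLinesRDD.take(10):   # bring 10 elements back to the driver
    print(line)
```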
We will cover the different options for exporting data in Chapter 5.
Lazy Evaluation
As you read earlier, transformations on RDDs are lazily evaluated, meaning that Spark will not begin to execute until it sees an action. This can be somewhat counterintuitive for new users, but may be familiar for those who have used functional languages such as Haskell or LINQ-like data processing frameworks.
Lazy evaluation means that when we call a transformation on an RDD (for instance, calling map()), the operation is not immediately performed. Instead, Spark internally records metadata to indicate that this operation has been requested. Rather than thinking of an RDD as containing specific data, it is best to think of each RDD as consisting of instructions on how to compute the data that we build up through transformations.
Loading data into an RDD is lazily evaluated in the same way transformations are. So, when we call sc.textFile(), the data is not loaded until it is necessary. As with transformations, the operation (in this case, reading the data) can occur multiple times. Tip: Although transformations are lazy, you can force Spark to execute them at any time by running an action, such as count(). This is an easy way to test out just part of your program.
Spark uses lazy evaluation to reduce the number of passes it has to take over our data by grouping operations together. In systems like Hadoop MapReduce, developers often have to spend a lot of time considering how to group together operations to minimize the number of MapReduce passes. In Spark, there is no substantial benefit to writing a single complex map instead of chaining together many simple operations. Thus, users are free to organize their program into smaller, more manageable operations.
Passing Functions to Spark
Each of the core languages has a slightly different mechanism for passing functions to Spark.
Python
In Python, we have three options for passing functions into Spark. For shorter functions, we can pass in lambda expressions, as we did earlier when filtering lines.
Alternatively, we can pass in top-level functions, or locally defined functions. When you pass a function that is the member of an object, or contains references to fields in an object (for example, self.field), Spark sends the entire object to worker nodes, which can be much larger than just the bit of information you need.
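A hedged Python sketch of these options; lines is assumed to be an existing RDD of strings, and the class is purely illustrative:

```python
# Option 1: a lambda expression
word = "error"
errors = lines.filter(lambda line: word in line)

# Option 2: a locally defined (or top-level) function
def contains_error(line):
    return "error" in line
errors = lines.filter(contains_error)

# Avoid shipping a whole object: copy the needed field into a local variable first
class WordFunctions(object):
    def __init__(self, query):
        self.query = query
    def get_matches_no_reference(self, rdd):
        query = self.query                       # local copy, so self is not serialized
        return rdd.filter(lambda x: query in x)
```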
Scala
In Scala, we can similarly pass in functions defined inline, references to methods, or static functions. Furthermore, as in Python, passing a method or field of an object includes a reference to that whole object, though this is less obvious because we are not forced to write these references with self. As we did with Python, we can instead extract the fields we need as local variables and avoid needing to pass the whole object containing them. Note that passing in local serializable variables or functions that are members of a top-level object is always safe.
Java
In Java, functions are specified as objects that implement one of Spark's function interfaces. There are a number of different interfaces based on the return type of the function. We can either define our function classes inline as anonymous inner classes, or create a named class. One benefit of using a named class is that you can give it constructor parameters. Since Java 8 is still relatively new as of this writing, our examples use the more verbose syntax for defining classes in previous versions of Java.
However, with Java 8's lambda expressions, our search example would become much more concise. Tip: Both anonymous inner classes and lambda expressions can reference any final variables in the method enclosing them, so you can pass these variables to Spark just as in Python and Scala.
Common Transformations and Actions
In this chapter, we tour the most common transformations and actions in Spark.
Additional operations are available on RDDs containing certain types of data; we cover converting between RDD types and these special operations in later sections.
Element-wise transformations
The two most common transformations you will likely be using are map() and filter(). The map() transformation takes in a function and applies it to each element in the RDD, with the result of the function being the new value of each element in the resulting RDD.
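For instance, a sketch of map() squaring the numbers in a small RDD (the input values are arbitrary):

```python
nums = sc.parallelize([1, 2, 3, 4])
squared = nums.map(lambda x: x * x).collect()  # apply the function, then collect the results
for num in squared:
    print(num)
```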
The filter() transformation takes in a function and returns an RDD that has only the elements that pass the filter function. Sometimes we want to produce multiple output elements for each input element; the operation to do this is called flatMap(). As with map(), the function we provide to flatMap() is called individually for each element in our input RDD, as in the sketch below.
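A sketch of flatMap() splitting lines into words (the sample data is illustrative):

```python
lines = sc.parallelize(["hello world", "hi"])
words = lines.flatMap(lambda line: line.split(" "))  # each line yields several words
words.first()                                        # returns "hello"
```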
Instead of returning a single element, we return an iterator with our return values. Rather than producing an RDD of iterators, we get back an RDD that consists of the elements from all of the iterators. A simple usage of flatMap() is splitting up an input string into words, as in the sketch above: flatMap() flattens the iterators returned to it, whereas map() would produce an RDD of lists.
Pseudo set operations
RDDs support many of the operations of mathematical sets, such as union and intersection, even when the RDDs themselves are not properly sets.
Four such operations are union, intersection, subtract, and cartesian, described next. The set property most frequently missing from our RDDs is the uniqueness of elements, as we often have duplicates.
If we want only unique elements we can use the RDD.distinct() transformation to produce a new RDD with only distinct items. Note that distinct() is expensive, however, as it requires shuffling all the data over the network to ensure that we receive only one copy of each element. Shuffling, and how to avoid it, is discussed in more detail in Chapter 4. The simplest set operation is union(other), which gives back an RDD consisting of the data from both sources.
This can be useful in a number of use cases, such as processing logfiles from many sources. Spark also provides an intersection(other) method, which returns only elements in both RDDs; a short sketch of these operations follows.
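A Python sketch of these pseudo set operations on two small RDDs of sample values:

```python
rdd1 = sc.parallelize(["coffee", "coffee", "panda", "monkey", "tea"])
rdd2 = sc.parallelize(["coffee", "monkey", "kitty"])
rdd1.distinct().collect()          # deduplicate rdd1 (requires a shuffle)
rdd1.union(rdd2).collect()         # all elements from both RDDs, duplicates kept
rdd1.intersection(rdd2).collect()  # only elements present in both, duplicates removed
```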
While intersection() and union() are two similar concepts, the performance of intersection() is much worse, since it requires a shuffle over the network to identify common elements. Sometimes we need to remove some data from consideration; the subtract(other) transformation takes in another RDD and returns an RDD containing only values present in the first RDD and not in the second. Like intersection(), it performs a shuffle. The cartesian(other) transformation returns all possible pairs of (a, b) where a is in the source RDD and b is in the other RDD.
We can also take the Cartesian product of an RDD with itself, which can be useful for tasks like computing user similarity. Be warned, however, that the Cartesian product is very expensive for large RDDs. To recap the basic transformations: flatMap() is often used to extract words; filter() returns an RDD consisting of only the elements that pass the function passed to it; and subtract() removes the contents of one RDD (for example, to remove training data). The most common action on basic RDDs is reduce(), which takes a function that operates on two elements of the type in your RDD and returns a new element of the same type. With reduce(), we can easily sum the elements of our RDD, count the number of elements, and perform other types of aggregations. Similar to reduce() is fold(), which in addition takes a zero value to be used for the initial call on each partition. The zero value you provide should be the identity element for your operation; that is, applying it multiple times with your function should not change the value (for example, 0 for addition, 1 for multiplication, or an empty list for concatenation).
Tip: You can minimize object creation in fold() by modifying and returning the first of the two parameters in place. However, you should not modify the second parameter. Both fold() and reduce() require that the return type of our result be the same type as that of the elements in the RDD we are operating over.
For example, when computing a running average, we need to keep track of both the total so far and the number of elements, which requires us to return a pair. We could work around this by first using map(), transforming every element into a pair of the element and the number 1, which is the type we want to return, so that the reduce() function can work on pairs.
The aggregate function frees us from the constraint of having the return be the same type as the RDD we are working on. With aggregate , like fold , we supply an initial zero value of the type we want to return. We then supply a function to combine the elements from our RDD with the accumulator.
Finally, we need to supply a second function to merge two accumulators, given that each node accumulates its own results locally. We can use aggregate() to compute the average of an RDD, avoiding a map() before the fold(); a sketch of this appears below. If there is an ordering defined on our data, we can also extract the top elements from an RDD using top(). Sometimes we need a sample of our data in our driver program.
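A sketch of the average computation with aggregate() in Python, assuming nums is an existing RDD of numbers:

```python
# The accumulator is a (sum, count) pair; the first function merges an element into it,
# and the second merges two accumulators computed on different partitions.
sumCount = nums.aggregate(
    (0, 0),
    lambda acc, value: (acc[0] + value, acc[1] + 1),
    lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1]))
average = sumCount[0] / float(sumCount[1])
```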
The takeSample(withReplacement, num, seed) function allows us to take a sample of our data either with or without replacement. Sometimes it is useful to perform an action on all of the elements in the RDD, but without returning any result to the driver program. A good example of this would be posting JSON to a webserver or inserting records into a database.
In either case, the foreach() action lets us perform computations on each element in the RDD without bringing it back locally. The further standard operations on a basic RDD all behave pretty much exactly as you would imagine from their names. To summarize the common actions:
- count(): number of elements in the RDD
- take(num): return num elements
- top(num): return the top num elements
- takeOrdered(num)(ordering): return num elements based on the provided ordering
- reduce(func): combine the elements of the RDD together in parallel
- fold(zero)(func): same as reduce() but with the provided zero value
- aggregate(zeroValue)(seqOp, combOp): similar to reduce() but used to return a different type
Converting Between RDD Types
Some functions are available only on certain types of RDDs, such as mean() and variance() on numeric RDDs or join() on key/value pair RDDs. In Scala, access to these special functions is handled automatically using implicit conversions. Implicits, while quite powerful, can sometimes be confusing. When searching for functions on your RDD in Scaladoc, make sure to look at the functions that are available in these wrapper classes, such as DoubleRDDFunctions and PairRDDFunctions.
In Java, the specialized RDD types (such as JavaDoubleRDD and JavaPairRDD) have to be requested explicitly. This has the benefit of giving you a greater understanding of what exactly is going on, but can be a bit more cumbersome. To construct RDDs of these special types, instead of always using the Function class we will need to use specialized versions; the specialized function interfaces and their uses are listed in the Spark API documentation. When we want a DoubleRDD back, instead of calling map(), we need to call mapToDouble(), and all of the other specialized functions follow the same pattern.
This gives us access to the additional DoubleRDD specific functions like mean and variance. In Python all of the functions are implemented on the base RDD class but will fail at runtime if the type of data in the RDD is incorrect. This can be especially expensive for iterative algorithms, which look at the data many times.
Another trivial example would be doing a count and then writing out the same RDD, which would compute it twice. To avoid computing an RDD multiple times, we can ask Spark to persist the data; the nodes that compute the RDD then store their partitions, as in the sketch below. If a node that has data persisted on it fails, Spark will recompute the lost partitions of the data when needed. We can also replicate our data on multiple nodes if we want to be able to handle node failure without slowdown.
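A minimal Python sketch of persisting an RDD before reusing it in two actions; input_rdd is assumed to be an existing RDD of numbers:

```python
from pyspark import StorageLevel

result = input_rdd.map(lambda x: x * x)
result.persist(StorageLevel.MEMORY_ONLY)  # keep the computed partitions in memory
print(result.count())                     # the first action computes and caches the RDD
print(result.collect())                   # the second action reuses the cached data
```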
Spark has many levels of persistence to choose from based on what our goals are. In Scala and Java, the default persist() stores the data in the JVM heap as unserialized objects. In Python, we always serialize the data that persist stores, so the default is instead stored in the JVM heap as pickled objects.
When we write data out to disk or off-heap storage, that data is also always serialized. The persistence levels, found in org.apache.spark.storage.StorageLevel and pyspark.StorageLevel, include MEMORY_ONLY, MEMORY_ONLY_SER (which stores a serialized representation in memory), MEMORY_AND_DISK, MEMORY_AND_DISK_SER, and DISK_ONLY. If you attempt to cache too much data to fit in memory, Spark will automatically evict old partitions using a Least Recently Used (LRU) cache policy.
For the memory-only storage levels, it will recompute these partitions the next time they are accessed, while for the memory-and-disk ones, it will write them out to disk. However, caching unnecessary data can lead to eviction of useful data and more recomputation time. Finally, RDDs come with a method called unpersist() that lets you manually remove them from the cache. In the next chapter, we look at operations on key/value pair RDDs; after that, we discuss input and output from a variety of data sources, and more advanced topics in working with SparkContext.
Chapter 4. Working with Key/Value Pairs
This chapter covers how to work with RDDs of key/value pairs, a common data type required for many operations in Spark. We also discuss an advanced feature that lets users control the layout of pair RDDs across nodes: partitioning. Using controllable partitioning, applications can sometimes greatly reduce communication costs by ensuring that data will be accessed together and will be on the same node.