My Data Science Blogs

August 17, 2019

If you did not already know

SUbgraph Robust REpresentAtion Learning (SURREAL)
The success of graph embeddings or node representation learning in a variety of downstream tasks, such as node classification, link prediction, and recommendation systems, has led to their popularity in recent years. Representation learning algorithms aim to preserve local and global network structure by identifying node neighborhood notions. However, many existing algorithms generate embeddings that fail to properly preserve the network structure, or lead to unstable representations due to random processes (e.g., random walks to generate context) and, thus, cannot generalize to multi-graph problems. In this paper, we propose SURREAL, a novel, stable graph embedding algorithmic framework that builds robust embeddings using connection subgraphs. SURREAL learns graph representations using connection subgraphs by employing the analogy of graphs with electrical circuits. It preserves both local and global connectivity patterns, and addresses the issue of high-degree nodes. Further, it exploits the strength of weak ties and meta-data that have been neglected by baselines. The experiments show that SURREAL outperforms state-of-the-art algorithms by up to 36.85% on multi-label classification problems. Further, in contrast to baselines, SURREAL, being deterministic, is completely stable. …

KAMILA Clustering (KAMILA)
KAMILA clustering is a novel method for clustering mixed-type data in the spirit of k-means clustering. It does not require dummy coding of variables, and is efficient enough to scale to rather large data sets. …

Shake-Shake Regularization
The method introduced in this paper aims at helping deep learning practitioners faced with an overfitting problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single-shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://…/shake-shake.
Review: Shake-Shake Regularization (Image Classification)
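
To make the "stochastic affine combination" concrete, here is a minimal forward-pass sketch in plain Python. It is an illustration under stated assumptions, not the paper's code: the function name, the list-based "tensors", and the choice of drawing the mixing weight uniformly from [0, 1] during training and fixing it at 0.5 at inference are assumptions, and the separate random weighting that the method applies in the backward pass is not modeled here.

```python
import random

def shake_shake_forward(x, branch1, branch2, training=True, rng=random):
    """Combine two residual branches with a random convex weight.

    x        : list of floats (stand-in for a feature tensor)
    branch1/2: callables mapping a list of floats to a list of the same length
    """
    # Assumption: alpha ~ U(0, 1) while training, its mean (0.5) at inference.
    alpha = rng.random() if training else 0.5
    b1 = branch1(x)
    b2 = branch2(x)
    # Skip connection plus the affine combination of the two branches.
    return [xi + alpha * u + (1.0 - alpha) * v for xi, u, v in zip(x, b1, b2)]

# Hypothetical toy branches, for illustration only.
double = lambda v: [2.0 * t for t in v]
zeros = lambda v: [0.0 for _ in v]

eval_out = shake_shake_forward([1.0, 2.0], double, zeros, training=False)
train_out = shake_shake_forward([1.0, 2.0], double, zeros, rng=random.Random(0))
```

At inference the two branches are simply averaged, so the randomness only acts as a regularizer during training.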

Multi-Layer Fast ISTA (ML-FISTA)
Parsimonious representations in data modeling are ubiquitous and central for processing information. Motivated by the recent Multi-Layer Convolutional Sparse Coding (ML-CSC) model, we herein generalize the traditional Basis Pursuit regression problem to a multi-layer setting, introducing similar sparse enforcing penalties at different representation layers in a symbiotic relation between synthesis and analysis sparse priors. We propose and analyze different iterative algorithms to solve this new problem in practice. We prove that the presented multi-layer Iterative Soft Thresholding (ML-ISTA) and multi-layer Fast ISTA (ML-FISTA) converge to the global optimum of our multi-layer formulation at a rate of $\mathcal{O}(1/k)$ and $\mathcal{O}(1/k^2)$, respectively. We further show how these algorithms effectively implement particular recurrent neural networks that generalize feed-forward architectures without any increase in the number of parameters. We demonstrate the different architectures resulting from unfolding the iterations of the proposed multi-layer pursuit algorithms, providing a principled way to construct deep recurrent CNNs from feed-forward ones. We demonstrate the emerging constructions by training them in an end-to-end manner, consistently improving the performance of classical networks without introducing extra filters or parameters. …
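
For readers who have not met ISTA before, the sketch below shows the classical single-layer algorithm that the abstract generalizes: gradient steps on the least-squares term followed by soft thresholding. It is not the multi-layer method of the paper; the function names, the list-of-lists matrix representation, and the step-size bound (squared Frobenius norm) are assumptions made to keep the example dependency-free.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(A, y, lam, n_iter=200):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 with plain (single-layer) ISTA."""
    m, n = len(A), len(A[0])
    # Squared Frobenius norm: a simple upper bound on the gradient's Lipschitz constant.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        # Gradient of the smooth term: A^T (A x - y).
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step, then shrinkage.
        x = [soft_threshold(x[j] - grad[j] / L, lam / L) for j in range(n)]
    return x

# Toy problem (hypothetical numbers): with A = I the minimizer is soft_threshold(y, lam).
x_hat = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

FISTA keeps the same shrinkage step but adds a momentum term between iterations, which is what improves the rate from O(1/k) to O(1/k^2).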

Fresh from the Python Package Index

Tool for automating excel actions on Windows

Simple logger for Machine Learning experiments

Common utility functions for data engineering use cases

Discrete Hidden Markov Models with Numba

Python bindings for Lithuanian language synthesizer from LIEPA project


various matrix operations

A fully formatted dictionary for everything you’ll ever need.

Neural network modeled after the olfactory system of the hawkmoth

A high-resolution multiple linear regression algorithm used to analyze PV output with a few inputs

Apache Atlas Python Client

Markdown in Python.

Semantic Vectors work in Python

pipelines: Deploy your machine learning experiments with Skymind Pipelines

A Binary fault injection tool for TensorFlow-based program

A text normalization package

Pytorch Framework For Medical Image Analysis

A helper for working with 1d & 2d array.

Magister Dixit

“… Note that playing back data from Hadoop into SAP Sybase ESP can occur much faster than in real time, …” SAP ( 2013 )

Document worth reading: “Why Machines Cannot Learn Mathematics, Yet”

Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods. However, while mathematics is a precise and accurate science, it is usually expressed by less accurate and imprecise descriptions, contributing to the relative dearth of machine learning applications for IR in this domain. Generally, mathematical documents communicate their knowledge with an ambiguous, context-dependent, and non-formal language. Given recent advances in ML, it seems canonical to apply ML techniques to represent and retrieve mathematics semantically. In this work, we apply popular text embedding techniques to the arXiv collection of STEM documents and explore how these are unable to properly understand mathematics from that corpus. In addition, we also investigate the missing aspects that would allow mathematics to be learned by computers. Why Machines Cannot Learn Mathematics, Yet

Modern R with the tidyverse is available on Leanpub

[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers.]

Yesterday I released an ebook on Leanpub,
called Modern R with the tidyverse, which you can also
read for free here.

In this blog post, I want to give some context.

Modern R with the tidyverse is the second ebook I have released on Leanpub. I released the first one, called
Functional programming and unit testing for data munging with R, around
Christmas 2016 (I’ve retired it on Leanpub, but you can still read it for free
here). I had just moved back to my home country of
Luxembourg and started a new job as a research assistant at the national statistical institute.
Since then, lots of things have happened: I changed jobs and joined PwC Luxembourg as a data scientist,
was promoted to manager, finished my PhD, and most importantly of all, I became a father.

Through all this, I continued blogging and working on a new ebook, called Modern R with the tidyverse.
At first, this was supposed to be a separate book from the first one, but as I continued writing,
I realized that updating and finishing the first one would take a lot of effort, and also that
it wouldn’t make much sense to keep the two separate. So I decided to merge the content of the
first ebook into the second, and update everything in one go.

My very first notes were around 50 pages, if memory serves, and I used them to teach R at the
University of Strasbourg while I was employed there as a research and teaching assistant and working
on my PhD. These notes were the basis of Functional programming and unit testing for data munging with R
and now Modern R. Chapter 2 of Modern R is almost a simple copy and paste from these notes
(with more sections added). These notes were first written around 2012-2013ish.

Modern R is the kind of text I would like to have had when I first started playing around with R,
sometime around 2009-2010. It starts from the beginning, but also goes into quite some detail in the
later chapters. For instance, the section on
modeling with functional programming
is quite advanced, but I believe that readers who have read through the whole book up to that part
will be armed with all the knowledge needed to follow it. At least, this is my hope.

Now, the book is still not finished. Two chapters are missing, but it should not take me long to
finish them as I already have drafts lying around. However, exercises might still be in the wrong
places, and more are required. Also, generally, more polishing is needed.

As written in the first paragraph of this section, the book is available on
Leanpub. Unlike my previous ebook, this one costs money:
a minimum price of $4.99 and a recommended price of $14.99, but as mentioned you can read it for
free online. I hesitated over giving it a minimum price of
$0, but I figured that since the book can be read for free online, and since Leanpub has a 45-day
return policy under which readers can get a 100% refund, no questions asked (and keep the downloaded
ebook), readers would not be taking much of a risk by buying it for 5 bucks. I sure hope, however, that
readers will find that this ebook is worth at least 5 bucks!

Now why should you read it? There are already a lot of books on learning how to use R. Well, I don’t
really want to convince you to read it. But some people do seem to like my style of writing and my
blog posts, so I guess these same people, or similar people, might like the ebook. Also, I think
that this ebook covers a lot of different topics, enough of them to make you an efficient R user.
But as I’ve written in the introduction of Modern R:

So what you can expect from this book is that this book is not the only one you should read.

Anyway, I hope you’ll enjoy Modern R; suggestions, criticisms and reviews are welcome!

By the way, the cover of the book is a painting by John William Waterhouse, depicting Diogenes of Sinope,
an ancient Greek philosopher, and absolute mad lad. Read his Wikipedia page, it’s worth it.

Hope you enjoyed! If you found this blog post useful, you might want to follow
me on twitter for blog post updates,
buy me an espresso, or buy my ebook on Leanpub.

August 16, 2019

Distilled News

Address Limitation of RNN in NLP Problems by Using Transformer-XL

A Recurrent Neural Network (RNN) offers a way to learn from a sequence of inputs. The drawback is that it is difficult to optimize because of the vanishing gradient problem. The Transformer language model (Al-Rfou et al., 2018) was introduced to overcome this limitation of RNNs. By design, it processes fixed-length segments to reduce resource consumption. However, this creates another problem, called context fragmentation: if the input sequence is longer than the pre-defined segment length, the sequence has to be split, and information cannot be captured across segments. Transformer-XL was introduced by Dai et al. (2019) to overcome this limitation.

Machine Learning: Create Expert Systems

Machine Learning is nothing but creating machines or software that can make their own decisions on the basis of previously collected data. Technically speaking, machine learning involves ‘implicit’ programming rather than ‘explicit’ programming.
Machine learning is divided into three categories, viz.
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning.

Processing a Slowly Changing Dimension Type 2 Using PySpark in AWS

With the emergence of new technologies that make data processing lightning fast, and cloud ecosystems which allow for flexibility, cost savings, security, and convenience, there appear to be some data modeling philosophies that are being used less frequently. One of these approaches is the star schema data architecture.

Create predictive models in R with Caret

Caret is short for Classification And REgression Training. It is a complete package that covers all the stages of a pipeline for creating a machine learning predictive model. In this tutorial, I will explain the following topics:
• How to install caret
• How to create a simple model
• How to use cross-validation to avoid overfitting
• How to add simple preprocessing to your data
• How to find the best parameters for your chosen model
• How to see the most important features/variables for your model
• How to use your model to predict

From Research to Production: Containerized Training Jobs

This article demonstrates how to containerize training code using Docker and deploy the fitted model as a web app. Although it partly builds on my previous post Transformer Fine-Tuning for Sentiment Analysis, bear in mind that the method described here is generic; it can be adopted as a standard solution for Machine Learning practitioners to encapsulate their research code and facilitate reproducibility.

A New Way to Share & Collaborate on Jupyter Notebooks

Sharing and reproducing data science is still not easy – most work just sits on Github, undiscovered, as very technical documents. Most companies still do not have a central hub for knowledge. Data teams typically use tools like Github for project management, but this means their work is not shared with the non-technical people in the company. This also holds true for individual projects. Kyso solves this problem with an elegant blogging platform, making accessible to everyone information that was previously only possessed by a select few. Kyso lets you blog and share your analyses. Think of it like Medium, but for data science – you can publish Jupyter notebooks, charts, code, datasets and even write articles. Any code is hidden by default and can be toggled so that your post is readable for both technical and non-technical audiences. This guide is about sharing the results of your Jupyter notebooks with people who aren’t comfortable with code. You will learn how to integrate your Github repositories with Kyso to create an effortless computation-to-dissemination workflow.

Introducing mlrPlayground

The idea behind this project was to offer a platform in the form of a Shiny web application, in which a user can try out different kinds of learners provided by the mlr package. On a small set of distinct and tunable regression and classification tasks, it is possible to observe how prediction and performance change with the task, the learner, or the learner’s hyperparameters. The user is able to gain new insights and a deeper understanding of how a learner performs, where its advantages are and in which cases the learner might fail. There are a lot of different settings we want to offer in the user interface, and so, to not remove the fun of our playground, a huge effort went into creating an aesthetically pleasing and user-friendly UI. To achieve this, a website template was downloaded from Templated and used as the baseline design. After extending the template with missing CSS classes, most of the Shiny widgets used were overwritten, or even completely replaced, offering refreshingly new visuals for the well-known Shiny framework. For the smooth feel of the app, an object-oriented R6 class system with reactive attributes was engineered for the backend to give a well-defined framework of which elements should trigger what evaluation, an otherwise extremely tiresome and error-prone task for dozens of different UI elements. After all, ‘mlrPlayground’ may not be as fun as a real playground, but you are also not as likely to hurt yourself, and it is definitely more entertaining than looking at boring pictures of learners in a book.

Fraud detection using Benford’s Law (Python Code)

Have you ever wondered how likely a given number is to start with the digit 1? Does the digit 1 have the same probability of being the leading digit as 9? The leading digit of a number is its leftmost nonzero digit; for example, the leading digits of 29 and 0.037 are 2 and 3. Well, the answer to the previous question is no… According to Benford’s law, a.k.a. the first-digit law, the frequency of occurrence of the leading digits in naturally occurring numerical distributions is predictable and nonuniform, and closer to a power law distribution. In fact, a given number is about six times more likely to start with a 1 than with a 9! This is very counterintuitive, as most people would expect a uniform distribution U(1,9), where all the digits have the same likelihood of showing up in the first slot, so they expect a probability of 1/9 ≈ 11.1%. Let Pr(D1=d) be the probability that a given number has first digit d, and Pr(D2=d) the corresponding probability for the first two digits; the following table provides all the decimal digit probabilities that emerged from Newcomb’s observation in 1881.
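
As a small illustration of the first-digit law, the sketch below computes the Benford probability log10(1 + 1/d) for each leading digit and extracts the leading digit of a number. The function names are made up for this example, and the string-based digit extraction is just one convenient approach; it relies on 16-significant-digit formatting, so values that round up at that precision are an edge case.

```python
import math

def benford_first_digit(d):
    """Benford probability that the leading digit equals d (1..9)."""
    return math.log10(1.0 + 1.0 / d)

def leading_digit(x):
    """Leftmost nonzero digit of a nonzero number, e.g. 29 -> 2, 0.037 -> 3."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation always starts with the leading digit.
    return int(f"{abs(x):.15e}"[0])

benford_table = {d: benford_first_digit(d) for d in range(1, 10)}
ratio_1_to_9 = benford_table[1] / benford_table[9]

for d, p in benford_table.items():
    print(f"{d}: {p:.3f}")
```

The nine probabilities sum to 1, and the ratio between the probabilities of a leading 1 and a leading 9 comes out at roughly 6.6, which is where the "six times more likely" figure comes from.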

A 2019 Guide to Semantic Segmentation

Semantic segmentation refers to the process of linking each pixel in an image to a class label. These labels could include a person, car, flower, piece of furniture, etc., just to mention a few. We can think of semantic segmentation as image classification at a pixel level. For example, in an image that has many cars, segmentation will label all the objects as car objects. However, a separate class of models known as instance segmentation is able to label the separate instances where an object appears in an image. This kind of segmentation can be very useful in applications that are used to count the number of objects, such as counting the amount of foot traffic in a mall. Some of its primary applications are in autonomous vehicles, human-computer interaction, robotics, and photo editing/creativity tools. For example, semantic segmentation is crucial in self-driving cars and robotics because it is important for the models to understand the context of the environment in which they’re operating.

Proximal Policy Optimization Tutorial (Part 1: Actor-Critic Method)

Welcome to the first part of a math and code tutorial series. I’ll be showing how to implement a Reinforcement Learning algorithm known as Proximal Policy Optimization (PPO) for teaching an AI agent how to play football/soccer. By the end of this tutorial, you’ll get an idea of how to apply an on-policy learning method in an actor-critic framework in order to learn to navigate any game environment. We shall see what these terms mean in the context of the PPO algorithm and also implement them in Python with the help of Keras. So, let’s first start with the installation of our game environment.

The ultimate guide to A/B testing. Part 2: Data distributions

A/B testing is a very popular technique for checking granular changes in a product without mistakenly taking into account changes that were caused by outside factors. In this series of articles, I will try to give an easy hands-on manual on how to design, run and estimate the results of A/B tests, so you are ready to go and get these amazing statistically significant results!

What I have Learned After Building A Successful AI PoC

I recently completed an AI PoC that has reached production and I wanted to share what I have learned on how to improve the chances of any AI PoC. Only a few companies have started their AI journey. Indeed, AI-based solutions are still in the early stages. As a result, decision-makers are often tempted to first rely on a PoC. The cold truth is that a majority of them don’t reach production. To put it simply, the goal of a Proof of Concept is to test whether it’s worth investing time and more money into a technological solution. Needless to say, building an AI PoC is hard because it requires a large set of skills.

4 Tips for Advanced Feature Engineering and Preprocessing

Techniques for creating new features, detecting outliers, handling imbalanced data, and imputing missing values. Arguably, two of the most important steps in developing a machine learning model are feature engineering and preprocessing. Feature engineering consists of the creation of features whereas preprocessing involves cleaning the data. “Torture the data, and it will confess to anything.” – Ronald Coase. We often spend a significant amount of time refining the data into something useful for modeling purposes. In order to make this work more efficient, I would like to share 4 Tips and Tricks that could help you with engineering and preprocessing those features. I should note that, however cliché it might be, domain knowledge might be one of the most important things to have when engineering features. It might help you in preventing under- and overfitting by having a better understanding of the features that you use.

Python Risk Management: Kelly Criterion

In light of the recent correction in the financial markets, I thought it would be a fun time to talk about risk management. Specifically, we’ll go over the Kelly Criterion with a concrete example in Python. First, we’ll give a brief overview of the Kelly Criterion. Next, we’ll go over a simple coin flip example. Lastly, we’ll take that simple example and apply it to a financial index.
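
As a rough idea of what a coin flip example can look like, here is a minimal sketch of the Kelly fraction for a simple binary bet, f* = p - (1 - p) / b, where p is the win probability and b the net odds. This is not the linked article's code; the function names and the 60% coin with an even-money payout are hypothetical choices for illustration.

```python
import math

def kelly_fraction(p, b=1.0):
    """Fraction of bankroll to stake on a bet won with probability p at net odds b."""
    q = 1.0 - p
    return p - q / b

def expected_log_growth(f, p, b=1.0):
    """Expected log growth of the bankroll per bet when staking a fraction f."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)

# Hypothetical biased coin: 60% chance of winning, even-money payout.
f_star = kelly_fraction(0.6, b=1.0)   # about 0.2 of the bankroll
growth_at_kelly = expected_log_growth(f_star, 0.6)
growth_under = expected_log_growth(0.1, 0.6)
growth_over = expected_log_growth(0.3, 0.6)
```

Staking either less (0.1) or more (0.3) than the Kelly fraction gives a lower expected log growth, which is the sense in which the criterion is optimal for this toy bet.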

Bayesian Computation conference in January 2020

X writes to remind us of the Bayesian computation conference:

– BayesComp 2020 occurs on 7-10 January 2020 in Gainesville, Florida, USA
– Registration is open with regular rates till October 14, 2019
– Deadline for submission of poster proposals is December 15, 2019
– Deadline for travel support applications is September 20, 2019
– Sessions are posted on
– There are four free tutorials on January 7, 2020, on Stan, NIMBLE, SAS, and AutoStat

SAS, huh?

Faster threshold queries with cache-sensitive scancount

Suppose that you are given 100 sorted arrays of integers. You can compute their union or their intersection. It is a common setup in data indexing: the integers might be unique identifiers.

But there is more than just intersections and unions… What if you want all values that appear in more than three arrays?

A really good algorithm for this problem is called scancount. It is good because it is simple and usually quite fast.

Suppose that all my integers are in the interval [0, 20M). You start with an array of counters, initialized at zero. You scan all your arrays: for each value in an array, you increment the corresponding counter. When you have scanned all arrays, you scan your counters, looking for counter values greater than your threshold (3).

The pseudocode looks like this…

counter <- array of zeros
for array x in set of arrays {
    for value y in array x {
      counter[y] += 1
    }
}
for i = 0; i < counter.size(); i++ {
  if (counter[i] > threshold)
     output i;
}

This algorithm is almost entirely bound by memory accesses. Memory-wise, if you only have about 100 arrays, you only need 8-bit counters, so I can store all counters in about 20 MB. Sadly, this means that the counters do not fit in processor cache.

Can you make scancount faster without sacrificing too much simplicity?

So far we did not use the fact that our arrays are sorted. Because they are sorted, you can solve the problem in “cache-sensitive” or “cache-aware” chunks.

Build a small array of counters, spanning maybe only 256 kB. Process all arrays, as with the naive scancount, but suspend the processing of each array as soon as a value in that array reaches or exceeds 262144. This allows you to find all matching values in the interval [0, 262144). Next, repeat the process with the next interval ([262144,524288)), and so forth. In this manner, you will have far fewer expensive cache misses.

I implemented this solution in C++. Here are my results using random arrays, GNU GCC 8 and a Skylake processor. I report the number of CPU cycles per value in the arrays.

naive scancount 37 cycles
cache-sensitive scancount 16 cycles
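
For illustration only, here is a Python sketch of both variants. It is not the C++ code benchmarked above and will not reproduce those cycle counts; the function names and parameters are assumptions, and it assumes each array is sorted, holds distinct values in [0, universe), and that there are fewer than 256 arrays so 8-bit counters suffice.

```python
def scancount(arrays, threshold, universe):
    """Naive scancount: one counter per possible value."""
    counters = bytearray(universe)  # 8-bit counters, initialized at zero
    for arr in arrays:
        for v in arr:
            counters[v] += 1
    return [i for i, c in enumerate(counters) if c > threshold]

def scancount_chunked(arrays, threshold, universe, chunk=262144):
    """Cache-sensitive scancount: process the value range one chunk at a time."""
    positions = [0] * len(arrays)  # where to resume in each sorted array
    out = []
    for start in range(0, universe, chunk):
        end = min(start + chunk, universe)
        counters = bytearray(end - start)  # small enough to stay in cache
        for k, arr in enumerate(arrays):
            i = positions[k]
            # Consume values until one falls outside the current interval.
            while i < len(arr) and arr[i] < end:
                counters[arr[i] - start] += 1
                i += 1
            positions[k] = i
        out.extend(start + j for j, c in enumerate(counters) if c > threshold)
    return out

# Tiny hand-made example: only the value 3 appears in more than two arrays.
example = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [3, 5, 6]]
naive_result = scancount(example, 2, 8)
chunked_result = scancount_chunked(example, 2, 8, chunk=4)
```

Both functions return the same values in increasing order; the chunked version only changes the order in which memory is touched.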

Further reading: Compressed bitmap indexes: beyond unions and intersections, Software: Practice & Experience 46 (2), 2016.

What’s new on arXiv – Complete List

End-to-End Machine Learning for Experimental Physics: Using Simulated Data to Train a Neural Network for Object Detection in Video Microscopy
How often should I access my online social networks?
Local Score Dependent Model Explanation for Time Dependent Covariates
Meta Reasoning over Knowledge Graphs
Least Squares Approximation for a Distributed System
Serverless Supercomputing: High Performance Function as a Service for Science
Constrained Multi-Objective Optimization for Automated Machine Learning
Architecture and evolution of semantic networks in mathematics texts
Towards Diverse and Accurate Image Captions via Reinforcing Determinantal Point Process
Tensor-Train Parameterization for Ultra Dimensionality Reduction
Reasoning-Driven Question-Answering for Natural Language Understanding
Fog Robotics: A Summary, Challenges and Future Scope
Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT
ClustCrypt: Privacy-Preserving Clustering of Unstructured Big Data in the Cloud
Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost
Harmonized Multimodal Learning with Gaussian Process Latent Variable Models
Interpretable Encrypted Searchable Neural Networks
Benchmarking the Robustness of Semantic Segmentation Models
FlexNER: A Flexible LSTM-CNN Stack Framework for Named Entity Recognition
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
Deep Generalized Max Pooling
Towards Explainable AI Planning as a Service
AdaGCN: Adaboosting Graph Convolutional Networks into Deep Models
Towards Optimisation of Collaborative Question Answering over Knowledge Graphs
Unconstrained Monotonic Neural Networks
A Tour of Convolutional Networks Guided by Linear Interpreters
PMU Data Feature Considerations for Realistic, Synthetic Data Generation
Continuous Control for High-Dimensional State Spaces: An Interactive Learning Approach
Skill Transfer in Deep Reinforcement Learning under Morphological Heterogeneity
Deep material network with cohesive layers: Multi-stage training and interfacial failure analysis
Disturbance Decoupling and Instantaneous Fault Detection in Boolean Control Networks
Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition
Generalization Error Bounds for Deep Variational Inference
Order plus size of $τ$-critical graphs
A Closed-Form Analytical Solution for Optimal Coordination of Connected and Automated Vehicles
Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real
Efficient Parallel-in-Time Solution of Time-Periodic Problems Using a Multi-Harmonic Coarse Grid Correction
A Deep Evolutionary Approach to Bioinspired Classifier Optimisation for Brain-Machine Interaction
On explicit $L^2$-convergence rate estimate for underdamped Langevin dynamics
On Occupancy Moments and Bloom Filter Efficiency
Domain Adaptive Training BERT for Response Selection
R-miss-tastic: a unified platform for missing values methods and workflows
Meeting QoS of Users in a Edge to Cloud Platform via Optimally Placing Services and Scheduling Tasks
Entertaining and Opinionated but Too Controlling: A Large-Scale User Study of an Open Domain Alexa Prize System
Boolean constraint satisfaction problems for reaction networks
New Invariants for Permutations, Orders and Graphs
SP-NET: One Shot Fingerprint Singular-Point Detector
Rerooting multi-type branching trees: the infinite spine case
Local convergence of random planar graphs
Generative Multi-Functional Meta-Atom and Metasurface Design Networks
Intelligent Reflecting Surface Enhanced MIMO Broadcasting for Simultaneous Wireless Information and Power Transfer
Multilevel and multifidelity uncertainty quantification for cardiovascular hemodynamics
Pair correlation for Dedekind zeta functions of abelian extensions
Invariant Measures for Nonlinear Conservation Laws Driven by Stochastic Forcing
Joint Precoding and Power Control in Small-Cell Networks With Proportional-Rate MISO-BC Backhaul
Cross-Layer Scheduling and Beamforming in Smart Grid Powered Small-Cell Networks
HyperKG: Hyperbolic Knowledge Graph Embeddings for Knowledge Base Completion
Aspect and Opinion Terms Extraction Using Double Embeddings and Attention Mechanism for Indonesian Hotel Reviews
On community structure in complex networks: challenges and opportunities
3-choosable planar graphs with some precolored vertices and no $5^{-}$-cycles normally adjacent to $8^{-}$-cycles
Random walk on a lattice in the presence of obstacles: The short-time transient regime, anomalous diffusion and crowding
An efficient and convergent finite element scheme for Cahn–Hilliard equations with dynamic boundary conditions
FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age
HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning
A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading
Aggregating Votes with Local Differential Privacy: Usefulness, Soundness vs. Indistinguishability
On the Elementary Affine Lambda-Calculus with and Without Fixed Points
Pointers in Recursion: Exploring the Tropics
Type-two Iteration with Bounded Query Revision
3-D Scene Graph: A Sparse and Semantic Representation of Physical Environments for Intelligent Agents
Generalised Zero-Shot Learning with Domain Classification in a Joint Semantic and Visual Space
Re-Pair In-Place
Fast Cartesian Tree Matching
Low-PAPR Multi-channel OOK Waveform for IEEE 802.11ba Wake-up Radio
Risk-Limiting Tallies
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering
Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy
Taking a Lesson from Quantum Particles for Statistical Data Privacy
Probabilistic Multimodal Modeling for Human-Robot Interaction Tasks
Large-dimensional Factor Analysis without Moment Constraints
Computational method for probability distribution on recursive relationships in financial applications
Learning Two-View Correspondences and Geometry Using Order-Aware Network
Distributive Mendelsohn triple systems and the Eisenstein integers
Faster Unsupervised Semantic Inpainting: A GAN Based Approach
Thompson Sampling and Approximate Inference
Memory-Based Neighbourhood Embedding for Visual Recognition
Visualizing Image Content to Explain Novel Image Discovery
Robust Translational Force Control of Multi-Rotor UAV for Precise Acceleration Tracking
AdvFaces: Adversarial Face Synthesis
A Combinatorial Analysis Of Higher Order Generalised Geometric Polynomials: A Generalisation Of Barred Preferential Arrangements
Sketched Representations and Orthogonal Planarity of Bounded Treewidth Graphs
Histographs: Graphs in Histopathology
Complexity of universal access structures
Person Re-identification in Aerial Imagery
Galerkin approximation of holomorphic eigenvalue problems: weak T-coercivity and T-compatibility
Computing and Communicating Functions in Disorganized Wireless Networks
Building Temperature Control: A Distributed Escort Dynamical Approach
Fusion of Detected Objects in Text for Visual Question Answering
Generalized Jacobi-Trudi determinants and evaluations of Schur multiple zeta values
Boosting Liver and Lesion Segmentation from CT Scans By Mask Mining
Mean Field Game for Linear Quadratic Stochastic Recursive Systems
Modelling columnarity of pyramidal cells in the human cerebral cortex
Equitable vertex arboricity of $d$-degenerate graphs
Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling
Equitable tree-$O(d)$-coloring of $d$-degenerate graphs
Spin-orbital glass transition in a model frustrated pyrochlore magnet without quenched disorder
Light edges in 1-planar graphs of minimum degree 3
Equitable partition of graphs into induced linear forests
Cyber-Physical Systems Resilience: State of the Art, Research Issues and Future Trends
Algorithms for the min-max regret 0-1 Integer Linear Programming Problem with Interval Data
A Reproducible Comparison of RSSI Fingerprinting Localization Methods Using LoRaWAN
Modeling microstructure price dynamics with symmetric Hawkes and diffusion model using ultra-high-frequency stock data
Borrowing of information across patient subgroups in a basket trial based on distributional discrepancy
Segmentation of Multimodal Myocardial Images Using Shape-Transfer GAN
Causal discovery in heavy-tailed models
Shape-Aware Complementary-Task Learning for Multi-Organ Segmentation
Fluctuations of propagation front in catalytic branching walk
Unsupervised Behavior Change Detection in Multidimensional Data Streams for Maritime Traffic Monitoring
D-UNet: a dimension-fusion U shape network for chronic stroke lesion segmentation
Approximating Values of Generalized-Reachability Stochastic Games
WiFi-based Real-time Breathing and Heart Rate Monitoring during Sleep
X-WikiRE: A Large, Multilingual Resource for Relation Extraction as Machine Comprehension
FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension
Minimal Representations of Order Types by Geometric Graphs
New Results on Parameter Estimation via Dynamic Regressor Extension and Mixing: Continuous and Discrete-time Cases
Assessing Workers Perceived Risk During Construction Task Using A Wristband-Type Biosensor
Mastering emergent language: learning to guide in simulated navigation
MemeFaceGenerator: Adversarial Synthesis of Chinese Meme-face from Natural Sentences
Quasi-periodic quantum thermal machines
GreyReID: A Two-stream Deep Framework with RGB-grey Information for Person Re-identification
Directional TSDF: Modeling Surface Orientation for Coherent Meshes
SG-Net: Syntax-Guided Machine Reading Comprehension
Semi-supervised Learning with Adaptive Neighborhood Graph Propagation Network
The sum-of-squares hierarchy on the sphere, and applications in quantum information theory
Aleph: Efficient Atomic Broadcast in Asynchronous Networks with Byzantine Nodes
Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding
Randomly coupled differential equations with correlations
Interleaved Multitask Learning for Audio Source Separation with Independent Databases
Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once
On discrete loop signatures and Markov loops topology
Exploring Projective Norm Graphs
Dual Heuristic Dynamic Programing Control of Grid-Connected Synchronverters
Temporal Analysis of Reddit Networks via Role Embeddings
DAPAS : Denoising Autoencoder to Prevent Adversarial attack in Semantic Segmentation
(Learned) Frequency Estimation Algorithms under Zipfian Distribution
Neural Network Predictive Controller for Grid-Connected Virtual Synchronous Generator
Why Should the Q-method be Integrated Into the Design Science Research? A Systematic Mapping Study
On The Evaluation of Machine Translation Systems Trained With Back-Translation
A Survey of Recent Scalability Improvements for Semidefinite Programming with Applications in Machine Learning, Control, and Robotics
Detecting 11K Classes: Large Scale Object Detection without Fine-Grained Bounding Boxes
Accuracy Controlled Structure-Preserving ${\cal H}^2$-Matrix-Matrix Product in Linear Complexity with Change of Cluster Bases
Maximally additively reducible subsets of the integers
Performance Characterization of Canonical Mobility Models in Drone Cellular Networks
Limit Theorems for the Length of the Longest Common Subsequence of Mallows Permutations
The lexical and grammatical sources of neg-raising inferences
Optimizing for Interpretability in Deep Neural Networks with Tree Regularization
On rank estimators in increasing dimensions
Few-Shot Learning with Global Class Representations
Invariants of polynomials mod Frobenius powers
AutoCorrect: Deep Inductive Alignment of Noisy Geometric Annotations
Towards Debiasing Fact Verification Models
The Power of the Weisfeiler-Leman Algorithm to Decompose Graphs


What's new on arXiv

End-to-End Machine Learning for Experimental Physics: Using Simulated Data to Train a Neural Network for Object Detection in Video Microscopy

We demonstrate a method for training a convolutional neural network with simulated images for use on real-world experimental data. Modern machine learning methods require large, robust training data sets to generate accurate predictions. Generating these large training sets requires a significant up-front time investment that is often impractical for small-scale applications. Here we demonstrate a ‘full-stack’ computational solution, where the training data set is generated on-the-fly using a noise injection process to produce simulated data characteristic of the experimental system.

How often should I access my online social networks?

Users of online social networks are faced with a conundrum of trying to be always informed without having enough time or attention budget to do so. The retention of users on online social networks has important implications, encompassing economic, psychological and infrastructure aspects. In this paper, we pose the following question: what is the optimal rate at which users should access a social network? To answer this question, we propose an analytical model to determine the value of an access (VoA) to the social network. In the simple setting considered in this paper, VoA is defined as the chance of a user accessing the network and obtaining new content. Clearly, VoA depends on the rate at which sources generate content and on the filtering imposed by the social network. Then, we pose an optimization problem wherein the utility of users grows with respect to VoA but is penalized by costs incurred to access the network. Using the proposed framework, we provide insights on the optimal access rate. Our results are parameterized using Facebook data, indicating the predictive power of the approach.

Local Score Dependent Model Explanation for Time Dependent Covariates

The use of deep neural networks to make high-risk decisions creates a need for global and local explanations so that users and experts have confidence in the modeling algorithms. We introduce a novel technique to find global and local explanations for time series data used in binary classification machine learning systems. We identify the most salient of the original features used by a black box model to distinguish between classes. The explanation can be made on categorical, continuous, and time series data and can be generalized to any binary classification model. The analysis is conducted on time series data to train a long short-term memory deep neural network and uses the time-dependent structure of the underlying features in the explanation. The proposed technique attributes weights to features to explain an observation's risk of belonging to a class as a multiplicative factor of a base hazard rate. We use a variation of the Cox Proportional Hazards regression, a Generalized Additive Model, to explain the effect of variables upon the probability of an in-class response for a score output from the black box model. The covariates incorporate time dependence structure in the features so the explanation is inclusive of the underlying time series data structure.

Meta Reasoning over Knowledge Graphs

The ability to reason over learned knowledge is innate to humans, who can easily master new reasoning rules with only a few demonstrations. While most existing studies on knowledge graph (KG) reasoning assume enough training examples, we study the challenging and practical problem of few-shot knowledge graph reasoning under the paradigm of meta-learning. We propose a new meta-learning framework that effectively utilizes task-specific meta information such as local graph neighbors and reasoning paths in KGs. Specifically, we design a meta-encoder that encodes the meta information into task-specific initialization parameters for different tasks. This allows our reasoning module to have diverse starting points when learning to reason over different relations, which is expected to better fit the target task. On two few-shot knowledge base completion benchmarks, we show that the augmented task-specific meta-encoder yields a much better initial point than MAML and outperforms several few-shot learning baselines.

Least Squares Approximation for a Distributed System

In this work we develop a distributed least squares approximation (DLSA) method, which is able to solve a large family of regression problems (e.g., linear regression, logistic regression, Cox’s model) on a distributed system. By approximating the local objective function using a local quadratic form, we are able to obtain a combined estimator by taking a weighted average of local estimators. The resulting estimator is proved to be statistically as efficient as the global estimator. Meanwhile, it requires only one round of communication. We further conduct shrinkage estimation based on the DLSA estimator by using an adaptive Lasso approach. The solution can be easily obtained by using the LARS algorithm on the master node. It is theoretically shown that the resulting estimator enjoys the oracle property and is selection consistent by using a newly designed distributed Bayesian Information Criterion (DBIC). The finite sample performance as well as the computational efficiency are further illustrated by an extensive numerical study and an airline dataset. The airline dataset is 52GB in memory size. The entire methodology has been implemented in Python for a de-facto standard Spark system. By using the proposed DLSA algorithm on the Spark system, it takes 26 minutes to obtain a logistic regression estimator, whereas a full likelihood algorithm takes 15 hours to reach an inferior result.
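The weighted-average idea can be illustrated in the simplest possible setting. The sketch below is not the authors' implementation: it assumes a scalar linear model y ≈ beta * x with no intercept, where the local objective is exactly quadratic, and the names `local_fit`, `combine`, and `shards` are hypothetical. Each worker reports its local estimate together with its local curvature (sum of x²), and the master combines them in a single round.

```python
def local_fit(x, y):
    """Least squares slope for y ~ beta * x on one worker's data.

    Returns (local estimate, local curvature). The curvature sum(x^2)
    serves as the weight of this worker in the combined estimator.
    """
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx, sxx


def combine(local_results):
    """Curvature-weighted average of local estimates (one communication round)."""
    total_weight = sum(weight for _, weight in local_results)
    return sum(beta * weight for beta, weight in local_results) / total_weight


# Hypothetical data split across three workers.
shards = [
    ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]),
    ([4.0, 5.0], [7.8, 10.1]),
    ([0.5, 1.5, 2.5, 3.5], [1.0, 3.2, 4.9, 7.1]),
]

local_results = [local_fit(x, y) for x, y in shards]
combined = combine(local_results)

# Reference: fit on the pooled data, as a single machine would.
all_x = [xi for x, _ in shards for xi in x]
all_y = [yi for _, y in shards for yi in y]
global_beta, _ = local_fit(all_x, all_y)

print(combined, global_beta)
```

In this exactly quadratic case the combined and pooled estimates coincide up to floating-point error. For models such as logistic regression the local quadratic form is only an approximation, which is where the abstract's efficiency result comes in.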

Serverless Supercomputing: High Performance Function as a Service for Science

Growing data volumes and velocities are driving exciting new methods across the sciences in which data analytics and machine learning are increasingly intertwined with research. These new methods require new approaches for scientific computing in which computation is mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), or be offloaded to specialized accelerators. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most efficient resources. To address these needs we propose funcX—a high-performance function-as-a-service (FaaS) platform that enables intuitive, flexible, efficient, scalable, and performant remote function execution on existing infrastructure including clouds, clusters, and supercomputers. It allows users to register and then execute Python functions without regard for the physical resource location, scheduler architecture, or virtualization technology on which the function is executed—an approach we refer to as ‘serverless supercomputing.’ We motivate the need for funcX in science, describe our prototype implementation, and demonstrate, via experiments on two supercomputers, that funcX can process millions of functions across more than 65000 concurrent workers. We also outline five scientific scenarios in which funcX has been deployed and highlight the benefits of funcX in these scenarios.

Constrained Multi-Objective Optimization for Automated Machine Learning

Automated machine learning has gained a lot of attention recently. Building and selecting the right machine learning models is often a multi-objective optimization problem. General purpose machine learning software that simultaneously supports multiple objectives and constraints is scant, though the potential benefits are great. In this work, we present a framework called Autotune that effectively handles multiple objectives and constraints that arise in machine learning problems. Autotune is built on a suite of derivative-free optimization methods, and utilizes multi-level parallelism in a distributed computing environment for automatically training, scoring, and selecting good models. Incorporation of multiple objectives and constraints in the model exploration and selection process provides the flexibility needed to satisfy trade-offs necessary in practical machine learning applications. Experimental results from standard multi-objective optimization benchmark problems show that Autotune is very efficient in capturing Pareto fronts. These benchmark results also show how adding constraints can guide the search to more promising regions of the solution space, ultimately producing more desirable Pareto fronts. Results from two real-world case studies demonstrate the effectiveness of the constrained multi-objective optimization capability offered by Autotune.

Architecture and evolution of semantic networks in mathematics texts

Knowledge is a network of interconnected concepts. Yet, precisely how the topological structure of knowledge constrains its acquisition remains unknown, hampering the development of learning enhancement strategies. Here we study the topological structure of semantic networks reflecting mathematical concepts and their relations in college-level linear algebra texts. We hypothesize that these networks will exhibit structural order, reflecting the logical sequence of topics that ensures accessibility. We find that the networks exhibit strong core-periphery architecture, where a dense core of concepts presented early is complemented with a sparse periphery presented evenly throughout the exposition; the latter is composed of many small modules each reflecting more narrow domains. Using tools from applied topology, we find that the expositional evolution of the semantic networks produces and subsequently fills knowledge gaps, and that the density of these gaps tracks negatively with community ratings of each textbook. Broadly, our study lays the groundwork for future efforts developing optimal design principles for textbook exposition and teaching in a classroom setting.

Towards Diverse and Accurate Image Captions via Reinforcing Determinantal Point Process

Although significant progress has been made in the field of automatic image captioning, it is still a challenging task. Previous works normally pay much attention to improving the quality of the generated captions but ignore the diversity of captions. In this paper, we combine determinantal point process (DPP) and reinforcement learning (RL) and propose a novel reinforcing DPP (R-DPP) approach to generate a set of captions with high quality and diversity for an image. We show that R-DPP performs better on accuracy and diversity than using noise as a control signal (GANs, VAEs). Moreover, R-DPP is able to preserve the modes of the learned distribution. Hence, the beam search algorithm can be applied to generate a single accurate caption, which performs better than other RL-based models.
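To see why a determinantal point process favours sets that are both good and diverse, here is a minimal sketch of the generic DPP subset score only, not of R-DPP or its reinforcement learning component. It assumes the common quality-times-similarity kernel L[i][j] = q_i * q_j * cos(f_i, f_j); the feature vectors, quality values, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def det(matrix):
    """Determinant by Laplace expansion; fine for the tiny sets used here."""
    n = len(matrix)
    if n == 0:
        return 1.0
    if n == 1:
        return matrix[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * matrix[0][j] * det(minor)
    return total


def dpp_score(subset, quality, features):
    """Unnormalised DPP probability of a subset: det of the kernel submatrix."""
    kernel = [
        [quality[i] * quality[j] * cosine(features[i], features[j]) for j in subset]
        for i in subset
    ]
    return det(kernel)


# Hypothetical caption embeddings: 0 and 1 are near-duplicates, 2 is different.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
quality = [1.0, 1.0, 1.0]

similar_pair = dpp_score([0, 1], quality, features)
diverse_pair = dpp_score([0, 2], quality, features)
print(similar_pair, diverse_pair)
```

With equal quality, the near-duplicate pair scores close to zero while the orthogonal pair scores 1, so sampling in proportion to this score prefers diverse caption sets.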

Tensor-Train Parameterization for Ultra Dimensionality Reduction

Locality preserving projections (LPP) are a classical dimensionality reduction method based on data graph information. However, LPP is still sensitive to extreme outliers. LPP is designed for vectorial data and may undermine structural information when it is applied to multidimensional data. Besides, it assumes the dimension of data to be smaller than the number of instances, which is not suitable for high-dimensional data. For high-dimensional data analysis, the tensor-train decomposition has been shown to capture spatial relations efficiently and effectively. Thus, we propose a tensor-train parameterization for ultra dimensionality reduction (TTPUDR) in which the traditional LPP mapping is tensorized in terms of tensor-trains and the LPP objective is replaced with the Frobenius norm to increase the robustness of the model. The manifold optimization technique is utilized to solve the new model. The performance of TTPUDR is assessed on classification problems, and TTPUDR significantly outperforms past methods and several state-of-the-art methods.

Reasoning-Driven Question-Answering for Natural Language Understanding

Natural language understanding (NLU) of text is a fundamental challenge in AI, and it has received significant attention throughout the history of NLP research. This primary goal has been studied under different tasks, such as Question Answering (QA) and Textual Entailment (TE). In this thesis, we investigate the NLU problem through the QA task and focus on the aspects that make it a challenge for the current state-of-the-art technology. This thesis is organized into three main parts: In the first part, we explore multiple formalisms to improve existing machine comprehension systems. We propose a formulation for abductive reasoning in natural language and show its effectiveness, especially in domains with limited training data. Additionally, to help reasoning systems cope with irrelevant or redundant information, we create a supervised approach to learn and detect the essential terms in questions. In the second part, we propose two new challenge datasets. In particular, we create two datasets of natural language questions where (i) the first one requires reasoning over multiple sentences; (ii) the second one requires temporal common sense reasoning. We hope that the two proposed datasets will motivate the field to address more complex problems. In the final part, we present the first formal framework for multi-step reasoning algorithms, in the presence of a few important properties of language use, such as incompleteness, ambiguity, etc. We apply this framework to prove fundamental limitations for reasoning algorithms. These theoretical results provide extra intuition into the existing empirical evidence in the field.

Fog Robotics: A Summary, Challenges and Future Scope

Human-robot interaction plays a crucial role in bringing robots closer to humans. Usually, robots are limited by their own capabilities. Therefore, they utilise Cloud Robotics to enhance their dexterity. Its abilities include the sharing of information such as maps and images, as well as processing power. This whole process involves distributing data, the volume of which tends to rise enormously. New issues can arise, such as bandwidth limits and network congestion at backhaul and fronthaul systems, resulting in high latency. This can affect seamless connectivity between the robots, users and the cloud. Also, a robot may not accomplish its goal successfully within a stipulated time. As a consequence, Cloud Robotics is not in a position to handle the traffic imposed by robots. In contrast, the emerging Fog Robotics can act as a solution by solving major problems of Cloud Robotics. Therefore, to check its feasibility, we discuss the need for and architectures of Fog Robotics in this paper. To evaluate the architectures, we used a realistic scenario of Fog Robotics and compared them with Cloud Robotics. Latency is chosen as the primary factor for validating the effectiveness of the system. We measured real-time latency using a Pepper robot, a Fog robot server and the Cloud server. Experimental results show that Fog Robotics reduces latency significantly compared to Cloud Robotics. Moreover, the advantages, challenges and future scope of the Fog Robotics system are further discussed.

Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation

Natural question generation (QG) is a challenging yet rewarding task that aims to generate questions given an input passage and a target answer. Previous works on QG, however, either (i) ignore the rich structure information hidden in the word sequence, (ii) fail to fully exploit the target answer, or (iii) rely solely on cross-entropy loss, which leads to issues like exposure bias and evaluation discrepancy between training and testing. To address the above limitations, in this paper, we propose a reinforcement learning (RL) based graph-to-sequence (Graph2Seq) architecture for the QG task. Our model consists of a Graph2Seq generator, where a novel bidirectional graph neural network (GNN) based encoder is applied to embed the input passage incorporating the answer information via a simple yet effective Deep Alignment Network, and an evaluator, where a mixed objective function combining both cross-entropy loss and RL loss is designed for ensuring the generation of semantically and syntactically valid text. The proposed model is end-to-end trainable, achieves new state-of-the-art scores, and outperforms all previous methods by a great margin on the SQuAD benchmark.

Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT

This paper presents new state-of-the-art models for three tasks, part-of-speech tagging, syntactic parsing, and semantic parsing, using the cutting-edge contextualized embedding framework known as BERT. For each task, we first replicate and simplify the current state-of-the-art approach to enhance its model efficiency. We then evaluate our simplified approaches on those three tasks using token embeddings generated by BERT. 12 datasets in both English and Chinese are used for our experiments. The BERT models outperform the previously best-performing models by 2.5% on average (7.5% for the most significant case). Moreover, an in-depth analysis on the impact of BERT embeddings is provided using self-attention, which helps understanding in this rich yet representation. All models and source code are publicly available so that researchers can improve upon and utilize them to establish strong baselines for the next decade.

ClustCrypt: Privacy-Preserving Clustering of Unstructured Big Data in the Cloud

Security and confidentiality of big data stored in the cloud are important concerns for many organizations adopting cloud services. One common approach to address these concerns is client-side encryption, where data is encrypted on the client machine before being stored in the cloud. Having encrypted data in the cloud, however, limits the ability to cluster data, which is a crucial part of many data analytics applications, such as search systems. To overcome the limitation, in this paper, we present an approach named ClustCrypt for efficient topic-based clustering of encrypted unstructured big data in the cloud. ClustCrypt dynamically estimates the optimal number of clusters based on the statistical characteristics of encrypted data. It also provides a clustering approach for encrypted data. We deploy ClustCrypt within the context of a secure cloud-based semantic search system (S3BD). Experimental results obtained from evaluating ClustCrypt on three datasets demonstrate on average 60% improvement in clusters’ coherency. ClustCrypt also decreases the search-time overhead by up to 78% and increases the accuracy of search results by up to 35%.

Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

Several clustering frameworks with interactive (semi-supervised) queries have been studied in the past. Recently, clustering with same-cluster queries has become popular. An algorithm in this setting has access to an oracle with full knowledge of an optimal clustering, and the algorithm can ask the oracle queries of the form, ‘Does the optimal clustering put vertices u and v in the same cluster?’ Due to its simplicity, this querying model can easily be implemented in real crowd-sourcing platforms and has attracted a lot of recent work. In this paper, we study the popular correlation clustering problem (Bansal et al., 2002) under this framework. Given a complete graph G=(V,E) with positive and negative edge labels, the correlation clustering objective aims to compute a graph clustering that minimizes the total number of disagreements, that is, the negative intra-cluster edges and positive inter-cluster edges. Let C_{OPT} be the number of disagreements made by the optimal clustering. We present algorithms for correlation clustering whose error and query bounds are parameterized by C_{OPT} rather than by the number of clusters. Indeed, a good clustering must have small C_{OPT}. Specifically, we present an efficient algorithm that recovers an exact optimal clustering using at most 2C_{OPT} queries and an efficient algorithm that outputs a 2-approximation using at most C_{OPT} queries. In addition, we show that, under a plausible complexity assumption, there does not exist any polynomial time algorithm that has an approximation ratio better than 1+\alpha for an absolute constant \alpha >0 with o(C_{OPT}) queries. We extensively evaluate our methods on several synthetic and real-world datasets using real crowd-sourced oracles. Moreover, we compare our approach against several known correlation clustering algorithms.
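The objective described above, counting negative intra-cluster edges plus positive inter-cluster edges, is easy to state in code. The sketch below only evaluates the disagreement count for a given clustering; it is not one of the paper's query algorithms, and the toy graph and the name `disagreements` are hypothetical.

```python
def disagreements(edge_labels, cluster_of):
    """Count disagreements of a clustering on a signed complete graph.

    edge_labels maps an unordered pair (u, v) to '+' or '-'.
    cluster_of maps each vertex to its cluster id.
    A '+' edge across clusters or a '-' edge inside a cluster is a disagreement.
    """
    count = 0
    for (u, v), sign in edge_labels.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        if (sign == "+" and not same_cluster) or (sign == "-" and same_cluster):
            count += 1
    return count


# Hypothetical complete graph on four vertices.
edge_labels = {
    ("a", "b"): "+",
    ("a", "c"): "+",
    ("b", "c"): "-",
    ("a", "d"): "-",
    ("b", "d"): "-",
    ("c", "d"): "+",
}

clustering_one = {"a": 0, "b": 0, "c": 0, "d": 1}
clustering_two = {"a": 0, "b": 0, "c": 1, "d": 1}

cost_one = disagreements(edge_labels, clustering_one)
cost_two = disagreements(edge_labels, clustering_two)
print(cost_one, cost_two)
```

Here the first clustering pays for the negative edge (b, c) inside a cluster and the positive edge (c, d) across clusters, while the second pays only for the positive edge (a, c) across clusters.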

Harmonized Multimodal Learning with Gaussian Process Latent Variable Models

Multimodal learning aims to discover the relationship between multiple modalities. It has become an important research topic due to extensive multimodal applications such as cross-modal retrieval. This paper attempts to address the modality heterogeneity problem based on Gaussian process latent variable models (GPLVMs) to represent multimodal data in a common space. Previous multimodal GPLVM extensions generally adopt individual learning schemes on latent representations and kernel hyperparameters, which ignore their intrinsic relationship. To exploit strong complementarity among different modalities and GPLVM components, we develop a novel learning scheme called Harmonization, where latent model parameters are jointly learned from each other. Beyond the correlation fitting or intra-modal structure preservation paradigms widely used in existing studies, the harmonization is derived in a model-driven manner to encourage the agreement between modality-specific GP kernels and the similarity of latent representations. We present a range of multimodal learning models by incorporating the harmonization mechanism into several representative GPLVM-based approaches. Experimental results on four benchmark datasets show that the proposed models outperform the strong baselines for cross-modal retrieval tasks, and that the harmonized multimodal learning method is superior in discovering semantically consistent latent representation.

Interpretable Encrypted Searchable Neural Networks

In cloud security, traditional searchable encryption (SE) requires high computation and communication overhead for dynamic search and update. The clever combination of machine learning (ML) and SE may be a new way to solve this problem. This paper proposes interpretable encrypted searchable neural networks (IESNN) to explore probabilistic query, balanced index tree construction and automatic weight update in an encrypted cloud environment. In IESNN, probabilistic learning is used to obtain the search ranking for the searchable index, and probabilistic query is performed based on the ciphertext index, which significantly reduces the computational complexity of a query. Compared to traditional SE, adversarial learning and automatic weight update are proposed to respond to users’ timely queries of the latest data set without expensive communication overhead. The proposed IESNN performs better than previous works, bringing the query complexity closer to O(\log N) and introducing low overhead on computation and communication.

Benchmarking the Robustness of Semantic Segmentation Models

When designing a semantic segmentation module for a practical application, such as autonomous driving, it is crucial to understand the robustness of the module with respect to a wide range of image corruptions. While there are recent robustness studies for full-image classification, we are the first to present an exhaustive study for semantic segmentation, based on the state-of-the-art model DeepLabv3+. To increase the realism of our study, we utilize almost 200,000 images generated from Cityscapes and PASCAL VOC 2012, and we furthermore present a realistic noise model, imitating HDR camera noise. Based on the benchmark study we gain several new insights. Firstly, model robustness increases with model performance, in most cases. Secondly, some architecture properties affect robustness significantly, such as a Dense Prediction Cell which was designed to maximize performance on clean data only. Thirdly, to achieve good generalization with respect to various types of image noise, it is recommended to train DeepLabv3+ with our realistic noise model.

FlexNER: A Flexible LSTM-CNN Stack Framework for Named Entity Recognition

Named entity recognition (NER) is a foundational technology for information extraction. This paper presents a flexible NER framework compatible with different languages and domains. Inspired by the idea of distant supervision (DS), this paper enhances the representation by increasing the entity-context diversity without relying on external resources. We choose different layer stacks and sub-network combinations to construct the bilateral networks. This strategy can generally improve model performance on different datasets. We conduct experiments on five languages (English, German, Spanish, Dutch and Chinese) and on biomedical fields, such as identifying chemicals and gene/protein terms in scientific works. Experimental results demonstrate the good performance of this framework.

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate inference and reduce the memory consumption of deep neural networks, which is crucial for model deployment on resource-limited devices like mobile phones. However, due to the discreteness of low-bit quantization, existing quantization methods often face an unstable training process and severe performance degradation. To address this problem, in this paper we propose Differentiable Soft Quantization (DSQ) to bridge the gap between the full-precision and low-bit networks. DSQ can automatically evolve during training to gradually approximate the standard quantization. Owing to its differentiable property, DSQ can help pursue accurate gradients in backward propagation, and reduce the quantization loss in the forward process with an appropriate clipping range. Extensive experiments over several popular network structures show that training low-bit neural networks with DSQ can consistently outperform state-of-the-art quantization methods. Besides, our first efficient implementation for deploying 2 to 4-bit DSQ on devices with ARM architecture achieves up to 1.7× speed up, compared with the open-source 8-bit high-performance inference framework NCNN.
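One way to picture a quantizer that "gradually approximates the standard quantization" is a smooth staircase whose steps sharpen as a temperature parameter grows. The sketch below uses a tanh-based staircase chosen for illustration; the exact function, its parameterization, and the names `soft_quantize`, `hard_quantize`, and `k` are assumptions and may differ from the paper's DSQ.

```python
import math


def hard_quantize(x, lower, upper, intervals):
    """Standard uniform quantization: clip, then round to the nearest level."""
    delta = (upper - lower) / intervals
    x = min(max(x, lower), upper)
    return lower + delta * math.floor((x - lower) / delta + 0.5)


def soft_quantize(x, lower, upper, intervals, k):
    """Smooth staircase: near-identity for small k, near-hard rounding for large k."""
    delta = (upper - lower) / intervals
    x = min(max(x, lower), upper)
    # Index of the interval containing x, clamped so x == upper stays in range.
    i = min(int((x - lower) / delta), intervals - 1)
    mid = lower + (i + 0.5) * delta
    # Scale so the curve reaches exactly -1 and +1 at the interval ends,
    # which keeps the staircase continuous across intervals.
    scale = 1.0 / math.tanh(0.5 * k * delta)
    phi = scale * math.tanh(k * (x - mid))
    return lower + delta * (i + 0.5 * (phi + 1.0))


for x in (0.3, 0.45, 0.8):
    print(x, soft_quantize(x, 0.0, 1.0, 4, k=1.0),
          soft_quantize(x, 0.0, 1.0, 4, k=400.0),
          hard_quantize(x, 0.0, 1.0, 4))
```

For small `k` the output stays close to the input, so gradients are informative; for large `k` it matches the hard quantizer away from the interval midpoints.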

Deep Generalized Max Pooling

Global pooling layers are an essential part of Convolutional Neural Networks (CNN). They are used to aggregate activations of spatial locations to produce a fixed-size vector in several state-of-the-art CNNs. Global average pooling or global max pooling are commonly used for converting convolutional features of variable-size images to a fixed-size embedding. However, both pooling layer types are computed spatially independently: each individual activation map is pooled and thus activations of different locations are pooled together. In contrast, we propose Deep Generalized Max Pooling that balances the contribution of all activations of a spatially coherent region by re-weighting all descriptors so that the impact of frequent and rare ones is equalized. We show that this layer is superior to both average and max pooling on the classification of Latin medieval manuscripts (CLAMM’16, CLAMM’17), as well as writer identification (Historical-WI’17).
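The two baseline layers mentioned above can be sketched in a few lines. This illustrates only global average and global max pooling, not Deep Generalized Max Pooling itself; feature maps are plain nested lists indexed as [channel][row][column], and the example values are hypothetical.

```python
def global_average_pool(feature_maps):
    """One value per channel: the mean over all spatial locations."""
    return [
        sum(sum(row) for row in fmap) / sum(len(row) for row in fmap)
        for fmap in feature_maps
    ]


def global_max_pool(feature_maps):
    """One value per channel: the maximum over all spatial locations."""
    return [max(max(row) for row in fmap) for fmap in feature_maps]


# Two hypothetical inputs with 2 channels each but different spatial sizes.
small = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
wide = [
    [[1.0, 1.0, 1.0], [1.0, 1.0, 7.0]],
    [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]],
]

small_avg, small_max = global_average_pool(small), global_max_pool(small)
wide_avg, wide_max = global_average_pool(wide), global_max_pool(wide)
print(small_avg, small_max)
print(wide_avg, wide_max)
```

Both inputs map to a vector of length 2 regardless of spatial size. Each channel is pooled on its own, which is the independence the abstract contrasts with its re-weighting of whole descriptors.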

Towards Explainable AI Planning as a Service

Explainable AI is an important area of research within which Explainable Planning is an emerging topic. In this paper, we argue that Explainable Planning can be designed as a service — that is, as a wrapper around an existing planning system that utilises the existing planner to assist in answering contrastive questions. We introduce a prototype framework to facilitate this, along with some examples of how a planner can be used to address certain types of contrastive questions. We discuss the main advantages and limitations of such an approach and we identify open questions for Explainable Planning as a service that identify several possible research directions.

AdaGCN: Adaboosting Graph Convolutional Networks into Deep Models

The design of deep graph models still remains to be investigated and the crucial part is how to explore and exploit the knowledge from different hops of neighbors in an efficient way. In this paper, we propose a novel RNN-like deep graph neural network architecture by incorporating AdaBoost into the computation of the network; the proposed graph convolutional network, called AdaGCN (AdaBoosting Graph Convolutional Network), has the ability to efficiently extract knowledge from high-order neighbors and integrate knowledge from different hops of neighbors into the network in an AdaBoost way. We also present the architectural difference between AdaGCN and existing graph convolutional methods to show the benefits of our proposal. Finally, extensive experiments demonstrate the state-of-the-art prediction performance and the computational advantage of our approach AdaGCN.

Towards Optimisation of Collaborative Question Answering over Knowledge Graphs

Collaborative Question Answering (CQA) frameworks for knowledge graphs aim at integrating existing question answering (QA) components for implementing sequences of QA tasks (i.e. QA pipelines). The research community has paid substantial attention to CQAs since they support reusability and scalability of the available components in addition to the flexibility of pipelines. CQA frameworks attempt to build such pipelines automatically by solving two optimisation problems: 1) local collective performance of QA components per QA task and 2) global performance of QA pipelines. In spite of offering several advantages over monolithic QA systems, the effectiveness and efficiency of CQA frameworks in answering questions is limited. In this paper, we tackle the problem of local optimisation of CQA frameworks and propose a three-fold approach, which applies feature selection techniques with supervised machine learning approaches in order to identify the best performing components efficiently. We have empirically evaluated our approach over existing benchmarks and compared it to existing automatic CQA frameworks. The observed results provide evidence that our approach answers a higher number of questions than the state of the art while reducing: i) the number of used features by 50% and ii) the number of components used by 76%.

Unconstrained Monotonic Neural Networks

Monotonic neural networks have recently been proposed as a way to define invertible transformations. These transformations can be combined into powerful autoregressive flows that have been shown to be universal approximators of continuous probability distributions. Architectures that ensure monotonicity typically enforce constraints on weights and activation functions, which enables invertibility but leads to a cap on the expressiveness of the resulting transformations. In this work, we propose the Unconstrained Monotonic Neural Network (UMNN) architecture based on the insight that a function is monotonic as long as its derivative is strictly positive. In particular, this latter condition can be enforced with a free-form neural network whose only constraint is the positiveness of its output. We evaluate our new invertible building block within a new autoregressive flow (UMNN-MAF) and demonstrate its effectiveness on density estimation experiments. We also illustrate the ability of UMNNs to improve variational inference.

A Tour of Convolutional Networks Guided by Linear Interpreters

Convolutional networks are large linear systems divided into layers and connected by non-linear units. These units are the ‘articulations’ that allow the network to adapt to the input. To understand how a network manages to solve a problem we must look at the articulated decisions in their entirety. If we could capture the actions of non-linear units for a particular input, we would be able to replay the whole system back and forth as if it was always linear. It would also reveal the actions of non-linearities because the resulting linear system, a Linear Interpreter, depends on the input image. We introduce a hooking layer, called a LinearScope, which allows us to run the network and the linear interpreter in parallel. Its implementation is simple, flexible and efficient. From here we can make many curious inquiries: what do these linear systems look like? When the rows and columns of the transformation matrix are images, what do they look like? What type of basis do these linear transformations rely on? The answers depend on the problems presented, through which we take a tour of some popular architectures used for classification, super-resolution (SR) and image-to-image translation (I2I). For classification we observe that popular networks use a pixel-wise vote per class strategy and heavily rely on bias parameters. For SR and I2I we find that CNNs use wavelet-type basis similar to the human visual system. For I2I we reveal copy-move and template-creation strategies to generate outputs.

PMU Data Feature Considerations for Realistic, Synthetic Data Generation

It is critical that the qualities and features of synthetically generated PMU measurements used for grid analysis match those of measurements obtained from field-based PMUs. This ensures that analysis results generated by researchers during grid studies replicate those outcomes typically expected by engineers in real-life situations. In this paper, essential features associated with industry PMU-derived data measurements are analyzed for input considerations in the generation of vast amounts of synthetic power system data. Inherent variabilities in PMU data as a result of the random dynamics in power system operations, oscillatory contents, and the prevalence of bad data are presented. Statistical results show that in the generation of large datasets of synthetic grid measurements, an inclusion of different data anomalies, ambient oscillation contents, and random cases of missing data samples due to packet drops helps to improve the realism of experimental data used in power systems analysis.

Continuous Control for High-Dimensional State Spaces: An Interactive Learning Approach

Deep Reinforcement Learning (DRL) has become a powerful methodology to solve complex decision-making problems. However, DRL has several limitations when used in real-world problems (e.g., robotics applications). For instance, long training times are required and cannot be accelerated in contrast to simulated environments, and reward functions may be hard to specify/model and/or to compute. Moreover, the transfer of policies learned in a simulator to the real world has limitations (reality gap). On the other hand, machine learning methods that rely on the transfer of human knowledge to an agent have been shown to be time efficient for obtaining well performing policies and do not require a reward function. In this context, we analyze the use of human corrective feedback during task execution to learn policies with high-dimensional state spaces, by using the D-COACH framework, and we propose new variants of this framework. D-COACH is a Deep Learning based extension of COACH (COrrective Advice Communicated by Humans), where humans are able to shape policies through corrective advice. The enhanced version of D-COACH, which is proposed in this paper, largely reduces the time and effort of a human for training a policy. Experimental results validate the efficiency of the D-COACH framework in three different problems (simulated and with real robots), and show that its enhanced version reduces the human training effort considerably, and makes it feasible to learn policies within periods of time in which a DRL agent does not reach any improvement.

Skill Transfer in Deep Reinforcement Learning under Morphological Heterogeneity

Transfer learning methods for reinforcement learning (RL) domains facilitate the acquisition of new skills using previously acquired knowledge. The vast majority of existing approaches assume that the agents have the same design, e.g. same shape and action spaces. In this paper we address the problem of transferring previously acquired skills amongst morphologically different agents (MDAs). For instance, assuming that a bipedal agent has been trained to move forward, could this skill be transferred on to a one-leg hopper so as to make its training process for the same task more sample efficient? We frame this problem as one of subspace learning whereby we aim to infer latent factors representing the control mechanism that is common between MDAs. We propose a novel paired variational encoder-decoder model, PVED, that disentangles the control of MDAs into shared and agent-specific factors. The shared factors are then leveraged for skill transfer using RL. Theoretically, we study how the performance of PVED depends on its components and agent morphologies. Experimentally, PVED has been extensively validated on four MuJoCo environments. We demonstrate its performance compared to a state-of-the-art approach and several ablation cases, interpret and visualize the hidden factors, and identify avenues for future improvements.


How to Become More Marketable as a Data Scientist

As a data scientist, you are in high demand. So, how can you increase your marketability even more? Check out these current trends in skills most desired by employers in 2019.


Job: Several postdocs, Ground Breaking Deep Learning Technology for Monitoring the Brain during Surgery with Commercialization Opportunity, University of Pittsburgh

Kayhan just sent me the following:

Dear Igor,

I hope you are doing well.

I don't know if you remember me but we have been in contact a few times while I was a Ph.D. student at UPenn.

My lab at the University of Pittsburgh has several postdoc positions open. More specifically, I would be thankful if you could advertise this position to your audience.


Sure Kayhan, I remember! Here is the announcement:
Ground Breaking Deep Learning Technology for Monitoring the Brain during Surgery with Commercialization Opportunity 

We are developing a clinical tool based on deep learning to automatically detect stroke during surgery and alert the surgical team to avert complications and save lives. We are uniquely positioned at the intersection of the largest health care system in the US, the University of Pittsburgh Medical Center (UPMC), and top ranked academic institutions, the University of Pittsburgh (Pitt) and the Carnegie Mellon University (CMU). Our group consists of Pitt and UPMC faculty members who have complementary expertise in machine learning and in healthcare and specifically in deep learning, clinical informatics, neurology, and surgery. We develop novel deep learning and other machine learning methods for application to challenging clinical problems. We are very well funded by NIH, NSF, industry, and internal institutional grants.  
In the current project, we are developing a clinical tool that will automatically detect stroke and other adverse events during surgery from an array of monitoring information, and provide highly accurate real time alerts to the surgical team to make course corrections during surgery. The clinical tool is to be deployed in operating rooms for monitoring surgeries and providing high quality alerts.
The successful candidate will work with us in a highly collaborative environment that spans the computer laboratory and the operating room and will gain unique and valuable experience in deep learning, development of a tool for a clinical setting, and in commercialization.  
Expected qualifications
Genuinely motivated to develop and apply machine learning to clinical problems. Strong expertise in machine learning is required; expertise in statistics and experience with messy clinical data is a plus. Python fluency is required. Demonstrated ability to make meaningful contributions to projects with a research flavor is valuable.
• Hands-on experience building predictive models
• Experience working with diverse data types including signal and structured data; experience with text data is a plus
• Experience in programming in Python; experience in additional languages (R, C/C++) is a plus
• Aware of current best practices in machine learning
• Fluency in one of the deep learning frameworks is a plus (PyTorch or Tensorflow)
• Knowledge of statistics, including hypothesis testing with parametric and non-parametric tests and basic probability
• PhD in computer science, electrical engineering, statistics or equivalent computational / quantitative fields (exceptional MS candidates will be considered)  
The goal of this project is to develop, evaluate and commercialize a tool for automatic detection of stroke during surgery. The successful candidate will have the rare opportunity to perform cutting-edge deep learning research and participate in a commercial endeavor.  
If interested, contact Shyam Visweswaran, MD, PhD, and Kayhan Batmanghelich, PhD.
The University of Pittsburgh is an Affirmative Action/Equal Opportunity Employer and values equality of opportunity, human dignity, and diversity.


Understanding Cancer using Machine Learning

Use of Machine Learning (ML) in Medicine is becoming more and more important. One application example can be Cancer Detection and Analysis.


Amending Conquest’s Law to account for selection bias

Robert Conquest was a historian who published critical studies of the Soviet Union and whose famous “First Law” is, “Everybody is reactionary on subjects he knows about.” I did some searching on the internet, and the most authoritative source seems to be this quote from Conquest’s friend Kingsley Amis:

Further search led to this elaboration from philosopher Roger Scruton:

. . .

I agree with Scruton that we shouldn’t take the term “reactionary” (dictionary definition, “opposing political or social progress or reform”) too literally. Even Conquest, presumably, would not have objected to the law forbidding the employment of children as chimney sweeps.

The point of Conquest’s Law is that it’s easy to propose big changes in areas distant from you, but on the subjects you know about, you will respect tradition more, as you have more of an understanding of why it’s there. This makes sense, although I can also see the alternative argument that certain traditions might seem to make sense from a distance but are clearly absurd when looked at from close up. I guess it depends on the tradition.

In the realm of economics, for example, Engels, Keynes, and various others had a lot of direct experience of capitalism but it didn’t stop them from promoting revolution and reform. That said, Conquest’s Law makes sense and is clearly true in many cases, even if not always.

What motivated me to write this post, though, was not these sorts of rare exceptions—after all, most people who are successful in business are surely conservative, not radical, in their economic views—but rather an issue of selection bias.

Conquest was a successful academic and hung out with upper-class people, Oxbridge graduates, various people who were closer to the top than the bottom of the social ladder. From that perspective it’s perhaps no surprise that they were “reactionary” in their professional environments, as they were well ensconced there. This is not to deny the sincerity and relevance of such views, any more than we would want to deny the sincerity and relevance of radical views held by people with less exalted social positions. I’m sure the typical Ivy League professor such as myself is much more content and “reactionary” regarding the university system than would be a debt-laden student or harried adjunct. I knew some people who worked for minimum wage at McDonalds, and I think their take on the institution was a bit less reactionary than that of the higher-ups. This doesn’t mean that people with radical views want to tear the whole thing down (after all, people teach classes, work at McDonalds, etc., out of their own free will), nor that reactionaries want no change. My only point here is that the results of a survey, even an informal survey, of attitudes will depend on who you think of asking.

It’s interesting how statistical principles can help us better understand even purely qualitative statements.

A similar issue arose with baseball analyst Bill James. As I wrote a few years ago:

In 2001, James wrote:

Are athletes special people? In general, no, but occasionally, yes. Johnny Pesky at 75 was trim, youthful, optimistic, and practically exploding with energy. You rarely meet anybody like that who isn’t an ex-athlete—and that makes athletes seem special.

I’ve met 75-year-olds like that, and none of them was an ex-athlete. That’s probably because I don’t know a lot of ex-athletes. But Bill James . . . he knows a lot of athletes. He went to the bathroom with Tim Raines once! The most I can say is that I saw Rickey Henderson steal a couple bases in a game against the Orioles.

Cognitive psychologists talk about the base-rate fallacy, which is the mistake of estimating probabilities without accounting for underlying frequencies. Bill James knows a lot of ex-athletes, so it’s no surprise that the youthful, optimistic, 75-year-olds he meets are likely to be ex-athletes. The rest of us don’t know many ex-athletes, so it’s no surprise that most of the youthful, optimistic, 75-year-olds we meet are not ex-athletes. The mistake James made in the above quote was to write “You” when he really meant “I.” I’m not disputing his claim that athletes are disproportionately likely to become lively 75-year-olds; what I’m disagreeing with is his statement that almost all such people are ex-athletes. Yeah, I know, I’m being picky. But the point is important, I think, because of the window it offers into the larger issue of people being trapped in their own environments (the “availability heuristic,” in the jargon of cognitive psychology). Athletes loom large in Bill James’s world—I wouldn’t want it any other way—and sometimes he forgets that the rest of us live in a different world.

Another way to put it: Selection bias. Using a non-representative sample to draw inappropriate inferences about the population.
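To make the base-rate point concrete, here is a small R sketch with made-up numbers. None of the figures come from James or from any data: the two liveliness rates and the two acquaintance-circle compositions are assumptions chosen only for illustration. It applies Bayes' rule to ask what share of the lively 75-year-olds you meet are ex-athletes, depending on how many ex-athletes you know.

```r
# Hypothetical rates, chosen only for illustration (not from any data):
# ex-athletes are assumed four times as likely to be lively at 75.
p_lively_athlete <- 0.40
p_lively_other   <- 0.10

# Share of the lively 75-year-olds you meet who are ex-athletes,
# given the share of ex-athletes among the people you know (Bayes' rule).
share_athletes_among_lively <- function(p_athlete) {
  lively_athletes <- p_athlete * p_lively_athlete
  lively_others   <- (1 - p_athlete) * p_lively_other
  lively_athletes / (lively_athletes + lively_others)
}

james_circle <- share_athletes_among_lively(0.60)  # knows mostly ex-athletes
our_circle   <- share_athletes_among_lively(0.02)  # knows hardly any

round(c(james = james_circle, rest_of_us = our_circle), 3)
```

Under these assumed numbers, about 86% of the lively 75-year-olds in the first circle are ex-athletes, against roughly 8% in the second, even though the underlying advantage of ex-athletes is identical in both cases. Only the sample differs.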

This does not make Conquest’s or James’s observations valueless. We just have to interpret them carefully given the data, to get something like:

Conquest: People near the top of a hierarchy typically like it there.

James: I [James] know lots of energetic elderly athletes. Most of the elderly non-athletes I know are not energetic.


Four short links: 16 August 2019

Indigenous Data, Information Operations, and Optical Neural Nets

  1. Indigenous Data Sovereignty -- a podcast with Keoni Mahelona, who is part of a Māori team building software to transcribe Māori speech.
  2. Toward an Information Operations Kill Chain (Bruce Schneier) -- it’s time to conceptualize the “information operations kill chain.” Information attacks against democracies, whether they’re attempts to polarize political processes or to increase mistrust in social institutions, also involve a series of steps. And enumerating those steps will clarify possibilities for defense.
  3. Advances in Optical Neural Networks -- After its design and training in a computer using modern deep learning methods, each network is physically fabricated, using for example 3-D printing or lithography, to engineer the trained network model into matter. This 3-D structure of engineered matter is composed of transmissive and/or reflective surfaces that altogether perform machine learning tasks through light-matter interaction and optical diffraction, at the speed of light, and without the need for any power, except for the light that illuminates the input object. This is especially significant for recognizing target objects much faster and with significantly less power compared to standard computer-based machine learning systems. For the performance of a neural network, not the training of it, but still a nifty idea.


Pytorch Lightning vs PyTorch Ignite vs

Here, I will attempt an objective comparison between all three frameworks. This comparison comes from laying out similarities and differences objectively found in tutorials and documentation of all three frameworks.


3 tidyverse tricks for most commonly used Excel Features

[This article was first published on r-bloggers on Programming with R, and kindly contributed to R-bloggers.]

In this post, we’re simply going to see 3 tricks that could help improve your tooling using {tidyverse}.

Create a difference variable between the current value and the next value

This is also known as lead and lag. Especially in a time series dataset, this variable becomes very important in feature engineering. In Excel, this is simply done by creating a new formula field that subtracts the current cell from the next cell (or the previous cell from the current cell) and dragging the formula down to the last cell.

Let’s take the fake_visits dataset from {fakir}. To find a day with a huge peak or drop in home visits, we first need a column holding the difference between the values of adjacent days.

We can do this with the {dplyr} functions lag() (previous value) and lead() (next value).

library(tidyverse)
df <- fakir::fake_visits()

# previous day's home visits minus the current day's (first row: 0)
df %>%
  mutate(day_lag = lag(home, default = home[1]) - home) %>%
  head()
## # A tibble: 6 x 9
##   timestamp   year month   day  home about  blog contact day_lag
## 1 2017-01-01  2017     1     1   352   176   521      NA       0
## 2 2017-01-02  2017     1     2   203   115   492      89     149
## 3 2017-01-03  2017     1     3   103    59   549      NA     100
## 4 2017-01-04  2017     1     4   484   113   633     331    -381
## 5 2017-01-05  2017     1     5   438   138   423     227      46
## 6 2017-01-06  2017     1     6    NA    75   478     289      NA
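
The example above only uses lag(). As a minimal sketch of the lead() counterpart, here is the same idea on a tiny hand-made tibble. The `visits` data and the new column names are assumptions for illustration, not part of the {fakir} dataset; lead() and lag() are the {dplyr} functions, left at their defaults so that a missing neighbour gives NA.

```r
library(dplyr)

# Small stand-in for the home-visits column (hypothetical values)
visits <- tibble(day = 1:5, home = c(352, 203, 103, 484, 438))

visits_diff <- visits %>%
  mutate(
    # next day's value minus the current one; the last row has no next value, so NA
    next_minus_current = lead(home) - home,
    # current value minus the previous day's; the first row has no previous value, so NA
    current_minus_prev = home - lag(home)
  )

visits_diff
```

Leaving the edge rows as NA, rather than filling them with a default as in the lag() call above, avoids reporting a difference of 0 for a day that has nothing to compare against.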

Combining Multiple Columns into one Column

One of the things that we often do in Excel is combining multiple columns into one column by concatenating the cell values. In the above example, we can see three columns, year, month and day, which combined together give us a date-format column (assuming the timestamp variable isn’t present). That’s where the function unite() comes in handy.

df %>% select(-timestamp) %>% head() %>% unite("date-format", c("year", "month", 
    "day"), sep = "-")
## # A tibble: 6 x 5
##   `date-format`  home about  blog contact
## 1 2017-1-1        352   176   521      NA
## 2 2017-1-2        203   115   492      89
## 3 2017-1-3        103    59   549      NA
## 4 2017-1-4        484   113   633     331
## 5 2017-1-5        438   138   423     227
## 6 2017-1-6         NA    75   478     289

Splitting One Column into Multiple Columns

This is the inverse of what we did above and corresponds to another very frequently used Excel feature, Text to Columns.

In the fake_visits dataset, let’s assume we don’t have year, month and day separately and now we’ve got to create those three columns from timestamp. This is quite simple with the separate() function.

df %>% select(-c("year", "month", "day")) %>% head() %>% separate(col = timestamp, 
    into = c("year", "month", "day"), sep = "-")
## # A tibble: 6 x 7
##   year  month day    home about  blog contact
## 1 2017  01    01      352   176   521      NA
## 2 2017  01    02      203   115   492      89
## 3 2017  01    03      103    59   549      NA
## 4 2017  01    04      484   113   633     331
## 5 2017  01    05      438   138   423     227
## 6 2017  01    06       NA    75   478     289


The idea of this post was to introduce those four functions:

  • lead()
  • lag()
  • unite()
  • separate()

and showcase how useful they are for reproducing many commonly used Excel features in data analysis.


Measuring pop music’s falsetto usage

Vox and Matt Daniels delved into falsetto in pop music over the years. Is falsetto a big trend now compared to the rest of the history? The process of finding the answer, noisy data and all, was just as interesting as the answer itself.


We’re RStudio Trainers!

[This article was first published on r – Jumping Rivers, and kindly contributed to R-bloggers.]

Big news. RStudio recently started certifying trainers in three areas: the tidyverse, Shiny and teaching. To be certified to teach a topic you have to pass the exam for that topic and the teaching exam.

Even bigger news. Four of your lovely Jumping Rivers trainers are now certified to teach at least one topic! Check out the RStudio certified trainers page to see me (Theo Roe), Rhian Davies, Colin Gillespie and Roman Popat in action!

P.S. whilst we’ve got you, if you want to learn the tidyverse or shiny see here


EARL London – speaker interview

[This article was first published on RBlog – Mango Solutions, and kindly contributed to R-bloggers.]

Robert Duff (Transport for London) and Rahulan Chandrasekaran (Department for Transport)

Robert and Rahulan are doing a joint presentation titled ‘Let me in! Let me on! Quantifying highly frustrating events on the Underground’ on 11 September at EARL London. We dropped Robert an email to find out more around the subject of his and Rahulan’s talk.

How do you think technology is shaping modern transport?

Incredibly. It definitely feels like we are in something of a reboot stage at the moment. The challenge is staying relevant and positioning yourself to be flexible enough to adapt. The next advancement could be just around the corner. From the noticeable increase in ride-hailing services, electric vehicles and autonomous vehicles trials, it’s clear that technology is already shaping transport offerings as well as defining how users interact with them. The quality and quantity of information available to both public transport users and road users in recent years has really advanced. And of course, whilst we go on this journey it’s paramount to have safety at the forefront of our mind and to always be on the lookout for opportunities to encourage trips on more sustainable modes.

What challenges do organisations face in helping to shape modern transport?

Although it’s been mentioned a few times recently in various blogs that I’ve read, I’m just going to re-iterate here one of the key challenges. Organisations can do their best to keep up to date with technological trends and have vast amounts of data and the right mix of Data Scientist/Engineers, but the ability to shape really starts with making in-roads towards an organisational culture where data and openness is at the core.

To unlock the benefits of technological advancements you need to have the ability to influence and have decision-makers who are confident when talking about data.

It would be great, for example, if everyone knew what machine learning is, but we know this won’t happen overnight and part of the challenge is explaining such topics so that everyone has a chance of grasping what they mean. It also helps when everyone is upfront and honest, happily stating when they don’t quite understand – there’s absolutely no shame in asking someone to repeat themselves but in a slightly different way 😊. Organisations with strong analytical communities that fall naturally into the habit of sharing knowledge, learning from each other and unearthing best practices, are the ones in prime position to face this difficult challenge.

Why did you pick R for this project?

Wrangling and Visualising make up a big part of this work so R was a very good fit in that respect. Particularly important for Rahulan (my co-presenter) and me was the ability to put the data into our stakeholders’ hands to interact with, and we found some fantastic packages for that.

What are you planning after this project?

The direction of travel for this project is pretty exciting. Since the project has got going, and from what we’re going to present at EARL, we’re now in a position where we have more data than before and in considerable quantities. We can now complement our ticketing and train movement data with WiFi data from within our stations. This gives us an extra dimension as we can begin to think about applying more advanced techniques to our problem, possibly taking a trip into predictive analytics territory with the aim of improving the customer experience.

Thanks to Robert for this interview – please take a look at the other speakers that we have presenting. It’s going to be 3 days of jam-packed R goodness!

There are only 4 weeks left until EARL, you can get your tickets here.

To leave a comment for the author, please follow the link and comment on their blog: RBlog – Mango Solutions. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


Whats new on arXiv – Complete List

NeuroMask: Explaining Predictions of Deep Neural Networks through Mask Learning
OD-GCN object detection by knowledge graph with GCN
Explaining Convolutional Neural Networks using Softmax Gradient Layer-wise Relevance Propagation
Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics
Channel Decomposition on Generative Networks
DeepAISE — An End-to-End Development and Deployment of a Recurrent Neural Survival Model for Early Prediction of Sepsis
Neural Text Generation with Unlikelihood Training
Feature Partitioning for Efficient Multi-Task Architectures
Adversarial Neural Pruning
Wasserstein Index Generation Model: Automatic Generation of Time-series Index with Application to Economic Policy Uncertainty
Linking Graph Entities with Multiplicity and Provenance
Comparison theorems on large-margin learning
Regional Tree Regularization for Interpretability in Black Box Models
Boosted GAN with Semantically Interpretable Information for Image Inpainting
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
Getting To Know You: User Attribute Extraction from Dialogues
Attention is not not Explanation
Matrix Nets: A New Deep Architecture for Object Detection
Requirements Engineering for Machine Learning: Perspectives from Data Scientists
Detecting semantic anomalies
Towards Self-Explainable Cyber-Physical Systems
Semi-Supervised Learning using Differentiable Reasoning
Exploiting Parallelism Opportunities with Deep Learning Frameworks
metric-learn: Metric Learning Algorithms in Python
IMS-Speech: A Speech to Text Tool
Adaptive Learning of Aggregate Analytics under Dynamic Workloads
Action Recognition in Untrimmed Videos with Composite Self-Attention Two-Stream Framework
Detection of the Group of Traffic Signs with Central Slice Theorem
Mass Estimation from Images using Deep Neural Network and Sparse Ground Truth
Difficulty Classification of Mountainbike Downhill Trails utilizing Deep Neural Networks
Sommerfeld type integrals for discrete diffraction problems
The Noise Collector for sparse recovery in high dimensions
Local Supports Global: Deep Camera Relocalization with Sequence Enhancement
SkrGAN: Sketching-rendering Unconditional Generative Adversarial Networks for Medical Image Synthesis
Deep Learning for Detecting Building Defects Using Convolutional Neural Networks
An Unsupervised, Iterative N-Dimensional Point-Set Registration Algorithm
A fast multi-object tracking system using an object detector ensemble
Fine-Tuning Models Comparisons on Garbage Classification for Recyclability
Quantifying information loss on chaotic attractors through recurrence networks
Energy and Performance Analysis of STTRAM Caches for Mobile Applications
Graph Embedding Using Infomax for ASD Classification and Brain Functional Difference Detection
Stability Analysis of Reservoir Computers Dynamics via Lyapunov Functions
Enforcing Perceptual Consistency on Generative Adversarial Networks by Using the Normalised Laplacian Pyramid Distance
Repetitive Reprediction Deep Decipher for Semi-Supervised Learning
The Channel Attention based Context Encoder Network for Inner Limiting Membrane Detection
Cyclic Oritatami Systems Cannot Fold Infinite Fractal Curves
Identification of relevant diffusion MRI metrics impacting cognitive functions using a novel feature selection method
Multi-modality Latent Interaction Network for Visual Question Answering
Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations
A symbolic approach to the self-triggered design for networked control systems
Deep Dexterous Grasping of Novel Objects from a Single View
CMB-GAN: Fast Simulations of Cosmic Microwave background anisotropy maps using Deep Learning
Online Continual Learning with Maximally Interfered Retrieval
Learn to Compress CSI and Allocate Resources in Vehicular Networks
Interactive coin offerings
Assignability of dichotomy spectrum for discrete time-varying linear control systems
Matrix-analytic solution of system of integral equations in three tandem servers
Space-Efficient Construction of Compressed Suffix Trees
Multi-View Fuzzy Clustering with The Alternative Learning between Shared Hidden Space and Partition
Multi-view Clustering with the Cooperation of Visible and Hidden Views
Deep Learning-Based Quantification of Pulmonary Hemosiderophages in Cytology Slides
Tensor-based EDMD for the Koopman analysis of high-dimensional systems
Super-resolution of Omnidirectional Images Using Adversarial Learning
Efficient Resource Allocation for Mobile-Edge Computing Networks with NOMA: Completion Time and Energy Minimization
Theory of the Phase Transition in Random Unitary Circuits with Measurements
Thermal conductance of one dimensional disordered harmonic chains
Tropically planar graphs
Multi-timescale Trajectory Prediction for Abnormal Human Activity Detection
Classes of Full-Duplex Channels with Capacity Achieved Without Adaptation
Identifying shifts between two regression curves
Nonleaf Patterns in Trees: Protected Nodes and Fine Numbers
Elements of asymptotic theory with outer probability measures
LSTM vs. GRU vs. Bidirectional RNN for script generation
Random walk model from the point of view of algorithmic trading
Fairness and efficiency for probabilistic allocations with endowments
Why Does a Visual Question Have Different Answers?
Learning to Detect Collisions for Continuum Manipulators without a Prior Model
Error Bounds and Singularity Degree in Semidefinite Programming
Generalised trophic levels and graph hierarchy
AmazonQA: A Review-Based Question Answering Task
Methodological Issues in Observational Studies
Collective marks and first passage times
Uncertainty Model Estimation in an Augmented Data Space for Robust State Estimation
MULAN: Multitask Universal Lesion Analysis Network for Joint Lesion Detection, Tagging, and Segmentation
Prototyping Software Transceiver for the 5G New Radio Physical Uplink Shared Channel
Efficient Contraction of Large Tensor Networks for Weighted Model Counting through Graph Decompositions
Chip-Firing Games and Critical Groups
A zero interest rate Black-Derman-Toy model
On breadth-first constructions of scaling limits of random graphs and random unicellular maps
Active Damping of Power Oscillations Following Frequency Changes in Low Inertia Power Systems
Comparison of coupled nonlinear oscillator models for the transient response of power generating stations connected to low inertia systems
Low-cost low-power in-vehicle occupant detection with mm-wave FMCW radar
Point-Based Multi-View Stereo Network
A sub-modular receding horizon solution for mobile multi-agent persistent monitoring
A Groupwise Approach for Inferring Heterogeneous Treatment Effects in Causal Inference
An energy consistent discretization of the nonhydrostatic equations in primitive variables
Dynamic Contract Design for Systemic Cyber Risk Management of Interdependent Enterprise Networks
Sharp Guarantees for Solving Random Equations with One-Bit Information
Positron Annihilation Lifetime Spectroscopy Using Fast Scintillators and Digital Electronics
Spectral and Dynamic Consequences of Network Specialization
Superstition in the Network: Deep Reinforcement Learning Plays Deceptive Games
Quantitative combinatorial geometry for concave functions
Learning Target-oriented Dual Attention for Robust RGB-T Tracking
Context-Aware Information Lapse for Timely Status Updates in Remote Control Systems
Multi-objective scheduling on two dedicated processors
On the Convergence of AdaBound and its Connection to SGD
The bias of isotonic regression
DL-PDE: Deep-learning based data-driven discovery of partial differential equations from discrete and noisy data
Industrial Control via Application Containers: Migrating from Bare-Metal to IAAS
Few Labeled Atlases are Necessary for Deep-Learning-Based Segmentation
A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates
Collaborative Multi-agent Learning for MR Knee Articular Cartilage Segmentation
Einconv: Exploring Unexplored Tensor Decompositions for Convolutional Neural Networks
Exploiting Multi-domain Visual Information for Fake News Detection
On Defending Against Label Flipping Attacks on Malware Detection Systems
Proceedings Third Joint Workshop on Developments in Implicit Computational complExity and Foundational & Practical Aspects of Resource Analysis
Quantum adiabatic machine learning with zooming
Reinforcement Learning based Interconnection Routing for Adaptive Traffic Optimization
Understanding Spatial Language in Radiology: Representation Framework, Annotation, and Spatial Relation Extraction from Chest X-ray Reports using Deep Learning
Private Rank Aggregation under Local Differential Privacy
A Generic Solver for Unconstrained Control Problems with Integral Functional Objectives
An Auxiliary Space Preconditioner for Fractional Laplacian of Negative Order
Sheaf homology of hyperplane arrangements, Boolean covers and exterior powers
Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation
On the Complexity of Checking Transactional Consistency
Growth of Common Friends in a Preferential Attachment Model
A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue
Interpolated Convolutional Networks for 3D Point Cloud Understanding
Beyond the Inverted Index
ConfigTron: Tackling network diversity with heterogeneous configurations
Three Branches: Detecting Actions With Richer Features
SDM-NET: Deep Generative Network for Structured Deformable Mesh
A proximal DC approach for quadratic assignment problem
4-Connected Triangulations on Few Lines
THINC-scaling scheme that unifies VOF and level set methods
Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning
Offensive Language and Hate Speech Detection for Danish
Icebreaker: Element-wise Active Information Acquisition with Bayesian Deep Latent Gaussian Model
Assessing the Impact of Blood Pressure on Cardiac Function Using Interpretable Biomarkers and Variational Autoencoders
Random Pilot and Data Access for Massive MIMO Spatially Correlated Rayleigh Fading Channels
Integration by parts formula for killed processes: A point of view from approximation theory
Existence of non-Cayley Haar graphs
Principal symmetric space analysis
Reinterpretation and Extension of Entropy Correction Terms for Residual Distribution and Discontinuous Galerkin Schemes
On Steane-Enlargement of Quantum Codes from Cartesian Product Point Sets
Null Space Analysis for Class-Specific Discriminant Learning
EASSE: Easier Automatic Sentence Simplification Evaluation
Incorporating Task-Specific Structural Knowledge into CNNs for Brain Midline Shift Detection
Forecast Encompassing Tests for the Expected Shortfall
Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking
Bregman Itoh–Abe methods for sparse optimisation
Network constraints on the mixing patterns of binary node metadata
A Building-Block Approach to State-Space Modeling of DC-DC Converter Systems
A Simulative Study on Active Disturbance Rejection Control (ADRC) as a Control Tool for Practitioners
Inverse Parametric Uncertain Identification using Polynomial Chaos and high-order Moment Matching benchmarked on a Wet Friction Clutch
Is This The Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization
Valley notch filter in a graphene strain superlattice: a Green’s function and machine learning approach
V2X-Based Vehicular Positioning: Opportunities, Challenges, and Future Directions
Practical Active Disturbance Rejection Control: Bumpless Transfer, Rate Limitation and Incremental Algorithm
The automorphism group of the zero-divisor digraph of matrices over an antiring
Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data
Modeling Personality vs. Modeling Personalidad: In-the-wild Mobile Data Analysis in Five Countries Suggests Cultural Impact on Personality Models
Bisector energy and pinned distances in positive characteristic
L2P: An Algorithm for Estimating Heavy-tailed Outcomes
Evaluation of a Recommender System for Assisting Novice Game Designers
WFRFT-aided Power-efficient Multi-beam Directional Modulation Schemes Based on Frequency Diverse Array
Construction of efficient detectors for character information recognition
Extract Method Refactoring by Successive Edge Contraction
Numerical benchmarking of fluid-rigid body interactions
Semigroup Models for Biochemical Reaction Networks
Stability and Convergence of Spectral Mixed Discontinuous Galerkin Methods for 3D Linear Elasticity on Anisotropic Geometric Meshes
A Two-Ray Multipath Model for Frequency Diverse Array-Based Directional Modulation in MISOME Wiretap Channels
On Range Sidelobe Reduction for Dual-functional Radar-Communication Waveforms
A multi-level ADMM algorithm for elliptic PDE-constrained optimization problems
Modularity belief propagation on multilayer networks to detect significant community structure
Maximum Rectilinear Crossing Number of Uniform Hypergraphs
Bayesian automated posterior repartitioning for nested sampling
On the fixed volume discrepancy of the Fibonacci sets in the integral norms
Playing log(N)-Questions over Sentences
Time-changed Dirac-Fokker-Planck equations on the lattice
Neural Machine Translation with Noisy Lexical Constraints
A Proof of First Digit Law from Laplace Transform
Estimating & Mitigating the Impact of Acoustic Environments on Machine-to-Machine Signalling
Finding and counting permutations via CSPs
Rare-Event Properties of the Nagel-Schreckenberg Model
Is Deep Reinforcement Learning Really Superhuman on Atari?
Blinded sample size re-estimation in equivalence testing
Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
Generalizing Deep Whole Brain Segmentation for Pediatric and Post-Contrast MRI with Augmented Transfer Learning
Two-row $W$-graphs in affine type $A$
Superpermutation matrices
Gradient-based shape optimization for the reduction of particle erosion in bended pipes
Some harmonic functions for killed Markov branching processes with immigration and culling
Classification and prediction of wave chaotic systems with machine learning techniques
Micro-architectural Analysis of OLAP: Limitations and Opportunities
Learning elementary structures for 3D shape generation and matching
On $k$-antichains in the unit $n$-cube
Improving Generalization in Coreference Resolution via Adversarial Training
Complicated Table Structure Recognition
Schedules and the Delta Conjecture
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective
End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
Optimal Estimation of Generalized Average Treatment Effects using Kernel Optimal Matching
Resolution analysis of inverting the generalized Radon transform from discrete data in $\mathbb R^3$
Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention
Parabolic subgroups and Automorphism groups of Schubert varieties
$p$-adic Integral Geometry
Learn How to Cook a New Recipe in a New House: Using Map Familiarization, Curriculum Learning, and Common Sense to Learn Families of Text-Based Adventure Games
Improved circuits for a biologically-inspired random pulse computer
Distributed Estimation in the Presence of Strategic Data Sources
Predicting 3D Human Dynamics from Video


Whats new on arXiv

NeuroMask: Explaining Predictions of Deep Neural Networks through Mask Learning

Deep Neural Networks (DNNs) deliver state-of-the-art performance in many image recognition and understanding applications. However, despite their outstanding performance, these models are black-boxes and it is hard to understand how they make their decisions. Over the past few years, researchers have studied the problem of providing explanations of why DNNs predicted their results. However, existing techniques are either obtrusive, requiring changes in model training, or suffer from low output quality. In this paper, we present a novel method, NeuroMask, for generating an interpretable explanation of classification model results. When applied to image classification models, NeuroMask identifies the image parts that are most important to classifier results by applying a mask that hides/reveals different parts of the image, before feeding it back into the model. The mask values are tuned by minimizing a properly designed cost function that preserves the classification result and encourages producing an interpretable mask. Experiments using state-of-the-art Convolutional Neural Networks for image recognition on different datasets (CIFAR-10 and ImageNet) show that NeuroMask successfully localizes the parts of the input image which are most relevant to the DNN decision. By showing a visual quality comparison between NeuroMask explanations and those of other methods, we find NeuroMask to be both accurate and interpretable.

OD-GCN object detection by knowledge graph with GCN

Classical object detection frameworks do not make use of objects’ surrounding information. In this article, we introduce graph convolutional networks (GCN) into object detection, and propose a new framework called OD-GCN (object detection with graph convolutional network). It utilizes the category relationship to improve the detection precision. We set up a knowledge graph to reflect the co-existence relationships among objects. GCN plays the role of post-processing to adjust the output of base object detection models. It is a flexible framework in which any pre-trained object detection model can be used as the base model. In the experiments, we try several popular base detection models, and OD-GCN always improves mAP by 1-5 pp on the COCO dataset. In addition, visualized analysis reveals that the benchmark improvement is quite logical from a human perspective.

Explaining Convolutional Neural Networks using Softmax Gradient Layer-wise Relevance Propagation

Convolutional Neural Networks (CNN) have become state-of-the-art in the field of image classification. However, not everything is understood about their inner representations. This paper tackles the interpretability and explainability of the predictions of CNNs for multi-class classification problems. Specifically, we propose a novel visualization method of pixel-wise input attribution called Softmax-Gradient Layer-wise Relevance Propagation (SGLRP). The proposed model is a class-discriminative extension to Deep Taylor Decomposition (DTD) using the gradient of softmax to back propagate the relevance of the output probability to the input image. Through qualitative and quantitative analysis, we demonstrate that SGLRP can successfully localize and attribute the regions on input images which contribute to a target object’s classification. We show that the proposed method excels at discriminating the target object’s class from the other possible objects in the images. We confirm that SGLRP performs better than existing Layer-wise Relevance Propagation (LRP) based methods and can help in the understanding of the decision process of CNNs.

Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics

The sheer volumes of data generated from earth observation and remote sensing technologies continue to make a major impact, propelling key geospatial applications into the dual data- and compute-intensive era. As a consequence, this rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for advanced machine learning and computing with massive amounts of remotely sensed imagery. The core contribution is partitioning massive amounts of data based on the spectral and semantic characteristics for distributed imagery analysis. RESFlow takes advantage of both a unified analytics engine for large-scale data processing and the availability of modern computing hardware to harness the acceleration of deep learning inference on expansive remote sensing imagery. The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment across computationally intensive and data-intensive pixel-level labeling workloads. The pipeline invokes deep learning inference at three stages: during deep feature extraction, deep metric mapping, and deep semantic segmentation. The tasks impose compute-intensive and GPU resource sharing challenges, motivating a parallelized pipeline for all execution steps. By taking advantage of Apache Spark, Nvidia DGX1, and DGX2 computing platforms, we demonstrate unprecedented compute speed-ups for deep learning inference on pixel labeling workloads; processing 21,028 Terabytes of imagery data and delivering output maps at area rate of, amounting to 453,168 – reducing a 28 day workload to 21 hours.

Channel Decomposition on Generative Networks

This work presents a method to decompose a layer of the generative networks into the painting actions. To behave like the human painter, these actions are driven by the cost simulating the hand movement, the paint color change, the stroke shape and the stroking style. To help planning, the Mask R-CNN is applied to detect the object areas and decide the painting order. The proposed painting system introduces a variety of extensions in artistic styles, based on the chosen parameters. Further experiments are performed to evaluate the channel penetration and the channel sensitivity on the strokes.

DeepAISE — An End-to-End Development and Deployment of a Recurrent Neural Survival Model for Early Prediction of Sepsis

Sepsis, a dysregulated immune system response to infection, is among the leading causes of morbidity, mortality, and cost overruns in the Intensive Care Unit (ICU). Early prediction of sepsis can improve situational awareness amongst clinicians and facilitate timely, protective interventions. While the application of predictive analytics in ICU patients has shown early promising results, much of the work has been encumbered by high false-alarm rates. Efforts to improve specificity have been limited by several factors, most notably the difficulty of labeling sepsis onset time and the low prevalence of septic-events in the ICU. Here, we present DeepAISE (Deep Artificial Intelligence Sepsis Expert), a recurrent neural survival model for the early prediction of sepsis. We show that by coupling a clinical criterion for defining sepsis onset time with a treatment policy (e.g., initiation of antibiotics within one hour of meeting the criterion), one may rank the relative utility of various criteria through offline policy evaluation. Given the optimal criterion, DeepAISE automatically learns predictive features related to higher-order interactions and temporal patterns among clinical risk factors that maximize the data likelihood of observed time to septic events. DeepAISE has been incorporated into a clinical workflow, which provides real-time hourly sepsis risk scores. A comparative study of four baseline models indicates that DeepAISE produces the most accurate predictions (AUC=0.90 and 0.87) and the lowest false alarm rates (FAR=0.20 and 0.26) in two separate cohorts (internal and external, respectively), while simultaneously producing interpretable representations of the clinical time series and risk factors.

Neural Text Generation with Unlikelihood Training

Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core. In particular, standard likelihood training and decoding lead to dull and repetitive responses. While some post-hoc fixes have been proposed, in particular top-k and nucleus sampling, they do not address the fact that the token-level probabilities predicted by the model itself are poor. In this paper we show that the likelihood objective itself is at fault, resulting in a model that assigns too much probability to sequences that contain repeats and frequent words, unlike the human training distribution. We propose a new objective, unlikelihood training, which forces unlikely generations to be assigned lower probability by the model. We show that both token and sequence level unlikelihood training give less repetitive, less dull text while maintaining perplexity, giving far superior generations using standard greedy or beam search. Our approach provides a strong alternative to traditional training.
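To make the idea of penalising unwanted tokens concrete, here is a minimal Python sketch of one possible token-level form of such an objective. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the formula, so the penalty term `-log(1 - p(c))`, the choice of previously seen context tokens as negative candidates, the weight `alpha`, and the function names `repeat_candidates` and `unlikelihood_token_loss` are all assumptions made here. The probabilities are made-up numbers.

```python
import math


def repeat_candidates(context, target):
    """Tokens already seen in the context, excluding the true next token."""
    return {tok for tok in context if tok != target}


def unlikelihood_token_loss(probs, target, negatives, alpha=1.0):
    """Likelihood term plus a penalty on probability given to negative candidates.

    probs     -- dict: token -> model probability for the next position
    target    -- ground-truth next token
    negatives -- tokens whose probability should be pushed down
    alpha     -- weight of the penalty (hypothetical default)
    """
    # Standard negative log-likelihood of the true token.
    likelihood_term = -math.log(probs[target])
    # Assumed penalty: grows as the model puts more mass on a negative candidate.
    penalty = -sum(math.log(1.0 - probs[c]) for c in negatives if c != target)
    return likelihood_term + alpha * penalty


# Made-up next-token distributions after the context "the cat the".
context = ["the", "cat", "the"]
target = "sat"
repetitive = {"the": 0.6, "cat": 0.1, "sat": 0.3}
less_repetitive = {"the": 0.1, "cat": 0.1, "sat": 0.8}

negatives = repeat_candidates(context, target)
print(unlikelihood_token_loss(repetitive, target, negatives))
print(unlikelihood_token_loss(less_repetitive, target, negatives))
```

In this sketch the distribution that keeps most of its mass on the already-used token "the" receives the larger loss, and setting `alpha=0.0` reduces the function to plain likelihood training.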

Feature Partitioning for Efficient Multi-Task Architectures

Multi-task learning holds the promise of less data, parameters, and time than training of separate models. We propose a method to automatically search over multi-task architectures while taking resource constraints into consideration. We propose a search space that compactly represents different parameter sharing strategies. This provides more effective coverage and sampling of the space of multi-task architectures. We also present a method for quick evaluation of different architectures by using feature distillation. Together these contributions allow us to quickly optimize for efficient multi-task models. We benchmark on Visual Decathlon, demonstrating that we can automatically search for and identify multi-task architectures that effectively make trade-offs between task resource requirements while achieving a high level of final performance.

Adversarial Neural Pruning

It is well known that neural networks are susceptible to adversarial perturbations and are also computationally and memory intensive which makes it difficult to deploy them in real-world applications where security and computation are constrained. In this work, we aim to obtain both robust and sparse networks that are applicable to such scenarios, based on the intuition that latent features have a varying degree of susceptibility to adversarial perturbations. Specifically, we define vulnerability at the latent feature space and then propose a Bayesian framework to prioritize features based on their contribution to both the original and adversarial loss, to prune vulnerable features and preserve the robust ones. Through quantitative evaluation and qualitative analysis of the perturbation to latent features, we show that our sparsification method is a defense mechanism against adversarial attacks and the robustness indeed comes from our model’s ability to prune vulnerable latent features that are more susceptible to adversarial perturbations.

Wasserstein Index Generation Model: Automatic Generation of Time-series Index with Application to Economic Policy Uncertainty

I propose a novel method, called the Wasserstein Index Generation model (WIG), to generate a public sentiment index automatically. It can be performed off-the-shelf and is especially good at detecting sudden sentiment spikes. To test the model’s effectiveness, an application to generate an Economic Policy Uncertainty (EPU) index is showcased.

Linking Graph Entities with Multiplicity and Provenance

Entity linking is a fundamental database problem with applications in data integration, data cleansing, information retrieval, knowledge fusion, and knowledge-base population. It is the task of accurately identifying multiple, differing, and possibly contradicting representations of the same real-world entity in data. In this work, we propose an entity linking system capable of linking entities across different databases and mentioned-entities extracted from text data. Our entity linking solution, called Certus, uses a graph model to represent the profiles of entities. The graph model is versatile, thus, it is capable of handling multiple values for an attribute or a relationship, as well as the provenance descriptions of the values. Provenance descriptions of a value provide the settings of the value, such as validity periods, sources, security requirements, etc. This paper presents the architecture for the entity linking system, the logical, physical, and indexing models used in the system, and the general linking process. Furthermore, we demonstrate the performance of update operations of the physical storage models when the system is implemented in two state-of-the-art database management systems, HBase and Postgres.

Comparison theorems on large-margin learning

This paper studies the binary classification problem associated with a family of loss functions called large-margin unified machines (LUM), which offers a natural bridge between distribution-based likelihood approaches and margin-based approaches. It can also overcome the so-called data piling issue of the support vector machine in the high-dimension and low-sample size setting. In this paper we establish some new comparison theorems for all LUM loss functions, which play a key role in the further error analysis of large-margin learning algorithms.

Regional Tree Regularization for Interpretability in Black Box Models

The lack of interpretability remains a barrier to the adoption of deep neural networks. Recently, tree regularization has been proposed to encourage deep neural networks to resemble compact, axis-aligned decision trees without significant compromises in accuracy. However, it may be unreasonable to expect that a single tree can predict well across all possible inputs. In this work, we propose regional tree regularization, which encourages a deep model to be well-approximated by several separate decision trees specific to predefined regions of the input space. Practitioners can define regions based on domain knowledge of contexts where different decision-making logic is needed. Across many datasets, our approach delivers more accurate predictions than simply training separate decision trees for each region, while producing simpler explanations than other neural net regularization schemes without sacrificing predictive power. Two healthcare case studies in critical care and HIV demonstrate how experts can improve understanding of deep models via our approach.

Boosted GAN with Semantically Interpretable Information for Image Inpainting

Image inpainting aims at restoring missing regions of corrupted images, which has many applications such as image restoration and object removal. However, current GAN-based inpainting models fail to explicitly consider the semantic consistency between restored images and original images. For example, given a male image with the image region of one eye missing, current models may restore it with a female eye. This is due to the ambiguity of GAN-based inpainting models: these models can generate many possible restorations given a missing region. To address this limitation, our key insight is that semantically interpretable information (such as attribute and segmentation information) of input images (with missing regions) can provide essential guidance for the inpainting process. Based on this insight, we propose a boosted GAN with semantically interpretable information for image inpainting that consists of an inpainting network and a discriminative network. The inpainting network utilizes two auxiliary pretrained networks to discover the attribute and segmentation information of input images and incorporates them into the inpainting process to provide explicit semantic-level guidance. The discriminative network adopts a multi-level design that can enforce regularizations not only on overall realness but also on attribute and segmentation consistency with the original images. Experimental results show that our proposed model can preserve consistency on both attribute and segmentation level, and significantly outperforms the state-of-the-art models.

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Recently, the pre-trained language model, BERT (Devlin et al., 2018), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity and question answering. Inspired by the linearization exploration work of Elman (1990), we extend BERT to a new model, StructBERT, by incorporating language structures into pretraining. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively. As a result, the new model is adapted to different levels of language understanding required by downstream tasks. The StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state-of-the-art on the GLUE benchmark to 84.5 (with Top 1 achievement on the Leaderboard at the time of paper submission), the F1 score on SQuAD v1.1 question answering to 93.0, the accuracy on SNLI to 91.7.

Getting To Know You: User Attribute Extraction from Dialogues

User attributes provide rich and useful information for user understanding, yet structured and easy-to-use attributes are often sparsely populated. In this paper, we leverage dialogues with conversational agents, which contain strong suggestions of user information, to automatically extract user attributes. Since no existing dataset is available for this purpose, we apply distant supervision to train our proposed two-stage attribute extractor, which surpasses several retrieval and generation baselines on human evaluation. Meanwhile, we discuss potential applications (e.g., personalized recommendation and dialogue systems) of such extracted user attributes, and point out current limitations to cast light on future work.

Attention is not not Explanation

Attention mechanisms play a central role in NLP systems, especially within recurrent neural network (RNN) models. Recently, there has been increasing interest in whether or not the intermediate representations offered by these modules may be used to explain the reasoning for a model’s prediction, and consequently reach insights regarding the model’s decision-making process. A recent paper claims that ‘Attention is not Explanation’ (Jain and Wallace, 2019). We challenge many of the assumptions underlying this work, arguing that such a claim depends on one’s definition of explanation, and that testing it needs to take into account all elements of the model, using a rigorous experimental design. We propose four alternative tests to determine when/whether attention can be used as explanation: a simple uniform-weights baseline; a variance calibration based on multiple random seed runs; a diagnostic framework using frozen weights from pretrained models; and an end-to-end adversarial attention training protocol. Each allows for meaningful interpretation of attention mechanisms in RNN models. We show that even when reliable adversarial distributions can be found, they don’t perform well on the simple diagnostic, indicating that prior work does not disprove the usefulness of attention mechanisms for explainability.

Matrix Nets: A New Deep Architecture for Object Detection

We present Matrix Nets (xNets), a new deep architecture for object detection. xNets map objects with different sizes and aspect ratios into layers where the sizes and the aspect ratios of the objects within their layers are nearly uniform. Hence, xNets provide a scale and aspect ratio aware architecture. We leverage xNets to enhance key-points based object detection. Our architecture achieves mAP of 47.8 on MS COCO, which is higher than any other single-shot detector while using half the number of parameters and training 3x faster than the next best architecture.

Requirements Engineering for Machine Learning: Perspectives from Data Scientists

Machine learning (ML) is used increasingly in real-world applications. In this paper, we describe our ongoing endeavor to define characteristics and challenges unique to Requirements Engineering (RE) for ML-based systems. As a first step, we interviewed four data scientists to understand how ML experts approach elicitation, specification, and assurance of requirements and expectations. The results show that changes in the development paradigm, i.e., from coding to training, also demand changes in RE. We conclude that development of ML systems requires requirements engineers to: (1) understand ML performance measures to state good functional requirements, (2) be aware of new quality requirements such as explainability, freedom from discrimination, or specific legal requirements, and (3) integrate ML specifics in the RE process. Our study provides a first contribution towards an RE methodology for ML systems.

Detecting semantic anomalies

We critically appraise the recent interest in out-of-distribution (OOD) detection, questioning the practical relevance of existing benchmarks. While the currently prevalent trend is to consider different datasets as OOD, we posit that out-distributions of practical interest are ones where the distinction is semantic in nature, and evaluative tasks should reflect this more closely. Assuming a context of computer vision object recognition problems, we then recommend a set of benchmarks which we motivate by referencing practical applications of anomaly detection. Finally, we explore a multi-task learning based approach which suggests that auxiliary objectives for improved semantic awareness can result in improved semantic anomaly detection, with accompanying generalization benefits.

Towards Self-Explainable Cyber-Physical Systems

With the increasing complexity of CPSs, their behavior and decisions become increasingly difficult to understand and comprehend for users and other stakeholders. Our vision is to build self-explainable systems that can, at run-time, answer questions about the system’s past, current, and future behavior. As hitherto no design methodology or reference framework exists for building such systems, we propose the MAB-EX framework for building self-explainable systems that leverage requirements- and explainability models at run-time. The basic idea of MAB-EX is to first Monitor and Analyze a certain behavior of a system, then Build an explanation from explanation models and convey this EXplanation in a suitable way to a stakeholder. We also take into account that new explanations can be learned, by updating the explanation models, should new and yet un-explainable behavior be detected by the system.

Semi-Supervised Learning using Differentiable Reasoning

We introduce Differentiable Reasoning (DR), a novel semi-supervised learning technique which uses relational background knowledge to benefit from unlabeled data. We apply it to the Semantic Image Interpretation (SII) task and show that background knowledge provides significant improvement. We find that there is a strong but interesting imbalance between the contributions of updates from Modus Ponens (MP) and its logical equivalent Modus Tollens (MT) to the learning process, suggesting that our approach is very sensitive to a phenomenon called the Raven Paradox. We propose a solution to overcome this situation.

Exploiting Parallelism Opportunities with Deep Learning Frameworks

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, involves a non-trivial amount of performance characterization and domain-specific knowledge. This paper takes a deep dive into analyzing the performance impact of key design features and the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve much higher training and inference speedup. The evaluation results show that our proposed performance tuning guidelines outperform both the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively, across a diverse set of real-world deep learning models.

metric-learn: Metric Learning Algorithms in Python

metric-learn is an open source Python package implementing supervised and weakly-supervised distance metric learning algorithms. As part of scikit-learn-contrib, it provides a unified interface compatible with scikit-learn which allows to easily perform cross-validation, model selection, and pipelining with other machine learning estimators. metric-learn is thoroughly tested and available on PyPi under the MIT licence.

IMS-Speech: A Speech to Text Tool

We present IMS-Speech, a web-based tool for German and English speech transcription that aims to facilitate research in various disciplines which require access to lexical information in spoken language materials. The tool is based on a modern open source software stack, advanced speech recognition methods and public data resources, and is freely available for academic researchers. The utilized models are built to be generic in order to provide transcriptions of competitive accuracy on a diverse set of tasks and conditions.

Adaptive Learning of Aggregate Analytics under Dynamic Workloads

Large organizations have seamlessly incorporated data-driven decision making in their operations. However, as data volumes increase, expensive big data infrastructures are called to the rescue. In this setting, analytics tasks become very costly in terms of query response time, resource consumption, and money in cloud deployments, especially when base data are stored across geographically distributed data centers. Therefore, we introduce an adaptive Machine Learning mechanism which is light-weight, stored client-side, can estimate the answers of a variety of aggregate queries and can avoid the big data backend. The estimations are performed in milliseconds, and are inexpensive and accurate, as the mechanism learns from past analytical-query patterns. However, as analytic queries are ad-hoc and analysts’ interests change over time, we develop solutions that can swiftly and accurately detect such changes and adapt to new query patterns. The capabilities of our approach are demonstrated using extensive evaluation with real and synthetic datasets.


If you did not already know

Expected Value of Perfect Information (EVPI) google
In decision theory, the expected value of perfect information (EVPI) is the price that one would be willing to pay in order to gain access to perfect information.
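As a rough illustration, EVPI for a small discrete decision problem can be computed as the expected payoff when the best action is picked after the state is revealed, minus the best expected payoff achievable by committing to one action beforehand. The payoff table, probabilities, and function name below are invented for the example.

```python
def evpi(payoffs, probs):
    """payoffs[a][s] is the payoff of action a in state s; probs[s] is P(state s)."""
    # Best expected payoff when committing to one action before the state is known
    best_without_info = max(
        sum(p * u for p, u in zip(probs, row)) for row in payoffs
    )
    # Expected payoff when the best action is chosen separately for each state
    best_with_info = sum(
        probs[s] * max(row[s] for row in payoffs) for s in range(len(probs))
    )
    return best_with_info - best_without_info

# Hypothetical example: two actions, two equally likely states
payoffs = [
    [100, 0],   # action 0: pays off only in state 0
    [40, 40],   # action 1: safe choice
]
print(evpi(payoffs, [0.5, 0.5]))
```

Here the safe and risky actions are worth 40 and 50 in expectation, while knowing the state in advance would be worth 70, so a decision maker should pay at most the difference for that knowledge.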

Spectral Graph Analysis google
Complex networks or graphs are ubiquitous in sciences and engineering: biological networks, brain networks, transportation networks, social networks, and the World Wide Web, to name a few. Spectral graph theory provides a set of useful techniques and models for understanding ‘patterns of interconnectedness’ in a graph. Our prime focus in this paper is on the following question: Is there a unified explanation and description of the fundamental spectral graph methods? There are at least two reasons to be interested in this question. Firstly, to gain a much deeper and refined understanding of the basic foundational principles, and secondly, to derive rich consequences with practical significance for algorithm design. However, despite half a century of research, this question remains one of the most formidable open issues, if not the core problem in modern network science. The achievement of this paper is to take a step towards answering this question by discovering a simple, yet universal statistical logic of spectral graph analysis. The prescribed viewpoint appears to be good enough to accommodate almost all existing spectral graph techniques as a consequence of just one single formalism and algorithm. …

Rule-Embedded Neural Network (ReNN) google
The artificial neural network shows powerful ability of inference, but it is still criticized for lack of interpretability and prerequisite needs of big dataset. This paper proposes the Rule-embedded Neural Network (ReNN) to overcome the shortages. ReNN first makes local-based inferences to detect local patterns, and then uses rules based on domain knowledge about the local patterns to generate rule-modulated map. After that, ReNN makes global-based inferences that synthesizes the local patterns and the rule-modulated map. To solve the optimization problem caused by rules, we use a two-stage optimization strategy to train the ReNN model. By introducing rules into ReNN, we can strengthen traditional neural networks with long-term dependencies which are difficult to learn with limited empirical dataset, thus improving inference accuracy. The complexity of neural networks can be reduced since long-term dependencies are not modeled with neural connections, and thus the amount of data needed to optimize the neural networks can be reduced. Besides, inferences from ReNN can be analyzed with both local patterns and rules, and thus have better interpretability. In this paper, ReNN has been validated with a time-series detection problem. …

High-Resolution Deep Convolutional Generative Adversarial Network (HR-DCGAN) google
Convergence of Generative Adversarial Networks (GANs) in a high-resolution setting, under the computational constraint of GPU memory capacity (from 12 GB to 24 GB), has been beset with difficulty due to the known lack of convergence rate stability. In order to boost network convergence of DCGAN (Deep Convolutional Generative Adversarial Networks) and achieve good-looking high-resolution results, we propose a new layered network structure, HR-DCGAN, that incorporates current state-of-the-art techniques for this effect. …


Jobs: PhD scholarship on Algorithms for Event-Driven Camera Analysis at Western Sydney University, Australia

Paul Hurley just let me know of the following PhD scholarship

Hi Igor -- I don't know if you still put jobs/PhD scholarships on nuit blanche, but if you still do, would you mind sharing mine? It's an opportunity to build up foundational work for event-based cameras.
Sure Paul ! Here is how the ad starts:

SCEM: Algorithms for Event-Driven Camera Analysis
School of Computing, Engineering and Mathematics
Scholarship code: 2019-089 
About the project
Event-driven cameras are exciting technology that do not acquire full images like traditional cameras, but record only intensity changes when they occur. The International Centre for Neuromorphic Systems at Western Sydney University has been adapting them to perform Neuromorphic space imaging. 
This PhD scholarship builds on this work to help develop the correct abstraction and a theory so as to improve knowledge extraction algorithms. It goes from modelling to algorithm testing using real data, working together with a world-class team.

What does the scholarship provide?
  • Domestic candidates will receive a tax-free stipend of $30,000(AUD) per annum for up to 3 years to support living costs, supported by the Research Training Program (RTP) Fee Offset.
  • International candidates will receive a tax-free stipend of $30,000(AUD) per annum for up to 3 years to support living costs. Those with a strong track record will be eligible for a tuition fee waiver.
  • Support for conference attendance, fieldwork and additional costs as approved by the School.
International candidates are required to hold an Overseas Student Health Care (OSHC) insurance policy for the duration of their study in Australia. This cost is not covered by the scholarship.


How (Not) To Scale Deep Learning in 6 Easy Steps

Introduction: The Problem

Deep learning sometimes seems like sorcery. Its state-of-the-art applications are at times delightful and at times disturbing. The tools that achieve these results are, amazingly, mostly open source, and can work their magic on powerful hardware available to rent by the hour in the cloud.

It’s no wonder that companies are eager to apply deep learning for more prosaic business problems like better churn prediction, image curation, chatbots, time series analysis and more. Just because the tools are readily available doesn’t mean they’re easy to use well. Even choosing the right architecture, layers and activations is more art than science.

This blog won’t examine how to tune a deep learning architecture for accuracy. That process does, however, require training lots of models in a process of trial and error. This leads to a more immediate issue: scaling up the performance of deep learning training.

Tuning deep learning training doesn’t work like tuning an ETL job. It requires a large amount of compute from specialized hardware, and everyone eventually finds deep learning training ‘too slow’. Too often, users reach for solutions that may be overkill, expensive and not faster, when trying to scale up, while overlooking some basic errors that hurt performance.

This blog will instead walk through basic steps to avoid common performance pitfalls in training, and then the right steps, in order, to scale up by applying more complex tooling and more hardware. Hopefully, you will find your modeling job can move along much faster without reaching immediately for a cluster of extra GPUs.

A Simple Classification Task

Because the focus here is not on the learning problem per se, the following examples will develop a simple data set and problem to solve: classifying the Caltech 256 dataset of about 30,000 images each into one of 257 (yes, 257) categories.

The data consists of JPEG files. These need to be resized to common dimensions, 299×299, to match the pre-trained base layer described below. The images are then written to Parquet files with labels to facilitate larger-scale training, described later. This can be accomplished with the ‘binaryFile’ data source in Apache Spark. See the accompanying notebook for full source code, but these are the highlights:

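import io
from PIL import Image
from pyspark.sql.functions import udf
from pyspark.sql.types import BinaryType

img_size = 299

def scale_image(image_bytes):
  # Decode the JPEG bytes and normalize to 3-channel RGB
  image = Image.open(io.BytesIO(image_bytes)).convert('RGB')
  # Shrink in place, preserving aspect ratio (ANTIALIAS is named LANCZOS in newer Pillow)
  image.thumbnail((img_size, img_size), Image.ANTIALIAS)
  x, y = image.size
  # Center the result on a white square background
  with_bg = Image.new('RGB', (img_size, img_size), (255, 255, 255))
  with_bg.paste(image, box=((img_size - x) // 2, (img_size - y) // 2))
  return with_bg.tobytes()

# Parts of the following lines were lost from this listing. caltech_256_path and
# file_to_label_udf are assumed placeholders; see the accompanying notebook for the real code.
scale_image_udf = udf(scale_image, BinaryType())

raw_image_df = spark.read.format("binaryFile").\
  option("pathGlobFilter", "*.jpg").option("recursiveFileLookup", "true").\
  load(caltech_256_path)
image_df = raw_image_df.select(
  file_to_label_udf("path").alias("label"),
  scale_image_udf("content").alias("image"))
(train_image_df, test_image_df) = image_df.randomSplit([0.9, 0.1], seed=42)

# Write with a small (1MB) Parquet block size
train_image_df.write.option("parquet.block.size", 1024 * 1024).\
  parquet(table_path_base + "train")
test_image_df.write.option("parquet.block.size", 1024 * 1024).\
  parquet(table_path_base + "test")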

It’s also possible to use Spark’s built-in ‘image’ data source type to read these as well.

Keras, the popular high-level front end for Tensorflow, can describe a straightforward deep learning model to classify the images. There’s no need to build an image classifier from scratch. Instead, this example reuses the pretrained Xception model built into Keras and adds a dense layer on top to classify. (Note that this example uses Keras as included with Tensorflow 1.13.1, in tensorflow.keras, rather than standalone Keras 2.2.4). The pretrained layers themselves will not be trained further. Take that as step #0: use transfer learning and pretrained models when working with images!

Step #1: Use a GPU

Almost the only situation where it makes sense to train a deep learning model on a CPU is when there are no GPUs available. When working in the cloud, on a platform like Databricks, it’s trivial to provision a machine with a GPU with all the drivers and libraries ready. This example will jump straight into training this model on a single K80 GPU.

This first pass will just load a 10% sample of the data from Parquet as a pandas DataFrame, reshape the image data, and train in memory on 90% of that sample. Here, training just runs for 60 epochs on a small batch size. Small side tip: when using a pretrained network, it’s essential to normalize the image values to the range the network expects. Here, that’s [-1,1], and Keras provides a preprocess_input function to do this.
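For intuition, mapping 8-bit pixel values to [-1,1] amounts to dividing by 127.5 and subtracting 1. The helper below is a hypothetical stand-in written for illustration, not Keras's own implementation; in real code, call preprocess_input.

```python
def scale_to_unit_range(pixels):
    """Map 8-bit pixel values (0..255) to floats in [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]

# Black, mid-gray and white map to the ends and the middle of the range
print(scale_to_unit_range([0, 127.5, 255]))
```

Feeding raw 0..255 values into a network pretrained on inputs in [-1,1] would put every activation far outside the range the frozen layers were trained on, which is why this step matters.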

(Note: to run this example on Databricks, select the 5.5 ML Runtime or later with GPU support, and choose a driver instance type with a single GPU. Because the example also uses Spark, you will have to also provision 1 worker.)

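import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications.xception import Xception, preprocess_input
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Nadam

df_pd = spark.read.parquet("...").sample(0.1, seed=42).toPandas()

X_raw = df_pd["image"].values
# Rebuild each image from its raw bytes and scale pixel values to [-1,1]
X = np.array(
  [preprocess_input(
    np.frombuffer(X_raw[i], dtype=np.uint8).reshape((img_size,img_size,3)))
   for i in range(len(X_raw))])
y = df_pd["label"].values - 1 # -1 because labels are 1-based
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)


def build_model(dropout=None):
  model = Sequential()
  xception = Xception(include_top=False,
    input_shape=(img_size,img_size,3), pooling='avg')
  # Freeze the pretrained layers; only the new dense layer is trained
  for layer in xception.layers:
    layer.trainable = False
  model.add(xception)
  if dropout:
    model.add(Dropout(dropout))
  model.add(Dense(257, activation='softmax'))
  return model

model = build_model()
# The optimizer was lost from this listing; Nadam is an assumption.
# The 0.001 learning rate is the one quoted in Step #3.
model.compile(optimizer=Nadam(lr=0.001),
  loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=2, epochs=60, verbose=2)
model.evaluate(X_test, y_test)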


Epoch 58/60
 - 65s - loss: 0.2787 - acc: 0.9280
Epoch 59/60
 - 65s - loss: 0.3425 - acc: 0.9106
Epoch 60/60
 - 65s - loss: 0.3525 - acc: 0.9173

[1.913768016828665, 0.7597173]

The results look good: 91.7% accuracy! However, there’s an important flaw. The final evaluation on the held-out 10% validation data shows that true accuracy is more like 76%. The model has overfitted. That’s not good, but worse, it means that most of the time it spent training was spent making it a little worse. Training should have ended when accuracy on the validation data stopped improving. Not only would that have left a better model, it would have completed faster.

Step #2: Use Early Stopping

Keras (and other frameworks) have built-in support for stopping when further training appears to be making the model worse. In Keras, it’s the EarlyStopping callback. Using it means passing the validation data to the training process for evaluation on every epoch. Training will stop after several epochs have passed with no improvement. restore_best_weights=True ensures that the final model’s weights are from its best epoch, not just the last one. This should be your default.

from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(patience=3, monitor='val_acc',
  min_delta=0.001, restore_best_weights=True, verbose=1)

model.fit(X_train, y_train, batch_size=2, epochs=60, verbose=2, 
  validation_data=(X_test, y_test), callbacks=[early_stopping])
model.evaluate(X_test, y_test)


Epoch 12/60
 - 74s - loss: 0.9468 - acc: 0.7689 - val_loss: 1.2728 - val_acc: 0.7597
Epoch 13/60
 - 75s - loss: 0.8886 - acc: 0.7795 - val_loss: 1.4035 - val_acc: 0.7456
Epoch 14/60
Restoring model weights from the end of the best epoch.
 - 80s - loss: 0.8391 - acc: 0.7870 - val_loss: 1.4467 - val_acc: 0.7420
Epoch 00014: early stopping


[1.3035458562230895, 0.7597173]

Now, training stops after 14 epochs, not 60, in about 18 minutes. Each epoch took a little longer (75s vs 65s) because of the evaluation of the validation data. The restored model scores about 76% on the validation data, the same as before, but with a much lower loss (1.30 vs 1.91).

With early stopping, note that the number of epochs passed to fit() only matters as a limit on the maximum number of epochs that will run. It can be set to a large value. This is the first of a couple of observations here that suggest the same thing: epochs don’t really matter as a unit of training. They’re just a number of batches of data that constitute the whole input to training. But training means passing over the data in batches repeatedly until the model is trained enough. How many epochs that represents isn’t directly important. An epoch is still useful as a point of comparison for time taken to train per amount of data though.

Step #3: Max Out GPU with Larger Batch Sizes

In Databricks, cluster metrics are exposed through a Ganglia-based UI. This shows GPU utilization during training. Monitoring utilization is important to tuning as it can suggest bottlenecks. Here, the GPU is pretty well used at about 90%.

100% is cooler than 90%. The batch size of 2 is small, and isn’t keeping the GPU busy enough during processing. Increasing the batch size would increase that utilization. The goal isn’t only to make the GPU busier, but to benefit from the extra work. Bigger batches improve how well each batch updates the model (up to a point) with more accurate gradients. That in turn can allow training to use a higher learning rate, and more quickly reach the point where the model stops improving.

Or, with extra capacity, it’s possible to add complexity to the network architecture itself to take advantage of that. This example doesn’t intend to explore tuning the architecture, but will try adding some dropout to decrease this network’s tendency to overfit.

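model = build_model(dropout=0.5)
# Optimizer assumed (lost from this listing); 0.004 is the learning rate quoted below
model.compile(optimizer=Nadam(lr=0.004),
  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=16, epochs=30, verbose=2, 
  validation_data=(X_test, y_test), callbacks=[early_stopping])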


Epoch 6/30
 - 56s - loss: 0.1487 - acc: 0.9583 - val_loss: 1.1105 - val_acc: 0.7633

Epoch 7/30
 - 56s - loss: 0.1022 - acc: 0.9717 - val_loss: 1.2128 - val_acc: 0.7456

Epoch 8/30
 - 56s - loss: 0.0853 - acc: 0.9744 - val_loss: 1.2004 - val_acc: 0.7597

Epoch 9/30
Restoring model weights from the end of the best epoch.
 - 62s - loss: 0.0811 - acc: 0.9815 - val_loss: 1.2424 - val_acc: 0.7350

Epoch 00009: early stopping

With a larger batch size of 16 instead of 2, and learning rate of 0.004 instead of 0.001, the GPU crunches through epochs in under 60s instead of 75s. The model reaches about the same accuracy (76.3%) in only 9 epochs. Total train time was just 9 minutes, much better than 65.

It’s all too easy to increase the learning rate too far, in which case training accuracy will be poor and stay poor. When increasing the batch size by 8x, it’s typically advisable to increase learning rate by at most 8x. Some research suggests that when the batch size increases by N, the learning rate can scale by about sqrt(N).
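The two rules of thumb can be compared with a few lines of arithmetic. This is only a sketch of the heuristics described above; the function name is made up for illustration.

```python
import math

def scaled_learning_rates(base_lr, base_batch, new_batch):
    """Return (linear, square-root) scaled learning rates for a new batch size."""
    factor = new_batch / base_batch
    return base_lr * factor, base_lr * math.sqrt(factor)

# Going from batch size 2 at learning rate 0.001 to batch size 16
linear_lr, sqrt_lr = scaled_learning_rates(0.001, 2, 16)
print(f"upper bound (linear): {linear_lr:.4f}, sqrt rule: {sqrt_lr:.4f}")
```

The learning rate of 0.004 used in this step sits between the two: above the square-root suggestion and at half of the linear upper bound.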

Note that there is some randomness inherent in the training process, as inputs are shuffled by Keras. Accuracy fluctuates mostly up but sometimes down over time, and coupled with early stopping, training might terminate earlier or later depending on the order the data is encountered. To even this out, the ‘patience’ of EarlyStopping can be increased at the cost of extra training at the end.
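To see why more patience evens out the noise, here is a simplified model of the patience rule in plain Python. It is a sketch of the idea, not Keras's implementation, and the accuracy history is invented.

```python
def early_stopping_epoch(val_accs, patience=3, min_delta=0.001):
    """Return (stop_epoch, best_epoch), both 1-based, for per-epoch validation
    accuracies. stop_epoch is None if training never stops early."""
    best, best_epoch, wait = float("-inf"), None, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc - min_delta > best:
            # Improvement of more than min_delta: remember it and reset the counter
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation accuracies that dip and then recover
history = [0.70, 0.74, 0.73, 0.735, 0.738, 0.76, 0.75, 0.75, 0.75]
print(early_stopping_epoch(history, patience=3))
print(early_stopping_epoch(history, patience=5))
```

With a patience of 3, training stops during the dip and keeps the weights from epoch 2. With a patience of 5, it rides out the dip and finds the better epoch 6, at the cost of running more epochs.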

Step #4: Use Petastorm and /dbfs/ml to Access Large Data

Training above used just a 10% sample of the data, and the tips above helped bring training time down by adopting a few best practices. The next step, of course, is to train on all of the data. This should help achieve higher accuracy, but means more data will have to be processed too. The full data set is many gigabytes, which could still fit in memory, but for purposes here, let’s pretend it wouldn’t. Data needs to be loaded efficiently in chunks into memory during training with a different approach.

Fortunately, the Petastorm library from Uber is designed to feed Parquet-based data into Tensorflow (or Keras) training in this way. It can be applied by adapting the preprocessing and training code to create Tensorflow Datasets, rather than pandas DataFrames, for training. Datasets here act like infinite iterators over the data, which means steps_per_epoch is now defined to specify how many batches make an epoch. This underscores how an ‘epoch’ is somewhat arbitrary.

It’s also common to checkpoint model training progress in long-running training jobs, to recover from failures during training. This is also added as a callback.

(Note: To run this example, attach the petastorm library to your cluster.)

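import os
import pyarrow.parquet as pq
import tensorflow as tf
from petastorm import make_batch_reader
from petastorm.tf_utils import make_petastorm_dataset

path_base = "/dbfs/.../"
checkpoint_path = path_base + "checkpoint"
table_path_base = path_base + "caltech_256_image/"
table_path_base_file = "file:" + table_path_base

# The start of these two lines was lost; counting rows with Spark is an assumption
train_size = spark.read.parquet(table_path_base_file + "train").count()
test_size = spark.read.parquet(table_path_base_file + "test").count()

# Workaround for Arrow issue:
underscore_files = [f for f in (os.listdir(table_path_base + "train") + 
  os.listdir(table_path_base + "test")) if f.startswith("_")]
# Assumed final step (not in this listing): tell pyarrow to skip these metadata files
pq.EXCLUDED_PARQUET_PATHS.update(underscore_files)

img_size = 299

def transform_reader(reader, batch_size):
  def transform_input(x):
    # Decode raw bytes into a batch of images, then scale to [-1,1]
    img_bytes = tf.reshape(tf.io.decode_raw(x.image, tf.uint8), (-1,img_size,img_size,3))
    inputs = preprocess_input(tf.cast(img_bytes, tf.float32))
    outputs = x.label - 1
    return (inputs, outputs)
  # Unbatch, shuffle, then re-batch to the size wanted for training
  return make_petastorm_dataset(reader).map(transform_input).\
    apply(tf.data.experimental.unbatch()).shuffle(400, seed=42).\
    batch(batch_size, drop_remainder=True)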

The method above reimplements some of the preprocessing from earlier code in terms of Tensorflow’s transformation APIs. Note that Petastorm produces Datasets that deliver data in batches whose size depends entirely on the Parquet files’ row group size. To control the batch size for training, it’s necessary to use Tensorflow’s unbatch() and batch() operations to re-batch the data into the right size. Also, note the small workaround that’s currently necessary to avoid a problem in reading Parquet files via Arrow in Petastorm.
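The re-batching idea can be shown without Tensorflow. The sketch below mimics unbatch() followed by batch(batch_size, drop_remainder=True) on plain Python lists; it leaves out shuffling, and the uneven row groups are invented stand-ins for what a reader would deliver.

```python
from itertools import chain, islice

def rebatch(batches, batch_size):
    """Flatten variable-sized batches and regroup them into fixed-size batches,
    dropping a final partial batch (like drop_remainder=True)."""
    rows = chain.from_iterable(batches)   # the "unbatch" step
    while True:
        batch = list(islice(rows, batch_size))
        if len(batch) < batch_size:
            return
        yield batch

# Hypothetical row groups of uneven size
row_groups = [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10]]
print(list(rebatch(row_groups, 4)))
```

Whatever the row group sizes, training always sees batches of exactly the requested size, which is what keeps the steps_per_epoch arithmetic simple.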

batch_size = 16

with make_batch_reader(table_path_base_file + "train", num_epochs=None) as train_reader:
  with make_batch_reader(table_path_base_file + "test", num_epochs=None) as test_reader:
    train_dataset = transform_reader(train_reader, batch_size)
    test_dataset = transform_reader(test_reader, batch_size)

    model = build_model(dropout=0.5)
    # Optimizer and learning rate assumed to match Step #3 (lost from this listing)
    model.compile(optimizer=Nadam(lr=0.004),
      loss='sparse_categorical_crossentropy', metrics=['acc'])

    early_stopping = EarlyStopping(patience=3, monitor='val_acc',
      min_delta=0.001, restore_best_weights=True, verbose=1)
    # Note: you must set save_weights_only=True to avoid problems with hdf5 files and /dbfs/ml
    checkpoint = ModelCheckpoint(checkpoint_path + "/checkpoint-{epoch}.ckpt", save_weights_only=True, verbose=1)

    model.fit(train_dataset, epochs=30, steps_per_epoch=(train_size // batch_size),
              validation_data=test_dataset, validation_steps=(test_size // batch_size),
              verbose=2, callbacks=[early_stopping, checkpoint])

More asides: for technical reasons, currently ModelCheckpoint must set save_weights_only=True when using /dbfs. It also appears necessary to use different checkpoint paths per epoch; use a path pattern that includes {epoch}. Now run:

Epoch 8/30
Epoch 00008: saving model to /dbfs/tmp/sean.owen/binary/checkpoint/checkpoint-8.ckpt
 - 682s - loss: 1.0154 - acc: 0.8336 - val_loss: 1.2391 - val_acc: 0.8301
Epoch 9/30
Epoch 00009: saving model to /dbfs/tmp/sean.owen/binary/checkpoint/checkpoint-9.ckpt.
 - 684s - loss: 1.0048 - acc: 0.8397 - val_loss: 1.2900 - val_acc: 0.8275
Epoch 10/30
Epoch 00010: saving model to /dbfs/tmp/sean.owen/binary/checkpoint/checkpoint-10.ckpt
 - 689s - loss: 1.0033 - acc: 0.8422 - val_loss: 1.3706 - val_acc: 0.8225
Epoch 11/30
Restoring model weights from the end of the best epoch.
Epoch 00011: saving model to /dbfs/tmp/sean.owen/binary/checkpoint/checkpoint-11.ckpt
 - 687s - loss: 0.9800 - acc: 0.8503 - val_loss: 1.3837 - val_acc: 0.8225
Epoch 00011: early stopping

Epoch times are almost 11x longer, but recall that an epoch here is now a full pass over the training data, not a 10% sample. The extra overhead comes from the I/O in reading data from Parquet in cloud storage, and writing checkpoint files. The GPU utilization graph manifests this in “spiky” utilization of the GPU.

The upside? Accuracy is significantly better at 83%. The cost was much longer training time: 126 minutes instead of 9. For many applications, this could be well worth it.

Databricks provides an optimized implementation of the file system mount that makes the Parquet files appear as local files to training. Accessing them via /dbfs/ml/… instead of /dbfs/… can improve I/O performance. Also, Petastorm itself can cache data on local disks to avoid re-reading data from cloud storage.

path_base = "/dbfs/ml/..."
checkpoint_path = path_base + "checkpoint"
table_path_base = path_base + "caltech_256_image/"
table_path_base_file = "file:" + table_path_base

def make_caching_reader(suffix, cur_shard=None, shard_count=None):
  return make_batch_reader(table_path_base_file + suffix, num_epochs=None,
    cur_shard=cur_shard, shard_count=shard_count,
    cache_type='local-disk', cache_location="/tmp/" + suffix,
    cache_row_size_estimate=img_size * img_size * 3)

The rest of the code is as above, just using make_caching_reader in place of make_reader.

Epoch 6/30
Epoch 00006: saving model to /dbfs/ml/tmp/sean.owen/binary/checkpoint/checkpoint-6.ckpt
- 638s - loss: 1.0221 - acc: 0.8252 - val_loss: 1.1612 - val_acc: 0.8285
Epoch 00009: early stopping


The training time decreased from about 126 minutes to 96 minutes for roughly the same result. That’s still more than 10x the runtime for 10x the data, but not bad for a 7% increase in accuracy.

Step #5: Use Multiple GPUs

Still want to go faster, and have some budget? It’s easy to try a bigger GPU like a V100 and retune appropriately. However, at some point, scaling up means multiple GPUs. Instances with, for example, eight K80 GPUs are readily available in the cloud. Keras provides a simple utility function called multi_gpu_model that can parallelize training across multiple GPUs. It’s just a one-line code change:

num_gpus = 8
model = multi_gpu_model(model, gpus=num_gpus)

(Note: to run this example, choose a driver instance type with 8 GPUs.)

The modification was easy, but, to cut to the chase without repeating the training output: per-epoch time becomes 270s instead of 630s. That’s not 8x faster, not even 3x faster. Each of the 8 GPUs is only processing 1/8th of each batch of 32 inputs, so each is again effectively processing just 4 per batch. As above, it’s possible to increase the batch size by 8x to compensate, to 256, and further increase the learning rate to 0.016. (See the accompanying notebook for full code listings.)

It reveals that training is faster, at 135s per epoch. The speedup is better, but still not 8x. Accuracy is steady at around 83%, so this still progresses towards faster training. The Keras implementation is simple, but not optimal. GPU utilization remains spiky because the GPUs idle while Keras combines partial gradients in a straightforward but slow way.
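To put these numbers side by side, here is a small Python sketch of the arithmetic. The timings are the ones quoted in the text; the helper name speedup is only illustrative.

```python
# Per-epoch times (seconds) quoted in the text.
single_gpu = 630
multi_gpu_small_batch = 270   # 8 GPUs, original batch size
multi_gpu_large_batch = 135   # 8 GPUs, batch size 256, learning rate 0.016

def speedup(baseline, new):
    """How many times faster `new` is than `baseline`."""
    return baseline / new

num_gpus = 8
per_gpu_batch = 256 // num_gpus  # share of each 256-input batch handled by one GPU

print(f"small batch: {speedup(single_gpu, multi_gpu_small_batch):.2f}x")
print(f"large batch: {speedup(single_gpu, multi_gpu_large_batch):.2f}x")
print(f"inputs per GPU per batch: {per_gpu_batch}")
```

Even with the larger batch, the speedup is well short of the 8x that eight GPUs could ideally deliver, which is the gap the next tool tries to close.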

Horovod is another project from Uber that helps scale deep learning training across not just multiple GPUs on one machine, but GPUs across many machines, and with great efficiency. While it’s often associated with training across multiple machines, that’s not actually the next step in scaling up. It can help this current multi-GPU setup. All else equal, it’ll be more efficient to utilize 8 GPUs connected to the same VM than spread across the network.

It requires a different modification to the code, which uses the HorovodRunner utility from Databricks to integrate Horovod with Spark:

batch_size = 32
num_gpus = 8

def train_hvd():
  hvd.init()

  # Pin each Horovod process to its own GPU
  config = tf.ConfigProto()
  config.gpu_options.allow_growth = True
  config.gpu_options.visible_device_list = str(hvd.local_rank())
  K.set_session(tf.Session(config=config))

  # (Repeat the Arrow workaround from above here)

  with make_caching_reader("train", cur_shard=hvd.rank(), shard_count=hvd.size()) as train_reader:
    with make_caching_reader("test", cur_shard=hvd.rank(), shard_count=hvd.size()) as test_reader:
      train_dataset = transform_reader(train_reader, batch_size)
      test_dataset = transform_reader(test_reader, batch_size)

      model = build_model(dropout=0.5)

      optimizer = Nadam(lr=0.016)
      optimizer = hvd.DistributedOptimizer(optimizer)

      model.compile(optimizer=optimizer,
                    loss='sparse_categorical_crossentropy', metrics=['acc'])

      callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0),
                   hvd.callbacks.MetricAverageCallback(),
                   EarlyStopping(patience=3, monitor='val_acc',
                                 min_delta=0.001, restore_best_weights=True,
                                 verbose=(1 if hvd.rank() == 0 else 0))]

      # Only rank 0 writes checkpoints
      if hvd.rank() == 0:
        callbacks.append(ModelCheckpoint(
          checkpoint_path + "/checkpoint-{epoch}.ckpt",
          save_weights_only=True, verbose=1))

      model.fit(train_dataset, epochs=30,
                steps_per_epoch=(train_size // (batch_size * num_gpus)),
                validation_data=test_dataset,
                validation_steps=(test_size // (batch_size * num_gpus)),
                verbose=(2 if hvd.rank() == 0 else 0), callbacks=callbacks)

hr = HorovodRunner(np=-num_gpus)
hr.run(train_hvd)

Again a few notes:

  • The Arrow workaround must be repeated in the Horovod training function
  • Use hvd.callbacks.MetricAverageCallback to correctly average validation metrics
  • Make sure to only run checkpoint callbacks on one worker (rank 0)
  • Set HorovodRunner’s np= argument to minus the number of GPUs to use, when local
  • Batch size here is now per GPU, not overall. Note the different computation in steps_per_epoch
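The last note is easy to get wrong, so here is a minimal Python sketch of the steps_per_epoch arithmetic. The dataset size below is a made-up value for illustration, not the size used in the post, and the function name is hypothetical.

```python
def steps_per_epoch(num_examples, per_gpu_batch_size, num_gpus):
    """Steps each worker runs so that all workers together cover the data once."""
    global_batch = per_gpu_batch_size * num_gpus  # inputs consumed per step across all GPUs
    return num_examples // global_batch

# Hypothetical dataset size, for illustration only.
example_train_size = 24000

steps = steps_per_epoch(example_train_size, per_gpu_batch_size=32, num_gpus=8)
print(steps)
```

With a per-GPU batch of 32 on 8 GPUs, each step consumes 256 inputs in total, so dividing by the per-GPU batch size alone would make every epoch 8 passes over the data.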

The output from the training is, well, noisy and so won’t be copied here in full. Total training time has come down to about 12.6 minutes, from 96, or almost 7.6x, which is satisfyingly close to the maximum possible 8x speedup! Accuracy is up to 83.5%. Compare to 9 minutes and 76% accuracy on one GPU.

Step #6: Use Horovod Across Multiple Machines

Sometimes, 8 or even 16 GPUs just isn’t enough, and that’s the most you can get on one machine today. Or, sometimes it can be cheaper to provision GPUs across many smaller machines to take advantage of varying prices per machine type in the cloud. The same Horovod example above can run on a cluster of 8 1-GPU machines instead of 1 8-GPU machine with just a single line of change. HorovodRunner manages the distributed work of Horovod on the Spark cluster by using Spark 2.4’s barrier mode support.

num_gpus = 8
hr = HorovodRunner(np=num_gpus)

(Note: to run this example, provision a cluster with 8 workers, each with 1 GPU.)

The only change is to specify 8, rather than -8, to select 8 GPUs on the cluster rather than on the driver. GPU utilization is pleasingly full across 8 machines’ GPUs (the idle one is the driver, which does not participate in the training):

Accuracy is again about the same as expected, at 83.6%. Total run time is almost 17 minutes rather than 12.6, which reflects the overhead of coordinating GPUs across machines. This overhead could be worthwhile in some cases for cost purposes, and is simply a necessary evil if a training job has to scale past 16 GPUs. Where possible, allocating all the GPUs on one machine is faster though.
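As a rough check on these claims, the following Python sketch computes speedup and scaling efficiency from the total run times quoted in the text. The 17-minute figure is approximate ("almost 17"), and the function name scaling is only illustrative.

```python
baseline_min = 96.0       # one GPU, full data set, with caching
one_machine_min = 12.6    # 8 GPUs on one machine, with Horovod
cluster_min = 17.0        # 8 single-GPU machines (approximate)
num_gpus = 8

def scaling(baseline, measured, workers):
    """Return (speedup, fraction of the ideal linear speedup)."""
    s = baseline / measured
    return s, s / workers

for label, minutes in [("one machine", one_machine_min), ("cluster", cluster_min)]:
    s, eff = scaling(baseline_min, minutes, num_gpus)
    print(f"{label}: {s:.1f}x speedup, {eff:.0%} of ideal")
```

On these figures the single 8-GPU machine reaches roughly 95% of ideal scaling, while the 8-machine cluster reaches about 71%.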

For a problem of this moderate size, it probably won’t be possible to usefully exploit more GPU resources. Keeping them busy would mean larger learning rates and the learning rate is already about as high as it can go. For this network, a few K80 GPUs may be the right maximum amount of resource to deploy. Of course, there are much larger networks and datasets out there!


Deep learning is powerful magic, but we always want it to go faster. It scales in different ways though. There are new best practices and pitfalls to know when setting out to train a model. A few of these helped the small image classification problem here improve accuracy slightly while reducing runtime 7x. The first steps in scaling aren’t more resources, but looking for easy optimizations.

Scaling to train on an entire large data set in the cloud requires some new tools, but not necessarily more GPUs at first. With careful use of Petastorm and /dbfs/ml, 10x the data helped achieve 82.7% accuracy in not much more than 10x the time on the same hardware.

The next step of scaling up means utilizing multiple GPUs with tools like Horovod, but doesn’t mean a cluster of machines necessarily, unlike in ETL jobs where a cluster of machines is the norm. A single 8 GPU instance allowed training to finish almost 8x faster and achieve over 83% accuracy. Only for the largest problems are multiple GPU instances necessary, but Horovod can help scale even there without much overhead.






The post How (Not) To Scale Deep Learning in 6 Easy Steps appeared first on Databricks.

Continue Reading…


Read More

Magister Dixit

“The hope is that if we can start building the right models to find the right patterns using the right data, then maybe we can start making progress on some of these complicated systems.” Eric Jonas

Continue Reading…


Read More

Book Memo: “Centrality and Diversity in Search”

Roles in A.I., Machine Learning, Social Networks, and Pattern Recognition
The concepts of centrality and diversity are highly important in search algorithms, and play central roles in applications of artificial intelligence (AI), machine learning (ML), social networks, and pattern recognition. This work examines the significance of centrality and diversity in representation, regression, ranking, clustering, optimization, and classification. The text is designed to be accessible to a broad readership. Requiring only a basic background in undergraduate-level mathematics, the work is suitable for senior undergraduate and graduate students, as well as researchers working in machine learning, data mining, social networks, and pattern recognition.

Continue Reading…


Read More

Really large numbers in R

[This article was first published on R – Open Source Automation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

gmp package in R

This post will discuss ways of handling huge numbers in R using the gmp package.

The gmp package

The gmp package provides us a way of dealing with really large numbers in R. For example, let’s suppose we want to multiply 10^250 by itself. Mathematically we know the result should be 10^500. But if we try this calculation in base R we get Inf for infinity.

num = 10^250

num^2 # Inf

However, we can get around this using the gmp package. Here, we can convert the integer 10 to an object of the bigz class. This is an implementation that allows us to handle very large numbers. Once we convert an integer to a bigz object, we can use it to perform calculations with regular numbers in R (there’s a small caveat coming).


library(gmp)

num = as.bigz(10)

(num^250) * (num^250)

# or directly 10^500
num^500

gmp big integers

One thing we need to be careful about is which numbers we convert to bigz objects. In the example above, we convert the integer 10 to bigz. This works fine for our calculations because 10 is not a very large number in itself. However, let’s suppose we had converted 10^250 to a bigz object instead. If we do this, the number 10^250 is first stored as a double, which causes a loss in precision for such a number. Thus the result we see below isn’t really 10^250:

num = 10^250

as.bigz(num)


double losing precision in r
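The same effect can be reproduced outside R. The sketch below uses Python rather than R, purely as an illustration: Python floats are the same 64-bit doubles that base R uses, while Python integers are arbitrary precision, so the two can be compared directly.

```python
exact = 10 ** 250          # arbitrary-precision integer, exactly 1 followed by 250 zeros
as_double = 10.0 ** 250    # 64-bit double, the nearest representable value to 10^250

# Converting the double back to an integer exposes the rounding error.
print(int(as_double) == exact)       # False: the double is not exactly 10^250

# Squaring the double overflows, like num^2 giving Inf in base R.
print(as_double * as_double)         # inf

# Building the number from text keeps every digit, as with a character input.
from_text = int("1" + "0" * 250)
print(from_text == exact)            # True
```

A double carries only about 15 to 17 significant decimal digits, so a 251-digit integer cannot survive a round trip through it.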

A way around this is to input the number we want as a character into as.bigz. For example, we know that 10^250 is the number 1 followed by 250 zeros. We can create a character that represents this number like below:

num = paste0("1", paste(rep("0", 250), collapse = ""))

Thus, we can use this idea to create bigz objects:

as.bigz(num)

bigz on a character

In case you run into issues with the above line returning an NA value, you might want to try turning scientific notation off. You can do that using the base options command.

options(scipen = 999)

If scientific notation is not turned off, the character version of the number may come out in exponent form (for example, as.character(10^250) gives "1e+250"), which results in an NA being returned by as.bigz.


In general, numbers can be input to gmp functions as characters to avoid this or other precision issues.

Finding the next prime

The gmp package can find the first prime larger than an input number using the nextprime function.

num = "100000000000000000000000000000000000000000000000000"

nextprime(num)

find next prime number in r

Find the GCD of two huge numbers

We can find the GCD of two large numbers using the gcd function:

num = "2452345345234123123178"
num2 = "23459023850983290589042"

gcd(num, num2) # returns 2
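For readers who want to cross-check a result like this without gmp, here is a short Python sketch of Euclid's algorithm applied to the same two numbers. It is a swapped-in illustration and not how gmp computes the GCD internally; euclid_gcd is a hypothetical helper name.

```python
import math

a = int("2452345345234123123178")
b = int("23459023850983290589042")

def euclid_gcd(x, y):
    """Euclid's algorithm: replace (x, y) with (y, x mod y) until y is 0."""
    while y:
        x, y = y, x % y
    return x

g = euclid_gcd(a, b)

# Compare against the standard library and confirm g divides both inputs.
print(g, g == math.gcd(a, b), a % g == 0, b % g == 0)
```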

Factoring numbers into primes

gmp also provides a way to factor numbers into primes. We can do this using the factorize function.

num = "2452345345234123123178"

factorize(num)

factorize large numbers in r

Matrices of large numbers

gmp also supports creating matrices with bigz objects.

num1 <- "1000000000000000000000000000"
num2 <- "10000000000000000000000000000000"
num3 <- "100000000000000000000000000000000000000"
num4 <- "100000000000000000000000000000000000000000000000"

nums <- c(as.bigz(num1), as.bigz(num2), as.bigz(num3), as.bigz(num4))

matrix(nums, nrow = 2)

matrix large numbers in r

We can also perform typical operations with our matrix, like find its inverse, using base R functions:

solve(matrix(nums, nrow = 2))

gmp inverse of matrix in r

Sampling random (large) numbers uniformly

We can sample large numbers from a discrete uniform distribution using the urand.bigz function.

urand.bigz(nb = 100, size = 5000, seed = 0)

The nb parameter represents how many integers we want to sample. Thus, in this example, we’ll get 100 integers returned. size = 5000 tells the function to sample the integers from the inclusive range of 0 to 2^5000 – 1. In general you can sample from the range 0 to 2^size – 1.
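To make the range concrete, here is a comparable sketch in Python, where random.getrandbits(k) draws an integer uniformly from 0 to 2^k – 1. This is an analogy to urand.bigz, not the same generator, so the sampled values will not match those from gmp even with the same seed.

```python
import random

rng = random.Random(0)   # seeded so the run is repeatable
size = 5000              # number of random bits per sample
nb = 100                 # how many integers to draw

samples = [rng.getrandbits(size) for _ in range(nb)]

# Every sample lies in the inclusive range 0 .. 2^size - 1.
upper = 2 ** size
print(len(samples), all(0 <= s < upper for s in samples))
```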

To learn more about gmp, click here for its vignette.

If you enjoyed this post, click here to follow my blog on Twitter.

The post Really large numbers in R appeared first on Open Source Automation.


Continue Reading…


Read More

Converting lines in an svg image to csv

[This article was first published on The Shape of Code » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

During a search for data on programming language usage I discovered Stack Overflow Trends, showing an interesting plot of language tags appearing on Stack Overflow questions (see below). Where was the csv file for these numbers? Somebody had asked this question last year, but there were no answers.

Stack Overflow language tag trends over time.

The graphic is in svg format; has anybody written an svg to csv conversion tool? I could only find conversion tools for specialist uses, e.g., geographical data processing. The svg file format is all xml, and using a text editor I could see the numbers I was after. How hard could it be (it had to be easier than a png heatmap)?

Extracting the x/y coordinates of the line segments for each language turned out to be straightforward (after some trial and error). The svg generation process made matching language to line trivial; the language name was included as an xml attribute.

Programmatically extracting the x/y axis information exhausted my patience, and I hard coded the numbers (code+data). The process involves walking an xml structure and R’s list processing, two pet hates of mine (the data is for a book that uses R, so I try to do everything data related in R).

I used R’s xml2 package to read the svg files. Perhaps if my mind had a better fit to xml and R lists, I would have been able to do everything using just the functions in this package. My aim was always to get far enough down to convert the subtree to a data frame.
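The general idea of walking the xml and flattening it to rows can be sketched as follows. This is a Python illustration, not the author's R/xml2 code, and it runs on a tiny made-up svg: the data-language attribute name and the path contents are assumptions, since the real file's attribute names are not given here.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Tiny made-up svg; "data-language" is a hypothetical attribute name.
svg_text = """<svg xmlns="http://www.w3.org/2000/svg">
  <path data-language="python" d="M0,90L10,80L20,60"/>
  <path data-language="r" d="M0,95L10,94L20,92"/>
</svg>"""

def svg_lines_to_rows(text):
    """Return (language, x, y) tuples for every point on every path."""
    rows = []
    root = ET.fromstring(text)
    for path in root.iter(SVG_NS + "path"):
        language = path.get("data-language")
        # Each "x,y" pair after an M (move) or L (line) command is one point.
        for x, y in re.findall(r"[ML]\s*(-?[\d.]+)[ ,](-?[\d.]+)", path.get("d", "")):
            rows.append((language, float(x), float(y)))
    return rows

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["language", "x", "y"])
writer.writerows(svg_lines_to_rows(svg_text))
print(out.getvalue())
```

The coordinates that come out are in the svg's own pixel space; mapping them back to dates and percentages still needs the axis information, which is the part that was hard coded.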

Extracting data from graphs represented in svg files is so easy (says he). Where is the wonderful conversion tool that my search failed to locate? Pointers welcome.


Continue Reading…


Read More

R Packages worth a look

Extension to ‘spatstat’ for Large Datasets on a Linear Network (spatstat.Knet)
Extension to the ‘spatstat’ package, for analysing large datasets of spatial points on a network. Provides a memory-efficient algorithm for computing the geometrically-corrected K function, described in S. Rakshit, A. Baddeley and G. Nair (2019) <doi:10.18637/jss.v090.i01>

Create Trusted Timestamps of Datasets and Files (trustedtimestamping)
Trusted Timestamps (tts) are created by incorporating a hash of a file or dataset into a transaction on the decentralized blockchain (Stellar network). The package makes use of a free service provided by <>.

Manage the Life Cycle of your Package Functions (lifecycle)
Manage the life cycle of your exported functions with shared conventions, documentation badges, and non-invasive deprecation warnings. The ‘lifecycle’ package defines four development stages (experimental, maturing, stable, and questioning) and three deprecation stages (soft-deprecated, deprecated, and defunct). It makes it easy to insert badges corresponding to these stages in your documentation. Usage of deprecated functions are signalled with increasing levels of non-invasive verbosity.

Markov Random Field Structure Estimator (mrfse)
A Markov random field structure estimator that uses a penalized maximum conditional likelihood method similar to the Bayesian Information Criterion (Frondana, 2016) <doi:10.11606/T.45.2018.tde-02022018-151123>.

Model Butcher (butcher)
Provides a set of five S3 generics to axe components of fitted model objects and help reduce the size of model objects saved to disk.

Parsing Semi-Structured Log Files into Tabular Format (tabulog)
Convert semi-structured log files (such as ‘Apache’ access.log files) into a tabular format (data.frame) using a standard template system.

Robust Quality Control Chart (rQCC)
Constructs robust quality control chart based on the median and Hodges-Lehmann estimators (location) and the median absolute deviation (MAD) and Shamos estimators (scale) which are unbiased with a sample of finite size. For more details, see Park, Kim and Wang (2019)<arXiv:1908.00462>. This work was partially supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. NRF-2017R1A2B4004169).

Continue Reading…


Read More

Legal weed is linked to higher junk-food sales

Research suggests marijuana really does give you the munchies

Continue Reading…


Read More

August 15, 2019

Distilled News

Metrics to Evaluate your Semantic Segmentation Model

Semantic segmentation. My absolute favorite task. (More than NLP you ask? Yes.) I would make a deep learning model, have it all nice and trained… but wait. How do I know my model is performing well? In other words, what are the most common metrics for semantic segmentation? Here’s a clear cut guide to the essential metrics that you need to know to ensure your model is performing well. I have also included Keras implementations below.

Using parallelization, multiple git repositories and setting permissions when automating R applications with Jenkins

In this post, we look at various tips that can be useful when automating R application testing and continuous integration, with regards to orchestrating parallelization, combining sources from multiple git repositories and ensuring proper access right to the Jenkins agent.

Python Data Transformation Tools for ETL

The other day, I went on Reddit to ask if I should use Python for ETL related transformations, and the overwhelming response was yes. However, while my fellow Redditors enthusiastically supported using Python, they advised looking into libraries outside of Pandas – citing concerns about Pandas performance with large datasets. After doing some research, I found a ton of Python libraries built for data transformation: some improve Pandas performance, while others offer their own solutions. I couldn’t find a comprehensive list of these tools, so I thought I’d compile one using the research I did – if I missed something or got something wrong, please let me know!

Anomaly detection in Martian Surface

I got this wonderful opportunity to work on the project ‘Anomaly detection in Martian Surface’ through the Omdena community. The objective of this project is to detect anomalies on the Martian surface caused by non-terrestrial artifacts like debris of Mars lander missions, rovers, etc. Recently the search for so-called ‘Techno-Signatures’ – measurable properties or effects that provide scientific evidence of past or present extraterrestrial technology – has gained new interest. NASA hosted a ‘Techno-Signature’ Workshop at the Lunar and Planetary Institute in Houston, Texas, in September 2018 to learn more about the current field and state of the art of searches for ‘Techno-Signatures’, and what role NASA might play in these searches in the future. One area in this field of research is the search for non-terrestrial artifacts in the Solar System. This AI challenge is aimed at developing an ‘AI Toolbox’ for planetary scientists to help in identifying non-terrestrial artifacts.

Data Augmentation for Deep Learning

Having a large dataset is crucial for the performance of the deep learning model. However, we can improve the performance of the model by augmenting the data we already have. Deep learning frameworks usually have built-in data augmentation utilities, but those can be inefficient or lacking some required functionality. In this article, I would like to make an overview of most popular image augmentation packages, designed specifically for machine learning, and demonstrate how to use these packages with PyTorch framework.

How to run RStudio on AWS in under 3 minutes for free

When it comes to data analytics there are many reasons to move from your local computer to the cloud. Most prominently, you can run an indefinite number of machines without needing to own or maintain them. Furthermore, you can scale up and down as you wish in a matter of minutes. And if you choose to run t2.micro servers you can run for 750 hours a month for free within the first 12 months! After that it’s a couple of bucks per month and server. Alright, let’s get to it then! Understandably you won’t have time to read a ten minute article about RStudio Server and Amazon Web Services after clicking a title that promised you a solution in 3 minutes. So I’ll skip the formal introduction and cut to the chase.

Uber’s Ludwig v0.2: New features and Improvements

Uber released a new version of Ludwig with new features as well as some improvements to old ones. If you don’t already know, Uber’s Ludwig is a machine learning toolbox aimed at opening the world of machine learning to non-coders by providing a simple interface to create deep neural networks for lots of different applications. I already covered the basics of Uber’s Ludwig in two other articles. I released the first one right after the release of Uber’s Ludwig in February 2019. It covers the core principles and basics of Uber’s Ludwig. In the second article, I covered how to use Uber’s Ludwig for tabular, image and text data.

Version Control ML Model

Machine Learning operations (let’s call it mlOps under the current buzzword pattern xxOps) are quite different from traditional software development operations (devOps). One of the reasons is that ML experiments demand large datasets and model artifacts besides code (small plain files). This post presents a solution to version control machine learning models with git and dvc (Data Version Control).

Analogies from Word Vectors?

Entry level articles on word vectors often contain examples of calculated analogies, such as king-man+woman=queen. Striking examples like this clearly have their place. They guide our interest towards similarity as one of the hidden treasures to delve for. In real data, however, analogies are often not so clear and easy to use.

Market Profile: a statistical view on financial markets

In the article, I have briefly presented Market Profile. I have covered why I think Market Profile is still relevant today and some of my reasoning for thinking that way. I have also enumerated the three main classic books which cover the theory and a small excerpt of code on how to plot market profiles. A routine to get the market profile is not presented because it is highly specific to how you store your data, but in this example, a prototype in Python was built in just 50 lines. That is just one page of code.

To dance or not to dance? – The Machine Learning approach.

I love dancing! There, I said it. Even though I may not want to dance all the time, I do find myself often scrolling through my playlists in search of my most danceable songs. And here’s the thing, it has nothing to do with genres – at least not for me. But it has everything to do with the music.

A different take on Bayes Rule

Most people reading this article have seen demonstrations of each probability distribution in Bayes Rule. Most people reading this have been formally introduced to the terms ‘posterior’, ‘prior’, and ‘likelihood’. If not, even better! I think that viewing Bayes Rule as an incremental learning rule would be a novel perspective for many. Further, I believe this perspective would give much better intuition for why we use the terms ‘posterior’ and ‘prior’. Finally, I think this perspective helps explain why I don’t think Bayesian statistics lead to any more inductive bias than Frequentist statistics.

Modernize your IT Infrastructure Monitoring by Combining Time Series Databases with Machine Learning

Let’s explore the complexity and vulnerability of IT infrastructure and how to build a modern IT infrastructure monitoring solution, using a combination of time series databases with machine learning.

t-SNE Python Example

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique used to represent a high-dimensional dataset in a low-dimensional space of two or three dimensions so that we can visualize it. In contrast to other dimensionality reduction algorithms like PCA, which simply maximizes the variance, t-SNE creates a reduced feature space where similar samples are modeled by nearby points and dissimilar samples are modeled by distant points with high probability. At a high level, t-SNE constructs a probability distribution for the high-dimensional samples in such a way that similar samples have a high likelihood of being picked while dissimilar points have an extremely small likelihood of being picked. Then, t-SNE defines a similar distribution for the points in the low-dimensional embedding. Finally, t-SNE minimizes the Kullback-Leibler divergence between the two distributions with respect to the locations of the points in the embedding.

ERNIE 2.0: A Continual Pre-training Framework for Language Understanding

Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks, which indicates that pre-training on large-scale corpora may play a crucial role in natural language processing. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurring, there exists other valuable lexical, syntactic and semantic information in training corpora, such as named entity, semantic closeness and discourse relations. In order to extract to the fullest extent, the lexical, syntactic and semantic information from training corpora, we propose a continual pre-training framework named ERNIE 2.0 which builds and learns incrementally pre-training tasks through constant multi-task learning. Experimental results demonstrate that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks including English tasks on GLUE benchmarks and several common tasks in Chinese.

Measurable Counterfactual Local Explanations for Any Classifier

We propose a novel method for explaining the predictions of any classifier. In our approach, local explanations are expected to explain both the outcome of a prediction and how that prediction would change if ‘things had been different’. Furthermore, we argue that satisfactory explanations cannot be dissociated from a notion and measure of fidelity, as advocated in the early days of neural networks’ knowledge extraction. We introduce a definition of fidelity to the underlying classifier for local explanation models which is based on distances to a target decision boundary. A system called CLEAR: Counterfactual Local Explanations via Regression, is introduced and evaluated. CLEAR generates w-counterfactual explanations that state minimum changes necessary to flip a prediction’s classification. CLEAR then builds local regression models, using the w-counterfactuals to measure and improve the fidelity of its regressions. By contrast, the popular LIME method [15], which also uses regression to generate local explanations, neither measures its own fidelity nor generates counterfactuals. CLEAR’s regressions are found to have significantly higher fidelity than LIME’s, averaging over 45% higher in this paper’s four case studies.

Continue Reading…


Read More

Insurance data science : Networks

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

At the Summer School of the Swiss Association of Actuaries, in Lausanne, I will start talking about networks and insurance this Friday. Slides are available online


Continue Reading…


Read More

Facebook awards $100,000 to 2019 Internet Defense Prize winners

The Internet Defense Prize is a partnership between USENIX and Facebook that aims to reward security research that meaningfully makes the internet more secure.

This year, we are awarding $100,000 to Anjo Vahldiek-Oberwagner, Eslam Elnikety, Nuno O. Duarte, Michael Sammler, Peter Druschel, and Deepak Garg at the Max Planck Institute for Software Systems, Saarland Informatics Campus, for their work titled ERIM: Secure, Efficient In-process Isolation with Protection Keys (MPK).

This research demonstrates a new approach to isolating sensitive data within software, which can help prevent a number of security issues.

Traditionally, software isolation has come with significant performance costs. The authors’ approach stands out because it achieves much better runtime efficiency due to lower overhead, which makes it practical for real-world use in production environments. If this type of defense finds widespread use, it will help eliminate an entire class of security exploits.

We would like to congratulate the winners of the 2019 Internet Defense Prize for their contribution to defense-focused security research. In addition, we thank USENIX for their partnership and continued strong support of the Internet Defense Prize.

Photos have been used with the permission of USENIX.

The post Facebook awards $100,000 to 2019 Internet Defense Prize winners appeared first on Facebook Research.

Continue Reading…


Read More

Fun with progress bars: Fish, daggers and the Star Wars trench run

[This article was first published on R – Daniel Oehm | Gradient Descending, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

If you’re like me, when running a process through a loop you’ll add in counters and progress indicators. That way you’ll know if it will take 5 minutes or much longer. It’s also good for debugging to know when the code wigged-out.

This is typically what’s done. You take a time stamp at the start – start <- Sys.time(), print out some indicators at each iteration – cat("iteration", k, "// reading file", file, "\n") and print out how long it took at the end – print(Sys.time()-start). The problem is it will print out a new line each time it is called, which is fine but ugly. You can reduce the number of lines printed by only printing out every 10th or 100th iteration e.g. if(k %% 10 == 0) ….

A simple way to make this better is instead of using "\n" for a new line use "\r" for carriage return. This will overwrite the same line which is much neater. It’s much more satisfying watching a number go up, or down, whichever way is the good direction. Try it out…

y <- matrix(0, nrow = 31, ncol = 5)
for(sim in 1:5){
  y[1, sim] <- rnorm(1, 0, 8)
  for(j in 1:30){
    y[j+1, sim] <- y[j, sim] + rnorm(1) # random walk
    cat("simulation", sim, "// time step", sprintf("%2.0f", j), "// random walk", sprintf(y[j+1, sim], fmt='% 6.2f'), "\r")
    Sys.sleep(0.1) # pause so each update is visible
  }
}

## simulation 5 // time step 30 // random walk   8.97

The best way is to use the {progress} package. This package allows you to simply add running time, eta, progress bars, percentage complete as well as custom counters to your code. First decide on what counters you want and the format of the string. The function identifies counters by using a colon at the beginning of the label. Check the doco for built-in tokens.

To add your own token add the label to the format string and add the token to tick(). To make it pretty I recommend formatting digits with sprintf(). Here’s an example.


library(progress)
pb <- progress_bar$new(format = ":elapsedfull // eta :eta // simulation :sim // time step :ts // random walk :y [:bar]", total = 30*5, clear = FALSE)
y <- matrix(0, nrow = 31, ncol = 5)
for(sim in 1:5){
  y[1, sim] <- rnorm(1, 0, 8)
  for(j in 1:30){
    y[j+1, sim] <- y[j, sim] + rnorm(1) # random walk
    # custom tokens (:sim, :ts, :y) are passed by name to tick()
    pb$tick(tokens = list(sim = sim, ts = sprintf("%2.0f", j), y = sprintf(y[j+1, sim], fmt='% 6.2f')))
    Sys.sleep(0.1) # pause so each update is visible
  }
}

00:00:17 // eta  0s // simulation 5 // time step 30 // random walk -12.91 [====================================================]

You can also jazz it up with a bit of colour with {crayon}. Be careful with this, it doesn’t handle varying string lengths very well and can start a new line exploding your console.

library(crayon)
pb <- progress_bar$new(format = green$bold(":elapsedfull // eta :eta // simulation :sim // time step :ts // random walk :y [:bar]"), total = 30*5, clear = FALSE)

00:00:17 // eta 0s // simulation 5 // time step 30 // random walk -12.91 [====================================================]

That’s a much neater progress bar.

But, I didn’t stop there…

Procrastination set in and creative tangents were followed. So I made a progress bar into a big fish which eats smaller fish … and made it green.

n <- 300
bar_fmt <- green$bold(":elapsedfull | :icon |")
pb <- progress_bar$new(format = bar_fmt, total = n, clear = FALSE)
icon <- progress_bar_icon("fish", n, 75)
for(j in 1:n){
  pb$tick(tokens = list(
    icon = token(icon, j)
  ))
}

Each fish represents 25% completion. Once they’re all gobbled up, the job is done.

I also threw knives at boxes. Each box represents 20% completion.

n <- 300
bar_fmt <- green$bold(":elapsedfull | :icon |")
pb <- progress_bar$new(format = bar_fmt, total = n, clear = FALSE)
icon <- progress_bar_icon("dagger", n, 75)
for(j in 1:n){
  pb$tick(tokens = list(
    icon = token(icon, j)
  ))
}

And my personal favourite, the Star Wars trench run.

n <- 500
bar_fmt <- green$bold(":elapsedfull | :icon |")
pb <- progress_bar$new(format = bar_fmt, total = n, clear = FALSE)
icon <- progress_bar_icon("tiefighter", n, 75)
for(j in 1:n){
  pb$tick(tokens = list(
    icon = token(icon, j)
  ))
}

Ok… I have spent way too long on this! But at least it was fun. If you want to play around with it, feel free to download it from Git.


The post Fun with progress bars: Fish, daggers and the Star Wars trench run appeared first on Daniel Oehm | Gradient Descending.


Fresh from the Python Package Index

Robot for Automagica – Smart Robotic Process Automation

Expert Systems for Python

A Jupyter Notebook server extension which acts as a proxy for the S3 API.

Python code for handling clusters of labeled, high-dimensional data

Library for brain modeling and machine learning in Python 3

Potree visualization for Jupyter notebooks

Repository of pre-trained NLP Transformer models: BERT, GPT & GPT-2, Transformer-XL, XLNet and XLM

Execute logic program queries against a remote SPARQL endpoint

Tipboard – a flexible solution for creating your dashboards.

Automatically Build Variant Interpretable ML models fast!

Python Package for Uplift Modeling and Causal Inference with Machine Learning Algorithms

Universal configuration file parser

Data from R package completejourney

Data-quality framework

Command Line User Interface for finding pre-trained AI models


U. of Miami: Faculty Positions, with expertise in AI/Data Science/ML or related areas [Miami, FL]

The positions require research and teaching expertise in AI/Data Science, or related areas including Data Extraction, Data Visualization, Machine Learning, and Intelligent Actuators.


Document worth reading: “Network reconstruction with local partial correlation: comparative evaluation”

Over the past decade, various methods have been proposed for the reconstruction of networks modeled as Gaussian Graphical Models. In this work we analyzed three different approaches: the Graphical Lasso (GLasso), Graphical Ridge (GGMridge) and Local Partial Correlation (LPC). For the evaluation of the methods, we used high dimensional data generated from simulated random graphs (Erdős-Rényi, Barabási-Albert, Watts-Strogatz). The performance was assessed through the Receiver Operating Characteristic (ROC) curve. In addition the methods were used for reconstruction of co-expression network, for differentially expressed genes in human cervical cancer data. LPC method outperformed the GLasso in most of the simulation cases, even though GGMridge produced better ROC curves than both other methods. LPC obtained similar outcomes as GGMridge in real data studies. Network reconstruction with local partial correlation: comparative evaluation


Data Driven Government – Speakers Highlights

The lineup of experienced, thought-leading speakers at Data Driven Government, Sep 25 in Washington, DC, will explain how to use data and analytics to more effectively accomplish your mission, increase efficiency, and improve evidence-based policymaking.


✚ Annotate Charts to Help Your Data Speak, Because the Data Has No Idea What It Is Doing (The Process #52)

This week, we talk annotation and how it can make your charts more readable and easier to understand.

How Dataquest Helped Mohammad Become a Machine Learning Engineer

Learn how Mohammad went from zero background in data science to becoming a machine learning engineer with the help of Dataquest's data science courses.

The post How Dataquest Helped Mohammad Become a Machine Learning Engineer appeared first on Dataquest.


Why does my academic lab keep growing?

Andrew, Breck, and I are struggling with the Stan group funding at Columbia just like most small groups in academia. The short story is that to apply for enough grants to give us a decent chance of making payroll in the following year, we have to apply for so many that our expected amount of funding goes up. So our group keeps growing, putting even more pressure on us in the future to write more grants to make payroll. It’s a better kind of problem to have than firing people, but the snowball effect means a lot of work beyond what we’d like to be doing.

Why does my academic lab keep growing?

Here’s a simple analysis. For the sake of argument, let’s say your lab has a $1.5M annual budget. And to keep things simple, let’s suppose all grants are $0.5M. So you need three per year to keep the lab afloat. Let’s say you have a well-oiled grant machine with a 40% success rate on applications.

Now what happens if you apply for 8 grants? There’s roughly a 30% chance you get fewer than the 3 grants you need, a 30% chance you get exactly the 3 grants you need, and a 40% chance you get more grants than you need.

If you’re like us, a 30% chance of not making payroll is more than you’d like, so let’s say you apply for 10 grants. Now there’s only a 20% chance you won’t make payroll (still not great odds!), a 20% chance you get exactly 3 grants, and a whopping 60% chance you wind up with 4 or more grants.
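
The rough percentages above can be checked with a binomial calculation. This is only a sketch: it assumes each application succeeds independently with the same 40% probability, and the names p_success, needed and grant_odds are made up for illustration.

```r
# Assumptions: applications succeed independently, each with probability 0.4,
# and the lab needs 3 grants to make payroll.
p_success <- 0.4
needed <- 3

# Probability of getting fewer than, exactly, or more than `needed` grants
grant_odds <- function(n_apps) {
  c(
    fewer   = pbinom(needed - 1, n_apps, p_success),  # P(X < 3)
    exactly = dbinom(needed, n_apps, p_success),      # P(X = 3)
    more    = 1 - pbinom(needed, n_apps, p_success)   # P(X > 3)
  )
}

odds_8  <- grant_odds(8)
odds_10 <- grant_odds(10)
print(round(rbind(`8 applications` = odds_8, `10 applications` = odds_10), 2))
```

Under these assumptions the exact values are about 32%, 28% and 41% for 8 applications and 17%, 21% and 62% for 10, which match the rounded figures in the text. The expected number of grants, n_apps * p_success, rises from 3.2 to 4 when going from 8 to 10 applications.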

The more conservative you are about making payroll, the bigger this problem is.

Wait and See?

It’s not quite as bad as that analysis leads one to believe, because once a lab’s rolling, it’s usually working in two-year chunks, not one-year chunks. But it takes a while to build up that critical mass.

It would be great if you could apply and wait and see before applying again, but it’s not so easy. Most government grants have fixed deadlines, typically once or at most twice per year. The ones like NIH that have two submission periods/year have a tendency to not fund first applications. So if you don’t apply in a cycle, it’s usually at least another year before you can apply again. Sometimes special one-time-only opportunities with partners or funding agencies come up. We also run into problems like government shutdowns—I still have two NSF grants under review that have been backed up forever (we’ve submitted and heard back on other grants from NSF in the meantime).

The situation with Stan at Columbia

We’ve received enough grants to keep us going. But we have a bunch more in process, some of which we’re cautiously optimistic about. And we’ve already received about half a grant more than we anticipated, so we’re going to have to hire even if we don’t get the ones in process.

So if you know any postdocs or others who might want to work on the Stan language in OCaml and C++, let me know. A more formal job ad will be out soon.


How Concerned Should You be About Predictor Collinearity? It Depends…

Predictor collinearity (also known as multicollinearity) can be problematic for your regression models. Check out these rules of thumb about when, and when not, to be concerned.


Jobs: 2 PhD and RA positions at University of Luxembourg

** Nuit Blanche is now on Twitter: @NuitBlog **

Kumar also sent me the following announcements for different positions:

Dear Igor,
I was wondering if you could post on Nuit-Blanche the announcement of the following Ph.D./R.A. positions at SnT, University of Luxembourg on signal processing for next-generation radar systems.


Violence in Afghanistan last year was worse than in Syria

As NATO draws down forces, the Taliban have reclaimed much of the country


Command Line Basics Every Data Scientist Should Know

Check out this introductory guide to completing simple tasks with the command line.


Replication police methodological terrorism stasi nudge shoot the messenger wtf

Cute quote:

(The link comes from Stuart Ritchie.) Sunstein later clarified:

I’ll take Sunstein’s word that he no longer thinks it’s funny to attack people who work for open science and say that they’re just like people who spread disinformation. I have no idea what Sunstein thinks the “grain of truth” is, but I guess that’s his problem.

Last word on this particular analogy comes from Nick Brown:

The bigger question

The bigger question is: What the hell is going on here? I assume that Sunstein doesn’t think that “good people doing good and important work” would be Stasi in another life. Also, I don’t know who are “the replication police.” After all, it’s Cass Sunstein and Brian Wansink, not Nick Brown, Anna Dreber, Uri Simonsohn, etc., who’ve been appointed to policymaking positions within the U.S. government.

What this looks like to me is a sort of alliance of celebrities. The so-called “replication police” aren’t police at all—unlike the Stasi, they have no legal authority or military power. Perhaps even more relevant, the replication movement is all about openness, whereas the defenders of shaky science are often shifty about their data, their analyses, and their review processes. If you want a better political analogy, how about this:

The open-science movement is like the free press. It’s not perfect, but when it works it can be one of the few checks against powerful people and institutions.

I couldn’t fit in Stasi or terrorists here, but that’s part of the point: Brown, Dreber, Simonsohn, etc., are not violent terrorists, and they’re not spreading disinformation. Rather, they’re telling, and disseminating, truths that are unpleasant to some well-connected people.

Following the above-linked thread led me to this excerpt that Darren Dahly noticed from Sunstein’s book Nudge:

Jeez. Citing Wansink . . . ok, sure, back in the day, nobody knew that those publications were so flawed. But to describe Wansink’s experiments as “masterpieces” . . . what’s with that? I guess I understand, kind of. It’s the fellowship of the celebrities. Academic bestselling authors gotta stick together, right?

Several problems with science reporting, all in one place

I’d like to focus on one particular passage from Sunstein’s reporting on Wansink:

Wansink asked the recipients of the big bucket whether they might have eaten more because of the size of their bucket. Most denied the possibility, saying, “Things like that don’t trick me.” But they were wrong.

This quote illustrates several problems with science reporting:

1. Personalization; scientist-as-hero. It’s all Wansink, Wansink, Wansink. As if he did the whole study himself. As we now know, Wansink was the publicity man, not the detail man. I don’t know if these studies had anyone attending to detail, at least when it came to data collection and analysis. But, again, the larger point is that the scientist-as-hero narrative has problems.

2. Neglect of variation. Even if the study were reported and analyzed correctly, it could still be that the subset of people who said they were not influenced by the size of the bucket were not influenced. You can’t know, based on the data collected in this between-person study. We’ve discussed this general point before: it’s a statistical error to assume that an average pattern applies to everyone, or even to most people.

3. The claim that people are easily fooled. Gerd Gigerenzer has written about this a lot: There’s a lot of work being done by psychologists, economists, etc., sending the message that people are stupid and easily led astray by irrelevant stimuli. The implication is that democratic theory is wrong, that votes are determined by shark attacks, college football games, and menstrual cycles, so maybe we, the voters, can’t be reasoned with directly, we just have to be . . . nudged.

It’s frustrating to me how a commentator such as Sunstein is so ready to believe that participants in that popcorn experiment were “wrong” and then at the same time so quick to attack advocates for open science. If the open science movement had been around fifteen years ago, maybe Sunstein and lots of others wouldn’t have been conned. Not being conned is a good thing, no?

P.S. I checked Sunstein’s twitter feed to see if there was more on this Stasi thing. I couldn’t find anything, but I did notice this link to a news article he wrote, evaluating the president’s performance based on the stock market (“In terms of the Dow, 2018 was also pretty awful, with a 5.6 percent decline — the worst since 2008.”) Is that for real??

P.P.S. Look. We all make mistakes. I’m sure Sunstein is well-intentioned, just as I’m sure that the people who call us “terrorists” etc. are well-intentioned, etc. It’s just . . . openness is a good thing! To look at people who work for openness and analogize them to spies whose entire existence is based on secrecy and lies . . . that’s really some screwed-up thinking. When you’re turned around that far, it’s time to reassess, not just issue semi-apologies indicating that you think there’s a “grain of truth” to your attack. We’re all on the same side here, right?

P.P.P.S. Let me further clarify.

Bringing up Sunstein’s 2008 endorsement of Wansink is not a “gotcha.”

Back then, I probably believed all those sorts of claims too. As I’ve written in great detail, the past decade has seen a general rise in sophistication regarding published social science research, and there’s lots of stuff I believed back then, that I wouldn’t trust anymore. Sunstein fell for the hot hand fallacy fallacy too, but then again so did I!

Here’s the point. From one standpoint, Brian Wansink and Cass Sunstein are similar: They’re both well-funded, NPR-beloved Ivy League professors who’ve written best-selling books. They go on TV. They influence government policy. They’re public intellectuals!

But from another perspective, Wansink and Sunstein are completely different. Sunstein cares about evidence, Wansink shows no evidence of caring about evidence. When Sunstein learns he made a mistake, he corrects it. When Wansink learns he made a mistake, he muddies the waters.

I think the differences between Sunstein and Wansink are more important than the similarities. I wish Sunstein would see this too. I wish he’d see that the scientists and journalists who want to open things up, to share data, to reveal their own mistakes as well as those of others, are on his side. And the sloppy researchers, those who resist open data, open methods, and open discussion, are not.

To put it another way: I’m disturbed that an influential figure such as Sunstein thinks of the junk science produced by Brian Wansink and other purveyors of unreplicable research as “masterpieces,” while he thinks it’s “funny” with “a grain of truth” to label careful, thoughtful analysts such as Brown, Dreber, and Simonsohn as “Stasi.” Dude’s picking the wrong side on this one.


EARL London – agenda highlights

[This article was first published on RBlog – Mango Solutions, and kindly contributed to R-bloggers.]

There are so many wonderful EARL talks happening this year – it’s hard to highlight them all! But we thought we’d share some that the Mango team are really looking forward to:

Ana Henriques, PartnerRe

Using R in Production at PartnerRe

Ana Henriques is the Analytics Tool Lead in PartnerRe’s Life & Health Department. Ana is now focused on business-side delivery of platforms and tools to support data science and related functions. Her talk will focus on the open source infrastructure supporting this process: version control, continuous integration, containerisation and container deployment and orchestration.

Kevin Kuo, RStudio

Towards open collaboration in insurance analytics

Kevin is a software engineer at RStudio and is the founder of Kasa AI, a community organization for open research in insurance analytics. Kevin will be introducing Kasa AI, a not-for-profit community initiative for open research and software development for insurance analytics. Inspired by rOpenSci and Bioconductor, his team hopes to bring together the insurance community to solve the most impactful problems.

Charlotte Wise, Essence

Beyond the average: a Bayesian approach for setting media targets

Charlotte manages a small team of analysts at Essence, a global media agency and part of GroupM, WPP. Her talk will cover how the team at Essence overcame the issue of reporting ROI on marketing campaigns by using a hierarchical Bayesian model.

Kasia Kulma, Mango Solutions

Integrating empathy in the Data Science process

Kasia Kulma is a Data Scientist at Mango Solutions and holds a PhD in evolutionary biology from Uppsala University. Kasia’s talk will demonstrate how empathy has a clearly defined role at every step of the Data Science process: from pitching project ideas and gathering requirements, to implementing solutions, informing and influencing stakeholders, and gauging the impact of the product.

Mitchell Stirling, Heathrow Airport

Understanding Airport Baggage Demand through R modelling 

Mitchell is a Senior Analyst at Heathrow Airport with seven years’ experience working in Operations, Commercial and Strategic positions. Heathrow Airport is entering a new phase of growth and the team there wanted to look at potential scenarios for occupancy and use of infrastructure to maximise existing assets and reduce the need for expensive capital works, early in the programme. To explore how these scenarios would impact the demand on baggage systems, Heathrow has worked with Mango to convert a legacy Perl script into an R package and make a number of improvements that cut down manual intervention, flag errors earlier, stabilise the process and allow for greater variation in key inputs.

There are plenty more speakers on the agenda for you to take a look at so why not join us in September for 3 days of R, learning, inspiration and fun!

Tickets available now.




Introducing the Plato Research Dialogue System: Building Conversational Applications at Uber’s Scale

While the process of building simple, domain-specific chatbots has gotten way easier, building large scale, multi-agent conversational applications remains a massive challenge. Recently, the Uber engineering team open sourced the Plato Research Dialogue System, which is the framework powering conversational agents across Uber’s different applications.


Labeling, transforming, and structuring training data sets for machine learning

The O’Reilly Data Show Podcast: Alex Ratner on how to build and manage training data with Snorkel.

In this episode of the Data Show, I speak with Alex Ratner, project lead for Stanford’s Snorkel open source project; Ratner also recently garnered a faculty position at the University of Washington and is currently working on a company supporting and extending the Snorkel project. Snorkel is a framework for building and managing training data. Based on our survey from earlier this year, labeled data remains a key bottleneck for organizations building machine learning applications and services.

Ratner was a guest on the podcast a little over two years ago when Snorkel was a relatively new project. Since then, Snorkel has added more features, expanded into computer vision use cases, and now boasts many users, including Google, Intel, IBM, and other organizations. Along with his thesis advisor professor Chris Ré of Stanford, Ratner and his collaborators have long championed the importance of building tools aimed squarely at helping teams build and manage training data. With today’s release of Snorkel version 0.9, we are a step closer to having a framework that enables the programmatic creation of training data sets.


Four short links: 15 August 2019

Data Businesses, Data Science Class, Tiny Mouse, and Training Bias

  1. Making Uncommon Knowledge Common -- The Rich Barton playbook is building data content loops to disintermediate incumbents and dominate search, and then using this traction to own demand in their industries.
  2. Data: Past, Present, and Future -- Data and data-empowered algorithms now shape our professional, personal, and political realities. This course introduces students both to critical thinking and practice in understanding how we got here, and the future we now are building together as scholars, scientists, and citizens. The way "Intro to Data Science" classes ought to be.
  3. Clever Travel Mouse -- very small presenter tool, mouse and pointer.
  4. Training Bias in "Hate Speech Detector" Means Black Speech is More Likely to be Censored (BoingBoing) -- The authors do a pretty good job of pinpointing the cause: the people who hand-labeled the training data for the algorithm were themselves biased, and incorrectly, systematically misidentified AAE writing as offensive. And since machine learning models are no better than their training data (though they are often worse!), the bias in the data propagated through the model.


Predicting whether you are Democrat or Republican

The New York Times is in a quizzy mood lately. Must be all the hot weather. Sahil Chinoy shows how certain demographics tend towards Democrat or Republican, with a hook that lets you put in your own information. A decision tree updates as you go.

Reminds me of the Amanda Cox decision tree classic from 2008.


The Layman’s Guide to Banking as a Service

Banking as a Service (BaaS) is the democratisation of financial capabilities that have fiercely been protected, isolated and hidden in silos for hundreds of years by banks. The fact that BaaS opens up banks’ capabilities and essentially empowers anyone to be able to create their own financial products, goes against

The post The Layman’s Guide to Banking as a Service appeared first on Dataconomy.


If you did not already know

Limited Gradient Descent
Label noise may handicap the generalization of classifiers, and it is an important issue how to effectively learn main pattern from samples with noisy labels. Recent studies have witnessed that deep neural networks tend to prioritize learning simple patterns and then memorize noise patterns. This suggests a method to search the best generalization, which learns the main pattern until the noise begins to be memorized. A natural idea is to use a supervised approach to find the stop timing of learning, for example resorting clean verification set. In practice, however, a clean verification set is sometimes not easy to obtain. To solve this problem, we propose an unsupervised method called limited gradient descent to estimate the best stop timing. We modified the labels of few samples in noisy dataset to be almost false labels as reverse pattern. By monitoring the learning progresses of the noisy samples and the reverse samples, we can determine the stop timing of learning. In this paper, we also provide some sufficient conditions on learning with noisy labels. Experimental results on CIFAR-10 demonstrate that our approach has similar generalization performance to those supervised methods. For uncomplicated datasets, such as MNIST, we add relabeling strategy to further improve generalization and achieve state-of-the-art performance. …

Focused Attention Network
Attention networks show promise for both vision and language tasks, by emphasizing relationships between constituent elements through appropriate weighting functions. Such elements could be regions in an image output by a region proposal network, or words in a sentence, represented by word embedding. Thus far, however, the learning of attention weights has been driven solely by the minimization of task specific loss functions. We here introduce a method of learning attention weights to better emphasize informative pair-wise relations between entities. The key idea is to use a novel center-mass cross entropy loss, which can be applied in conjunction with the task specific ones. We then introduce a focused attention backbone to learn these attention weights for general tasks. We demonstrate that the focused attention module leads to a new state-of-the-art for the recovery of relations in a relationship proposal task. Our experiments show that it also boosts performance for diverse vision and language tasks, including object detection, scene categorization and document classification. …

Dual User and Product Memory Network (DUPMN)
In sentiment analysis (SA) of product reviews, both user and product information are proven to be useful. Current tasks handle user profile and product information in a unified model which may not be able to learn salient features of users and products effectively. In this work, we propose a dual user and product memory network (DUPMN) model to learn user profiles and product reviews using separate memory networks. Then, the two representations are used jointly for sentiment prediction. The use of separate models aims to capture user profiles and product information more effectively. Compared to state-of-the-art unified prediction models, the evaluations on three benchmark datasets, IMDB, Yelp13, and Yelp14, show that our dual learning model gives performance gain of 0.6%, 1.2%, and 0.9%, respectively. The improvements are also deemed very significant measured by p-values. …

BM-GAN
Machine learning (ML) has progressed rapidly during the past decade and the major factor that drives such development is the unprecedented large-scale data. As data generation is a continuous process, this leads to ML service providers updating their models frequently with newly-collected data in an online learning scenario. In consequence, if an ML model is queried with the same set of data samples at two different points in time, it will provide different results. In this paper, we investigate whether the change in the output of a black-box ML model before and after being updated can leak information of the dataset used to perform the update. This constitutes a new attack surface against black-box ML models and such information leakage severely damages the intellectual property and data privacy of the ML model owner/provider. In contrast to membership inference attacks, we use an encoder-decoder formulation that allows inferring diverse information ranging from detailed characteristics to full reconstruction of the dataset. Our new attacks are facilitated by state-of-the-art deep learning techniques. In particular, we propose a hybrid generative model (BM-GAN) that is based on generative adversarial networks (GANs) but includes a reconstructive loss that allows generating accurate samples. Our experiments show effective prediction of dataset characteristics and even full reconstruction in challenging conditions. …


Hardware realization of a CS-based MIMO radar


Kumar just sent me the following the other day:

Hi Igor,
We recently published our work on the hardware realization of a CS-based MIMO radar in IEEE Transactions on Aerospace and Electronic Systems. Your readers might be interested in this.
Kumar Vijay Mishra
Thanks Kumar  !

Here is the abstract:

We present a cognitive prototype that demonstrates a colocated, frequency-division-multiplexed, multiple-input multiple-output (MIMO) radar which implements both temporal and spatial sub-Nyquist sampling. The signal is sampled and recovered via the Xampling framework. Cognition is due to the fact that the transmitter adapts its signal spectrum by emitting only those subbands that the receiver samples and processes. Real-time experiments demonstrate sub-Nyquist MIMO recovery of target scenes with 87.5% spatio-temporal bandwidth reduction and signal-to-noise-ratio of -10 dB.

Continue Reading…


Read More

Big Data: Wrangling 4.6M Rows with dtplyr (the NEW data.table backend for dplyr)

[This article was kindly contributed to R-bloggers.]

Wrangling Big Data is one of the best features of the R programming language, which boasts a Big Data Ecosystem that contains fast in-memory tools (e.g. data.table) and distributed computational tools (sparklyr). With the NEW dtplyr package, data scientists with dplyr experience gain the benefits of the data.table backend. We saw a 3X speed boost for dplyr!

We’ll go over the pros and cons and what you need to know to get up and running, using a real-world example: the Fannie Mae Loan Performance data, which when combined is 4.6M Rows by 55 Columns. That is not super huge, but it is enough to show off the new and improved dtplyr interface to the data.table package. We’ll end with a Time Study showing a 3X Speed Boost and Learning Recommendations to help you gain expertise fast.

Like this article? Here are more just like it!

If you like this article, we have more just like it in our Machine Learning Section of the Business Science Learning Hub.

Table of Contents

  1. The 30-Second Summary

  2. Big Data Ecosystem

  3. Enter dtplyr: Boost dplyr with data.table backend

  4. Case Study – Wrangling 4.6M Rows of Financial Data

  5. The 3X Speedup – Time Comparisons

  6. Conclusions and Learning Recommendations

  7. Additional Big Data Guidelines and Packages

  8. Recognizing the Developers

  9. Expert Shiny Apps Course – Coming Soon!

1.0 The 30-Second Summary

We reviewed the latest advance in big data – The NEW dtplyr package, which is an interface to the high performance data.table library.


  • A 3X speed boost on the data joining and wrangling operations on a 4.6M row data set. The data wrangling operations were performed in 6 seconds with dtplyr vs 18 seconds with dplyr.

  • Performs inplace operations (:=), which vastly accelerates big data computations (see the grouped time series lead() operation in the Section 4.7 tutorial)

  • Shows the data.table translation (this is really cool!)


  • For pure speed, you will need to learn all of data.table’s features including managing keys for fast lookups.

  • In most cases, data.table will be faster than dtplyr because of overhead in the dtplyr translation process. However, we saw the difference to be very minimal.

  • dtplyr is in experimental status currently – Testers wanted, file issues and requests here
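
As a minimal sketch of the inplace (:=) point above, the snippet below adds a column to a toy data.table by reference. The table and its column names (id, value, value_x2) are made up for illustration and are not part of the Fannie Mae data.

```r
library(data.table)

# Toy table; column names are hypothetical, chosen only for this illustration
dt <- data.table(id = c(1, 1, 2), value = c(10, 20, 30))

# `:=` adds the new column by reference: no `dt <- ...` reassignment is needed,
# and data.table does not copy the existing columns to do it
dt[, value_x2 := value * 2]

print(dt)
```

Because the update happens by reference, the same pattern stays cheap as the table grows, which is the behavior the summary bullet refers to.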

What Should You Learn?

Just starting out? Our recommendation is to learn dplyr first, then learn data.table, using dtplyr to bridge the gap

  • Begin with dplyr, which has easy-to-learn syntax and works well for datasets of 1M Rows+.

  • Learn data.table as you become comfortable in R. data.table is great for pure speed on data sets 50M Rows+. It has a different “bracketed” syntax that is streamlined but more complex for beginners. However, it has features like fast keyed subsetting and optimization for rolling joins that are out of the scope of this article.

  • Use dtplyr as a translation tool to help bridge the gap between dplyr and data.table.

At a bare minimum – Learning dplyr is essential. Learn more about a system for learning dplyr in the Conclusions and Recommendations.

2.0 Big Data Ecosystem

R has an amazing ecosystem of tools designed for wrangling Big Data. The 3 most popular tools are dplyr, data.table, and sparklyr. We’ve trained hundreds of students on big data, and our students’ most common Big Data question is, “Which tool to use and when?”

[Figure: Big Data: Data Wrangling Tools by Dataset Size]

Source: Business Science Learning Lab 13: Wrangling 4.6M Rows (375 MB) of Financial Data with data.table

The “Big Data: Data Wrangling Tools by Dataset Size” graphic comes from Business Science’s Learning Lab 13: Wrangling 4.6M Rows (375 MB) of Financial Data with data.table where we taught students how to use data.table using Fannie Mae’s Financial Data Set. The graphic provides rough guidelines on when to use which tools by dataset row size.

  1. dplyr (website) – Used for in-memory calculations. Syntax design and execution emphasizes readability over performance. Very good in most situations.

  2. data.table (website) – Used for higher in-memory performance. Modifies data inplace for huge speed gains. Easily wrangles data in the range of 10M-50M+ rows.

  3. sparklyr (website) – Distributes work across nodes (clusters) and performs work in parallel. Best used on big data (100M+ Rows).

3.0 Enter dtplyr: Boost dplyr with data.table backend

We now have a 4th tool that boosts dplyr using data.table as its backend. The good news is that if you are already familiar with dplyr, you don’t need to learn much to get the gains of data.table!

[Figure: dtplyr: Bridging the Big Data Gap]

The dtplyr package is a new front-end that wraps the high-performance data.table R package. I say new, but dtplyr has actually been around for over 2 years. However, the implementation recently underwent a complete overhaul that vastly improved the functionality. Let’s check out the goals of the package from the dtplyr website:

[Figure: dtplyr for Big Data]

Here’s what you need to know:

  • Goal: Increase speed of working with big data when using dplyr syntax

  • Implementation: The dtplyr package enables the user to write dplyr code. Internally the package translates the code to data.table syntax. When run, the user gains the faster performance of data.table while being able to write the more readable dplyr code.

  • Dev Status: The package is still experimental. This means that developers are still in the process of testing the package out, reporting bugs, and improving via feature requests.
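
The "write dplyr, run data.table" idea above can be sketched on a toy table. The data and column names (loan_id, upb, mean_upb) are invented for illustration; lazy_dt(), show_query() and as_tibble() are the dtplyr/dplyr functions used to build, inspect and run the translation. The exact translated call that gets printed may differ between dtplyr versions.

```r
library(data.table)
library(dtplyr)
library(dplyr)

# Hypothetical toy data, wrapped as a lazy data.table
loans <- lazy_dt(data.table(loan_id = c("a", "a", "b"),
                            upb     = c(100, 90, 200)))

# Ordinary dplyr verbs; nothing is computed yet
query <- loans %>%
    group_by(loan_id) %>%
    summarise(mean_upb = mean(upb))

# Print the data.table code that dtplyr generated for the pipeline
show_query(query)

# Execute the translated code and return a tibble
result <- as_tibble(query)
print(result)
```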

4.0 Case Study – Wrangling 4.6M Rows (375MB) of Financial Data

Let’s try out the new and improved dtplyr + data.table combination on a large-ish data set.

4.1 Bad Loans Cost Millions (and Data Sets are MASSIVE)

Bank Loan Defaults

Loan defaults cost organizations millions. Further, the datasets are massive. This is a task where data.table and dtplyr will be needed as part of the preprocessing steps prior to building a Machine Learning Model.

4.2 Fannie Mae Data Set

The data used in the tutorial can be downloaded from Fannie Mae’s website. We will just be using the 2018 Q1 Acquisition and Performance data set.

Fannie Mae Loan Data

A few quick points:

  • The 2018 Q1 Performance Data Set we will use is 4.6M rows, enough to send Excel to a grinding halt, crashing your computer in the process.

  • For dplyr, it’s actually do-able at 4.6M rows. However, if we were to do the full 25GB, we’d definitely want to use data.table to speed things up.

  • We’ll do a series of common data manipulation operations including joins and grouped time series calculation to determine which loans become delinquent in the next 3 months.

4.3 Install and Load Libraries

In this tutorial, we’ll use the latest Development Version of dtplyr installed using devtools. All other packages can be installed with install.packages().


Next, we’ll load the following libraries with library():

  • data.table: High-performance data wrangling
  • dtplyr: Interface between dplyr and data.table
  • tidyverse: Loads dplyr and several other useful R packages
  • vroom: Fast reading of delimited files (e.g. csv) with vroom()
  • tictoc: Simple timing operations
  • knitr: Use the kable() function for nice HTML tables
# Load data.table & dtplyr interface
library(data.table)
library(dtplyr)

# Core Tidyverse - Loads dplyr
library(tidyverse)

# Fast reading of delimited files (e.g. csv)
library(vroom) # vroom()

# Timing operations
library(tictoc) # tic() & toc()

# Table Printing
library(knitr) # kable()

4.4 Read the Data

We’ll read the data. The column-types are going to be pre-specified to assist in the loading process. The vroom() function does the heavy lifting.

First, I’ll set up the paths to the two files I’ll be reading:

  1. Acquisition_2018Q1.txt – Meta-data about each loan
  2. Performance_2018Q1.txt – Time series data set with loan performance characteristics over time

For me, the files are stored in a folder called 2019-08-15-dtplyr. Your paths may be different depending on where the files are stored.

# Paths to files
path_acq  <- "2019-08-15-dtplyr/Acquisition_2018Q1.txt"
path_perf <- "2019-08-15-dtplyr/Performance_2018Q1.txt"

Read the Loan Acquisition Data

Note we specify the columns and types to improve the speed of reading the columns.

# Loan Acquisitions Data 
col_types_acq <- list(
    loan_id                            = col_factor(),
    original_channel                   = col_factor(NULL),
    seller_name                        = col_factor(NULL),
    original_interest_rate             = col_double(),
    original_upb                       = col_integer(),
    original_loan_term                 = col_integer(),
    original_date                      = col_date("%m/%Y"),
    first_pay_date                     = col_date("%m/%Y"),
    original_ltv                       = col_double(),
    original_cltv                      = col_double(),
    number_of_borrowers                = col_double(),
    original_dti                       = col_double(),
    original_borrower_credit_score     = col_double(),
    first_time_home_buyer              = col_factor(NULL),
    loan_purpose                       = col_factor(NULL),
    property_type                      = col_factor(NULL),
    number_of_units                    = col_integer(),
    occupancy_status                   = col_factor(NULL),
    property_state                     = col_factor(NULL),
    zip                                = col_integer(),
    primary_mortgage_insurance_percent = col_double(),
    product_type                       = col_factor(NULL),
    original_coborrower_credit_score   = col_double(),
    mortgage_insurance_type            = col_double(),
    relocation_mortgage_indicator      = col_factor(NULL))

acquisition_data <- vroom(
      file       = path_acq, 
      delim      = "|", 
      col_names  = names(col_types_acq),
      col_types  = col_types_acq,
      na         = c("", "NA", "NULL"))
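
To see the effect of pre-specifying column types without downloading the Fannie Mae files, here is a small self-contained sketch. It writes a two-row pipe-delimited file to a temporary path and reads it back with the same vroom() arguments as above. The file contents and the four column names are invented for illustration.

```r
library(vroom)

# Write a tiny pipe-delimited file with no header row (made-up values)
tmp <- tempfile(fileext = ".txt")
writeLines(c("100|R|4.25|01/2018",
             "101|C|3.99|12/2017"), tmp)

# Named list of column types; the names double as the column names
col_types_demo <- list(
    loan_id   = col_factor(),
    channel   = col_factor(NULL),
    rate      = col_double(),
    orig_date = col_date("%m/%Y"))

demo <- vroom(
    file      = tmp,
    delim     = "|",
    col_names = names(col_types_demo),
    col_types = col_types_demo,
    na        = c("", "NA", "NULL"))

# Check that each column was parsed as the requested type
print(sapply(demo, function(x) class(x)[1]))
```

Supplying the types up front means vroom() does not have to guess them from the data, and IDs such as loan_id are kept as factors rather than being read as large numbers.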

The loan acquisition data contains information about the owner of the loan.

acquisition_data %>% head() %>% kable()
| loan_id | original_channel | seller_name | original_interest_rate | original_upb | original_loan_term | original_date | first_pay_date | original_ltv | original_cltv | number_of_borrowers | original_dti | original_borrower_credit_score | first_time_home_buyer | loan_purpose | property_type | number_of_units | occupancy_status | property_state | zip | primary_mortgage_insurance_percent | product_type | original_coborrower_credit_score | mortgage_insurance_type | relocation_mortgage_indicator |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100001040173 | R | QUICKEN LOANS INC. | 4.250 | 453000 | 360 | 2018-01-01 | 2018-03-01 | 65 | 65 | 1 | 28 | 791 | N | C | PU | 1 | P | OH | 430 | NA | FRM | NA | NA | N |
| 100002370993 | C | WELLS FARGO BANK, N.A. | 4.250 | 266000 | 360 | 2018-01-01 | 2018-03-01 | 80 | 80 | 2 | 41 | 736 | N | R | PU | 1 | P | IN | 467 | NA | FRM | 793 | NA | N |
| 100005405807 | R | PMTT4 | 3.990 | 233000 | 360 | 2017-12-01 | 2018-01-01 | 79 | 79 | 2 | 48 | 696 | N | R | SF | 1 | P | CA | 936 | NA | FRM | 665 | NA | N |
| 100008071646 | R | OTHER | 4.250 | 184000 | 360 | 2018-01-01 | 2018-03-01 | 80 | 80 | 1 | 48 | 767 | Y | P | PU | 1 | P | FL | 336 | NA | FRM | NA | NA | N |
| 100010739040 | R | OTHER | 4.250 | 242000 | 360 | 2018-02-01 | 2018-04-01 | 49 | 49 | 1 | 22 | 727 | N | R | SF | 1 | P | CA | 906 | NA | FRM | NA | NA | N |
| 100012691523 | R | OTHER | 5.375 | 180000 | 360 | 2018-01-01 | 2018-03-01 | 80 | 80 | 1 | 14 | 690 | N | C | PU | 1 | P | OK | 730 | NA | FRM | NA | NA | N |

Get the size of the acquisitions data set: 426K rows by 25 columns. Not that bad, but this is meta-data for the loan. The dataset we are worried about is the next one.

dim(acquisition_data)
## [1] 426207     25

Read the Loan Performance Data

# Loan Performance Data 
col_types_perf = list(
    loan_id                                = col_factor(),
    monthly_reporting_period               = col_date("%m/%d/%Y"),
    servicer_name                          = col_factor(NULL),
    current_interest_rate                  = col_double(),
    current_upb                            = col_double(),
    loan_age                               = col_double(),
    remaining_months_to_legal_maturity     = col_double(),
    adj_remaining_months_to_maturity       = col_double(),
    maturity_date                          = col_date("%m/%Y"),
    msa                                    = col_double(),
    current_loan_delinquency_status        = col_double(),
    modification_flag                      = col_factor(NULL),
    zero_balance_code                      = col_factor(NULL),
    zero_balance_effective_date            = col_date("%m/%Y"),
    last_paid_installment_date             = col_date("%m/%d/%Y"),
    foreclosed_after                       = col_date("%m/%d/%Y"),
    disposition_date                       = col_date("%m/%d/%Y"),
    foreclosure_costs                      = col_double(),
    prop_preservation_and_repair_costs     = col_double(),
    asset_recovery_costs                   = col_double(),
    misc_holding_expenses                  = col_double(),
    holding_taxes                          = col_double(),
    net_sale_proceeds                      = col_double(),
    credit_enhancement_proceeds            = col_double(),
    repurchase_make_whole_proceeds         = col_double(),
    other_foreclosure_proceeds             = col_double(),
    non_interest_bearing_upb               = col_double(),
    principal_forgiveness_upb              = col_double(),
    repurchase_make_whole_proceeds_flag    = col_factor(NULL),
    foreclosure_principal_write_off_amount = col_double(),
    servicing_activity_indicator           = col_factor(NULL))

performance_data <- vroom(
    file       = path_perf, 
    delim      = "|", 
    col_names  = names(col_types_perf),
    col_types  = col_types_perf,
    na         = c("", "NA", "NULL"))

Let’s inspect the data. We can see that this is a time series where each “Loan ID” and “Monthly Reporting Period” go together.

performance_data %>% head() %>% kable()
| loan_id | monthly_reporting_period | servicer_name | current_interest_rate | current_upb | loan_age | remaining_months_to_legal_maturity | adj_remaining_months_to_maturity | maturity_date | msa | current_loan_delinquency_status | modification_flag | zero_balance_code | zero_balance_effective_date | last_paid_installment_date | foreclosed_after | disposition_date | foreclosure_costs | prop_preservation_and_repair_costs | asset_recovery_costs | misc_holding_expenses | holding_taxes | net_sale_proceeds | credit_enhancement_proceeds | repurchase_make_whole_proceeds | other_foreclosure_proceeds | non_interest_bearing_upb | principal_forgiveness_upb | repurchase_make_whole_proceeds_flag | foreclosure_principal_write_off_amount | servicing_activity_indicator |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100001040173 | 2018-02-01 | QUICKEN LOANS INC. | 4.25 | NA | 0 | 360 | 360 | 2048-02-01 | 18140 | 0 | N |  | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |  | NA | N |
| 100001040173 | 2018-03-01 |  | 4.25 | NA | 1 | 359 | 359 | 2048-02-01 | 18140 | 0 | N |  | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |  | NA | N |
| 100001040173 | 2018-04-01 |  | 4.25 | NA | 2 | 358 | 358 | 2048-02-01 | 18140 | 0 | N |  | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |  | NA | N |
| 100001040173 | 2018-05-01 |  | 4.25 | NA | 3 | 357 | 357 | 2048-02-01 | 18140 | 0 | N |  | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |  | NA | N |
| 100001040173 | 2018-06-01 |  | 4.25 | NA | 4 | 356 | 356 | 2048-02-01 | 18140 | 0 | N |  | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |  | NA | N |
| 100001040173 | 2018-07-01 |  | 4.25 | NA | 5 | 355 | 355 | 2048-02-01 | 18140 | 0 | N |  | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |  | NA | N |

Let’s check out the data size. We can see it’s 4.6M rows by 31 columns! Just a typical financial time series (seriously).

dim(performance_data)
## [1] 4645448      31

4.5 Convert Tibbles to dtplyr Steps

Next, we’ll use the lazy_dt() function to convert the tibbles to dtplyr steps.

acquisition_data_dtplyr <- lazy_dt(acquisition_data)
performance_data_dtplyr <- lazy_dt(performance_data)

We can check the class() to see what we are working with.

class(acquisition_data_dtplyr)
## [1] "dtplyr_step_first" "dtplyr_step"

The returned object is the first step in a dtplyr sequence.

Key Point:

  • We are going to set up operations using a sequence of steps.
  • The operations will not be fully evaluated until we convert to a data.table or tibble depending on our desired output.
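
A small sketch of this lazy behavior, using a made-up five-row table (column names x and y are hypothetical): the pipeline below only records steps, and the rows are computed when as.data.table() is called.

```r
library(data.table)
library(dtplyr)
library(dplyr)

# Build a sequence of steps on a toy table; no rows are computed here
step <- lazy_dt(data.table(x = 1:5)) %>%
    filter(x > 2) %>%
    mutate(y = x * 10)

# The object is a dtplyr step, not a data frame
print(class(step))
print(is.data.frame(step))

# Converting forces evaluation of the recorded steps
evaluated <- as.data.table(step)
print(evaluated)
```

Calling as_tibble() instead of as.data.table() would run the same steps and return a tibble.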

4.6 Join the Data Sets

Our first data manipulation operation is a join. We are going to use the left_join() function from dplyr. Let’s see what happens.

combined_dtplyr <- performance_data_dtplyr %>%
    left_join(acquisition_data_dtplyr)

The output of the joining operation is a new step sequence, this time a dtplyr_step_subset.

class(combined_dtplyr)
## [1] "dtplyr_step_subset" "dtplyr_step"

Next, let’s examine what happens when we print combined_dtplyr to the console.

## Source: local data table [?? x 55]
## Call:   `_DT2`[`_DT1`, on = .(loan_id)]
##   loan_id monthly_reporti… servicer_name current_interes… current_upb
## 1 100001… 2018-02-01       QUICKEN LOAN…             4.25          NA
## 2 100001… 2018-03-01       ""                        4.25          NA
## 3 100001… 2018-04-01       ""                        4.25          NA
## 4 100001… 2018-05-01       ""                        4.25          NA
## 5 100001… 2018-06-01       ""                        4.25          NA
## 6 100001… 2018-07-01       ""                        4.25          NA
## # … with 50 more variables: loan_age <dbl>,
## #   remaining_months_to_legal_maturity <dbl>,
## #   adj_remaining_months_to_maturity <dbl>, maturity_date <date>,
## #   msa <dbl>, current_loan_delinquency_status <dbl>,
## #   modification_flag <fct>, zero_balance_code <fct>,
## #   zero_balance_effective_date <date>,
## #   last_paid_installment_date <date>, foreclosed_after <date>,
## #   disposition_date <date>, foreclosure_costs <dbl>,
## #   prop_preservation_and_repair_costs <dbl>,
## #   asset_recovery_costs <dbl>, misc_holding_expenses <dbl>,
## #   holding_taxes <dbl>, net_sale_proceeds <dbl>,
## #   credit_enhancement_proceeds <dbl>,
## #   repurchase_make_whole_proceeds <dbl>,
## #   other_foreclosure_proceeds <dbl>, non_interest_bearing_upb <dbl>,
## #   principal_forgiveness_upb <dbl>,
## #   repurchase_make_whole_proceeds_flag <fct>,
## #   foreclosure_principal_write_off_amount <dbl>,
## #   servicing_activity_indicator <fct>, original_channel <fct>,
## #   seller_name <fct>, original_interest_rate <dbl>,
## #   original_upb <int>, original_loan_term <int>,
## #   original_date <date>, first_pay_date <date>, original_ltv <dbl>,
## #   original_cltv <dbl>, number_of_borrowers <dbl>,
## #   original_dti <dbl>, original_borrower_credit_score <dbl>,
## #   first_time_home_buyer <fct>, loan_purpose <fct>,
## #   property_type <fct>, number_of_units <int>,
## #   occupancy_status <fct>, property_state <fct>, zip <int>,
## #   primary_mortgage_insurance_percent <dbl>, product_type <fct>,
## #   original_coborrower_credit_score <dbl>,
## #   mortgage_insurance_type <dbl>,
## #   relocation_mortgage_indicator <fct>
## # Use as.data.table()/as.data.frame()/as_tibble() to access results

Key Points:

  • The important piece is the data.table translation code, which we can see in the output: Call: `_DT2`[`_DT1`, on = .(loan_id)]

  • Note that we haven’t executed the data manipulation operation. dtplyr smartly gives us a glimpse of what the operation will look like though, which is really cool.
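
The join translation can be tried on two toy tables. The tables and their columns (month, seller) are invented stand-ins for the performance and acquisition data; only loan_id mirrors the real key. The exact data.table call printed by show_query() depends on the installed dtplyr version, so it may not match the Call shown above character for character.

```r
library(data.table)
library(dtplyr)
library(dplyr)

# Toy stand-ins: several monthly rows per loan, one metadata row per loan
perf <- lazy_dt(data.table(loan_id = c("a", "a", "b"), month = c(1, 2, 1)))
acq  <- lazy_dt(data.table(loan_id = c("a", "b"), seller = c("X", "Y")))

# Describe the join; this builds a step and does not run it
joined <- left_join(perf, acq, by = "loan_id")

# Inspect the generated data.table code
show_query(joined)

# Execute the join
joined_tbl <- as_tibble(joined)
print(joined_tbl)
```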

4.7 Wrangle the Data

We’ll do a sequence of data wrangling operations:

  • Select specific columns we want to keep
  • Arrange by loan_id and monthly_reporting_period. This is needed to keep groups together and in the right time-stamp order.
  • Group by loan_id and mutate to calculate whether or not loans become delinquent in the next 3 months.
  • Filter rows with NA values from the newly created column (these aren’t needed)
  • Reorder the columns to put the new calculated column first.
final_output_dtplyr <- combined_dtplyr %>%
    select(loan_id, monthly_reporting_period, 
           current_loan_delinquency_status) %>%
    arrange(loan_id, monthly_reporting_period) %>%
    group_by(loan_id) %>%
    mutate(gt_1mo_behind_in_3mo = lead(current_loan_delinquency_status, n = 3) >= 1) %>%
    ungroup() %>%
    filter(!is.na(gt_1mo_behind_in_3mo)) %>%
    select(gt_1mo_behind_in_3mo, everything())

The final output is a dtplyr_step_group, which is just a sequence of steps.

class(final_output_dtplyr)
## [1] "dtplyr_step_group" "dtplyr_step"

If we print the final_output_dtplyr object, we can see the data.table translation is pretty intense.

## Source: local data table [?? x 4]
## Call:   `_DT2`[`_DT1`, .(loan_id, monthly_reporting_period, current_loan_delinquency_status), 
##     on = .(loan_id)][order(loan_id, monthly_reporting_period)][, 
##     `:=`(gt_1mo_behind_in_3mo = lead(current_loan_delinquency_status, 
##         n = 3) >= 1), keyby = .(loan_id)][!is.na(gt_1mo_behind_in_3mo), 
##     .(gt_1mo_behind_in_3mo, loan_id, monthly_reporting_period, 
##         current_loan_delinquency_status)]
##   gt_1mo_behind_in_… loan_id   monthly_reporting… current_loan_delinq…
## 1 FALSE              10000104… 2018-02-01                            0
## 2 FALSE              10000104… 2018-03-01                            0
## 3 FALSE              10000104… 2018-04-01                            0
## 4 FALSE              10000104… 2018-05-01                            0
## 5 FALSE              10000104… 2018-06-01                            0
## 6 FALSE              10000104… 2018-07-01                            0
## # Use as.data.table()/as.data.frame()/as_tibble() to access results

Key Point:

  • The most important piece is that dtplyr correctly converted the grouped mutation to an inplace calculation, which is data.table speak for a super-fast calculation that makes no copies of the data. Here’s the inplace calculation code from the dtplyr translation: [, `:=`(gt_1mo_behind_in_3mo = lead(current_loan_delinquency_status, n = 3) >= 1), keyby = .(loan_id)]
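
To make the grouped lead logic concrete, here is a toy version written directly in data.table. Note one substitution: it uses data.table's own shift(type = "lead") in place of dplyr's lead(), and the table and column names (delinq, behind_in_3mo) are shortened, hypothetical versions of the real ones.

```r
library(data.table)

# Two loans with five monthly records each (made-up delinquency values)
dt <- data.table(
    loan_id = rep(c("a", "b"), each = 5),
    month   = rep(1:5, times = 2),
    delinq  = c(0, 0, 0, 1, 2,   0, 0, 0, 0, 0))

# Keep each loan's records together and in time order
setorder(dt, loan_id, month)

# Within each loan, look 3 rows ahead and flag delinquency, by reference
dt[, behind_in_3mo := shift(delinq, n = 3, type = "lead") >= 1, by = loan_id]

# The last 3 months of each loan have no 3-month-ahead value, so drop them
kept <- dt[!is.na(behind_in_3mo)]
print(kept)
```

Loan "a" is flagged in months 1 and 2 because it is delinquent in months 4 and 5. Loan "b" is never flagged.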

4.8 Collecting The Data

Note that up until now, nothing has been done to process the data – we’ve just created a recipe for data wrangling. We still need to tell dtplyr to execute the data wrangling operations.

To implement all of the steps and convert the dtplyr sequence to a tibble, we just call as_tibble().

final_output_dtplyr %>% as_tibble()
## # A tibble: 3,352,231 x 4
##    gt_1mo_behind_in_… loan_id  monthly_reporting… current_loan_delinq…
##  1 FALSE              1000010… 2018-02-01                            0
##  2 FALSE              1000010… 2018-03-01                            0
##  3 FALSE              1000010… 2018-04-01                            0
##  4 FALSE              1000010… 2018-05-01                            0
##  5 FALSE              1000010… 2018-06-01                            0
##  6 FALSE              1000010… 2018-07-01                            0
##  7 FALSE              1000010… 2018-08-01                            0
##  8 FALSE              1000010… 2018-09-01                            0
##  9 FALSE              1000023… 2018-03-01                            0
## 10 FALSE              1000023… 2018-04-01                            0
## # … with 3,352,221 more rows

Key Point:

  • Calling the as_tibble() function tells dtplyr to execute the data.table wrangling operations.

5.0 The 3X Speedup – Time Comparisons

Finally, let’s check the performance of dplyr vs dtplyr vs data.table. We can see a nice 3X speed boost!

5.1 Time using dplyr

tic()
performance_data %>%
    left_join(acquisition_data) %>%
    select(loan_id, monthly_reporting_period, 
           current_loan_delinquency_status) %>%
    arrange(loan_id, monthly_reporting_period) %>%
    group_by(loan_id) %>%
    mutate(gt_1mo_behind_in_3mo = lead(current_loan_delinquency_status, n = 3) >= 1) %>%
    ungroup() %>%
    filter(!is.na(gt_1mo_behind_in_3mo)) %>%
    select(gt_1mo_behind_in_3mo, everything())
toc()
## 15.905 sec elapsed

5.2 Time using dtplyr

tic()
performance_data_dtplyr %>%
    left_join(acquisition_data_dtplyr) %>%
    select(loan_id, monthly_reporting_period, 
           current_loan_delinquency_status) %>%
    arrange(loan_id, monthly_reporting_period) %>%
    group_by(loan_id) %>%
    mutate(gt_1mo_behind_in_3mo = lead(current_loan_delinquency_status, n = 3) >= 1) %>%
    ungroup() %>%
    filter(!is.na(gt_1mo_behind_in_3mo)) %>%
    select(gt_1mo_behind_in_3mo, everything()) %>%
    as_tibble()
toc()
## 4.821 sec elapsed

5.3 Time using data.table

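# DT1 is the left-hand (performance) table and DT2 the right-hand (acquisition)
# table, matching the `_DT2`[`_DT1`, on = .(loan_id)] translation in Section 4.6
DT1 <- as.data.table(performance_data)
DT2 <- as.data.table(acquisition_data)

tic()
DT2[DT1, .(loan_id, monthly_reporting_period, current_loan_delinquency_status), on = .(loan_id)] %>%
  .[order(loan_id, monthly_reporting_period)] %>%
  .[, gt_1mo_behind_in_3mo := lead(current_loan_delinquency_status, n = 3) >= 1, keyby = .(loan_id)] %>%
  .[!is.na(gt_1mo_behind_in_3mo), .(gt_1mo_behind_in_3mo, loan_id, monthly_reporting_period, current_loan_delinquency_status)]
toc()
## 4.627 sec elapsed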

6.0 Conclusions and Learning Recommendations

For Big Data wrangling, the dtplyr package represents a huge opportunity for data scientists to leverage the speed of data.table with the readability of dplyr. We saw an impressive 3X Speedup going from dplyr to using dtplyr for wrangling a 4.6M row data set. This just scratches the surface of the potential, and I’m looking forward to seeing dtplyr mature, which will help bridge the gap between the two groups of data scientists using dplyr and data.table.

For new data scientists coming from other tools like Excel, my hope is that you see the awesome potential of learning R for data analysis and data science. The Big Data capabilities represent a massive opportunity for you to bring your organization data science at scale.

You just need to learn how to go from normal data to Big Data.

My recommendation is to start by learning dplyr – The popular data manipulation library that makes reading and writing R code very easy to understand.

Once you get to an intermediate level, learn data.table. This is where you gain the benefits of scaling data science to Big Data. The data.table package has a steeper learning curve, but learning it will help you leverage its full performance and scalability.

If you need to learn dplyr as fast as possible – I recommend beginning with our Data Science Foundations DS4B 101-R Course. The 101 Course is available as part of the 3-Course R-Track Bundle, a complete learning system designed to transform you from beginner to advanced in under 6 months. You will learn everything you need to become an expert data scientist.

7.0 Additional Big Data Guidelines

I find that students have an easier time picking a tool based on dataset row size (e.g. I have 10M rows, what should I use?). With that said, there are 2 factors that will influence which tools you need to use:

  1. Are you performing Grouped and Iterative Operations? Performance even on normal data sets can become an issue if you have a lot of groups or if the calculation is iterative. A particular source of pain in the financial realm is rolling (window) calculations, which are both grouped and iterative within groups. In these situations, use high-performance C++ functions (e.g. Rolling functions from the roll package or RcppRoll package).

  2. Do you have sufficient RAM? Once you begin working with gigabytes of data, you start to run out of memory (RAM). In these situations, you will need to work in chunks and parallelize operations. You can do this with distributed sparklyr, which will perform some operations in parallel and distribute across nodes.
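
As a rough illustration of a grouped rolling calculation, the sketch below computes a 3-period rolling mean per group. It swaps in data.table's built-in frollmean() rather than the roll or RcppRoll functions named above, simply to keep the example within the packages already loaded in this article. The ticker and price data are made up.

```r
library(data.table)

# Toy price history for two hypothetical tickers
prices <- data.table(
    ticker = rep(c("AAA", "BBB"), each = 5),
    day    = rep(1:5, times = 2),
    price  = c(1, 2, 3, 4, 5,   10, 20, 30, 40, 50))

# 3-day rolling mean within each ticker, added by reference;
# the first two rows of each group are NA because the window is incomplete
prices[, roll_mean_3 := frollmean(price, n = 3), by = ticker]

print(prices)
```

The rolling window is computed in compiled code and restarts for each group, which avoids a slow R-level loop over groups and rows.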

8.0 Recognizing the Developers

I’d like to take a quick moment to thank the developers of data.table and dplyr. Without these two packages, Business Science probably would not exist. Thank you.

9.0 Coming Soon – Expert Shiny Apps Course!

I’m very excited to announce that Business Science has an Expert Shiny Course – Coming soon! Head over to Business Science University and create a free account. I will update you with the details shortly.

Continue Reading…


Read More

Document worth reading: “A Survey on Compressive Sensing: Classical Results and Recent Advancements”

Recovering sparse signals from linear measurements has demonstrated outstanding utility in a vast variety of real-world applications. Compressive sensing is the topic that studies the associated raised questions for the possibility of a successful recovery. This topic is well-nourished and numerous results are available in the literature. However, their dispersity makes it challenging and time-consuming for new readers and practitioners to quickly grasp its main ideas and classical algorithms, and further touch upon the recent advancements in this surging field. Besides, the sparsity notion has already demonstrated its effectiveness in many contemporary fields. Thus, these results are useful and inspiring for further investigation of related questions in these emerging fields from new perspectives. In this survey, we gather and overview vital classical tools and algorithms in compressive sensing and describe significant recent advancements. We conclude this survey by a numerical comparison of the performance of described approaches on an interesting application. A Survey on Compressive Sensing: Classical Results and Recent Advancements

Continue Reading…


Read More

Distilled News

Understand Data Normalization in Machine Learning

If you’re new to data science/machine learning, you probably wondered a lot about the nature and effect of the buzzword ‘feature normalization’. If you’ve read any Kaggle kernels, it is very likely that you found feature normalization in the data preprocessing section. So, what is data normalization and why the heck is it so valued by data practitioners?

Feature Selection Why & How Explained

In the last article, I explained the problems with including irrelevant or correlated features in model building. In this article, I’ll show you several neat implementations of selection algorithms that can be easily integrated into your project pipeline. Before diving into the detailed implementation, let’s go through the dataset I created. The dataset has 20 features, among which 5 contribute to the output and 2 are correlated.
1. Wrapper Feature Selection
2. Filtering Feature Selection
3. Embedded Feature Selection

The tools you should know for the Machine Learning projects

I would like to start my first Machine Learning project. But I do not have tools. What should I do? What are the tools I could use? I will give you some hints and advice based on the toolbox I use. Of course there are more great tools but you should pick the ones you like. You should also use the tools that make your work productive, which sometimes means you need to pay for them (which is not always the case – I do use free tools as well). The first and most important thing is that there are lots of options! Just pick what works for you! I have divided this post into several parts like the environments, the languages and the libraries.

R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms

Computer vision is an interdisciplinary field that has been gaining huge amounts of traction in recent years (since CNN), and self-driving cars have taken centre stage. Another integral part of computer vision is object detection. Object detection aids in pose estimation, vehicle detection, surveillance etc. The difference between object detection algorithms and classification algorithms is that in detection algorithms, we try to draw a bounding box around the object of interest to locate it within the image. Also, you might not necessarily draw just one bounding box in an object detection case; there could be many bounding boxes representing different objects of interest within the image, and you would not know how many beforehand.

Visual intuition on ring-Allreduce for distributed Deep Learning

Recently I found myself working on a very large data set, one of those where you need to parallelize learning to make it feasible. I immediately thought of Uber’s Horovod. I had previously heard about it at a tech talk at Uber but had not really played around with it. I found it very interesting and a great framework, from the high-level simplifications to the algorithm that powers it. In this post, I’ll try to describe my understanding of the latter.
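
To make the algorithm named in the title concrete, here is a minimal single-process simulation of ring-allreduce with a sum reduction. It is a sketch under stated assumptions: the function name `ring_allreduce`, the chunking scheme, and the use of plain Python lists as "workers" are illustrative choices, and this is not Horovod's implementation or API.

```python
def ring_allreduce(worker_data):
    """Simulate a sum ring-allreduce over equal-length vectors, one per worker.

    Returns a list of vectors; every worker ends up with the elementwise sum.
    """
    n = len(worker_data)
    length = len(worker_data[0])
    # Split each vector into n contiguous chunks (chunk c covers bounds[c]:bounds[c+1]).
    bounds = [(i * length) // n for i in range(n + 1)]
    data = [list(vec) for vec in worker_data]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1: scatter-reduce. At step s, worker r sends chunk (r - s) mod n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        # Snapshot the payloads first so all sends in a step happen "at once".
        payloads = [[data[r][i] for i in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, value in zip(chunk(c), payload):
                data[dst][i] += value

    # Now worker r holds the fully reduced chunk (r + 1) mod n.
    # Phase 2: allgather. At step s, worker r forwards chunk (r + 1 - s) mod n
    # to its right neighbour, which overwrites its own copy.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[data[r][i] for i in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, value in zip(chunk(c), payload):
                data[dst][i] = value

    return data


gradients = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
reduced = ring_allreduce(gradients)
```

In this simulation each worker sends only one chunk per step to a single neighbour, for 2(n - 1) steps in total, which is the property that lets the ring pattern avoid a central bottleneck.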

How to Build an Automated Trading System using R

For all R zealots, we know that we can build any data product very efficiently using R. An automated trading system is not an exception. Whether you are doing high-frequency trading, day trading, swing trading, or even value investing, you can use R to build a trading robot that watches the market closely and trades the stocks or other financial instruments on your behalf. The benefits of a trading robot are obvious:
• A trading robot follows our pre-defined trading rules strictly. Unlike human beings, the robot has no emotions involved when it makes trading decisions.
• A trading robot does not need rest (yet). The trading robot can watch the market price movement at every second across multiple financial instruments and execute the order immediately when the timing is correct.

Custom model deployment on Google A.I. Platform Serving

Recently, Google Cloud AI Platform Serving (CAIP) added a new feature which Machine Learning (ML) practitioners can now use to deploy models with customized pre-processing pipelines and prediction routines using their favorite frameworks, all under one serverless microservice. In this blog post, I explain in detail how we at Wootric make use of this feature and mention a few nitty-gritties to be careful about, as the product is still in beta.

Anomaly Detection with PyOD!

Are you an anomaly detection professional, or planning to advance your modeling in anomaly detection? Then you should not miss this wonderful Python Outlier Detection (PyOD) Toolkit. It is a comprehensive module that has been featured in academic research (see this summary) and on machine learning websites such as Towards Data Science, Analytics Vidhya, KDnuggets, etc.

How to create data-driven presentations with jupyter notebooks, reveal.js, host on github, and show it to the world: Part I — make a basic slide deck

… in which I discuss a workflow where you can start writing your contents on a jupyter notebook, create a reveal.js slide deck, and host it on github for presentations. This is for a very simple presentation that you can fully control yourself.

One Class Learning in Manufacturing: Autoencoder and Golden Units Baselining

Recently I’ve been working with manufacturing customers (both OEM and CM) who want to jump on the bandwagon of machine learning. One common use case is to better detect products (or Devices Under Test/DUTs) that are defective in their production line. In machine learning terminology, this falls under the problem of binary classification, as a DUT can only pass or fail.

How to Get Started as a Developer in AI

Artificial Intelligence. Well, it looks like this cutting-edge technology is now the most popular and at the same time the most decisive one for humanity. We are ceaselessly amazed at AI’s capabilities and how effectively they can be used in almost any industry. Robots now are like airplanes 100 years ago. So what’s next? This question raises many emotions, starting with great interest, encouragement, and the desire to be part of this process, and ending with fear, complete confusion and ignorance. But what’s stopping you from sitting in one of the front seats of AI development rather than being a passive observer? You may assume getting started as a developer in AI is a long and hard path. Well, yes, but it doesn’t mean you can’t handle it. Let me say one word for those who doubt. Even if you don’t have any prior experience in programming, math, or engineering, you can learn AI from scratch sitting at home and start applying your knowledge in practice, creating simple machine learning solutions and making first steps towards your new profession.
Part I. First Off, Gain Basic Skills Required to Start Learning AI
Part II. Start Learning AI – the Most Important Part
Part III. Practice your skills

How Neural Networks Are Learning to Write

An overview of the evolution of NLP models for writing text.
• Markov Chains and N-grams
• Word Embeddings and Neural Language Models
• Recurrent Neural Networks
• Transformers
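
The first item in the list above, Markov chains over n-grams, can be sketched in a few lines. This is a toy bigram model on a made-up sentence; the function names `build_bigram_model` and `generate` and the corpus are assumptions for illustration, not code from the article.

```python
import random
from collections import defaultdict


def build_bigram_model(words):
    """Map each word to the list of words observed immediately after it."""
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model


def generate(model, start, length, rng):
    """Walk the chain from `start`, sampling each next word from its followers."""
    output = [start]
    for _ in range(length - 1):
        followers = model.get(output[-1])
        if not followers:
            break  # the last word was never followed by anything in the corpus
        output.append(rng.choice(followers))
    return output


corpus = "the cat sat on the mat and the cat ran".split()
model = build_bigram_model(corpus)
text = generate(model, "the", 8, random.Random(0))
```

Because followers are stored with repetition, a word that appears more often after the current word is proportionally more likely to be sampled, which is all a first-order Markov text model does.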

Understanding Adam : how loss functions are minimized ?

While using a library built on top of PyTorch, I realized that I have never had to interact with an optimizer so far. Since the library already deals with it when calling the fit_one_cycle method, I have never had to parametrize the optimizer or understand how it works. Adam is probably the most used optimizer in machine learning due to its simplicity and speed. It was developed in 2015 by Diederik Kingma and Jimmy Lei Ba and introduced in a paper called Adam: A Method for Stochastic Optimization. As always, this blog post is a cheat sheet that I write to check my understanding of a notion. If you find something unclear or incorrect, don’t hesitate to write it in the comment section.
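
For readers who want to see the update rule itself, here is a minimal one-dimensional sketch of Adam using the default decay rates from the Kingma and Ba paper (beta1 = 0.9, beta2 = 0.999, eps = 1e-8). The function name `adam_minimize`, the learning rate of 0.1, and the quadratic test function are assumptions chosen for this example; this is not the PyTorch optimizer's code.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function with Adam, given its gradient function."""
    x = x0
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: both averages start at zero and are too small early on.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy loss f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
```

Dividing by the square root of the second-moment estimate is what makes the step size roughly independent of the gradient's scale, while the first-moment average acts like momentum.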

Text Generation with LSTM: Economic Analysis

Could you imagine a future where computers made economic decisions rather than governments and central bankers? With all of the economic mishaps we’ve been seeing over the past decade, one could say it isn’t a particularly bad idea! Natural language processing could allow us to make more sense of the economy than we do currently. As it stands, investors and policymakers use index benchmarks and quantitative measures such as GDP growth to gauge economic health. That said, one potential application of NLP is to analyse text data (such as through major economic policy documents), and then ‘learn’ from such texts in order to generate appropriate economic policies independently of human intervention. In this example, an LSTM model is trained using text from a sample ECB policy document, in order to generate ‘new’ text data, with a view to revealing insights from such text that could be used for policy purposes. Specifically, a temperature hyperparameter is configured to control the randomness of text predictions generated, with the relevant text vectorized into sequences of characters, and the single-layer LSTM model then used for next character sampling – with a text generation loop then used to generate a block of text for each temperature (the higher the temperature, the more randomness induced in each block of text).
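
The role of the temperature hyperparameter can be shown without any model at all. The sketch below reweights a fixed next-character distribution; the names `apply_temperature` and `sample_index` and the example probabilities are hypothetical, and the code assumes all probabilities are strictly positive.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a distribution so that p_i becomes proportional to p_i ** (1 / T)."""
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, temperature, rng):
    """Draw one index from the temperature-adjusted distribution."""
    scaled = apply_temperature(probs, temperature)
    return rng.choices(range(len(scaled)), weights=scaled, k=1)[0]


# Hypothetical model output over three candidate characters.
probs = [0.7, 0.2, 0.1]
cold = apply_temperature(probs, 0.2)  # sharper: almost always the top choice
hot = apply_temperature(probs, 2.0)   # flatter: more randomness
pick = sample_index(probs, 1.0, random.Random(0))
```

A temperature of 1.0 leaves the distribution unchanged, values below 1 concentrate mass on the most likely character, and values above 1 flatten the distribution toward uniform.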

Neural Style Transfer

Style transfer is an exciting sub-field of computer vision. It aims to transfer the style of one image onto another image, known as the content image. This technique allows us to synthesize new images combining the content and style of different images. Several developments have been made in this sub-field but the most notable initial work (neural style transfer) was done by Gatys et al. in 2015. Some of the results I got by applying this technique can be seen below.

Finding out why

Python Library: causalml

Python Package for Uplift Modeling and Causal Inference with Machine Learning Algorithms

Article: Correlation is not causation

Why the confusion of these concepts has profound implications, from healthcare to business management. In correlated data, a pair of variables are related in that one thing is likely to change when the other does. This relationship might lead us to assume that a change to one thing causes the change in the other. This article clarifies that kind of faulty thinking by explaining correlation, causation, and the bias that often lumps the two together. The human brain simplifies incoming information, so we can make sense of it. Our brains often do that by making assumptions about things based on slight relationships, or bias. But that thinking process isn’t foolproof. An example is when we mistake correlation for causation. Bias can make us conclude that one thing must cause another if both change in the same way at the same time. This article clears up the misconception that correlation equals causation by exploring both of those subjects and the human brain’s tendency toward bias.

Paper: A Groupwise Approach for Inferring Heterogeneous Treatment Effects in Causal Inference

There is a growing literature in nonparametric estimation of the conditional average treatment effect given a specific value of covariates. However, this estimate is often difficult to interpret if covariates are high dimensional, and in practice, effect heterogeneity is discussed in terms of subgroups of individuals with similar attributes. This paper proposes to study treatment heterogeneity under the groupwise framework. Our method is simple, only based on linear regression and sample splitting, and is semiparametrically efficient under assumptions. We also discuss ways to conduct multiple testing. We conclude by reanalyzing a get-out-the-vote experiment during the 2014 U.S. midterm elections.

Paper: Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking

Counterfactual thinking describes a psychological phenomenon in which people re-infer the possible results of different solutions to things that have already happened. It helps people to gain more experience from mistakes and thus to perform better in similar future tasks. This paper investigates counterfactual thinking for agents to find optimal decision-making strategies in multi-agent reinforcement learning environments. In particular, we propose a multi-agent deep reinforcement learning model with a structure which mimics the human-psychological counterfactual thinking process to improve the competitive abilities of agents. To this end, our model generates several possible actions (intent actions) with a parallel policy structure and estimates the rewards and regrets for these intent actions based on its current understanding of the environment. Our model incorporates a scenario-based framework to link the estimated regrets with its inner policies. During the iterations, our model updates the parallel policies and the corresponding scenario-based regrets for agents simultaneously. To verify the effectiveness of our proposed model, we conduct extensive experiments on two different environments with real-world applications. Experimental results show that counterfactual thinking can actually help agents obtain more accumulative rewards than their opponents from environments with fair information, while keeping high performing efficiency.

Paper: Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective

Can an arbitrarily intelligent reinforcement learning agent be kept under control by a human user? Or do agents with sufficient intelligence inevitably find ways to shortcut their reward signal? This question impacts how far reinforcement learning can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we use an intuitive yet precise graphical model called causal influence diagrams to formalize reward tampering problems. We also describe a number of tweaks to the reinforcement learning objective that prevent incentives for reward tampering. We verify the solutions using recently developed graphical criteria for inferring agent incentives from causal influence diagrams.

Paper: Optimal Estimation of Generalized Average Treatment Effects using Kernel Optimal Matching

In causal inference, a variety of causal effect estimands have been studied, including the sample, uncensored, target, conditional, optimal subpopulation, and optimal weighted average treatment effects. Ad-hoc methods have been developed for each estimand based on inverse probability weighting (IPW) and on outcome regression modeling, but these may be sensitive to model misspecification, practical violations of positivity, or both. The contribution of this paper is twofold. First, we formulate the generalized average treatment effect (GATE) to unify these causal estimands as well as their IPW estimates. Second, we develop a method based on Kernel Optimal Matching (KOM) to optimally estimate GATE and to find the GATE most easily estimable by KOM, which we term the Kernel Optimal Weighted Average Treatment Effect. KOM provides uniform control on the conditional mean squared error of a weighted estimator over a class of models while simultaneously controlling for precision. We study its theoretical properties and evaluate its comparative performance in a simulation study. We illustrate the use of KOM for GATE estimation in two case studies: comparing spine surgical interventions and studying the effect of peer support on people living with HIV.

Course Announcement: Data Mining (36-462/662), Fall 2019

For the first time in ten years, I find myself teaching data mining in the fall. This means I need to figure out what data mining is in 2019. Naturally, my first stab at a syllabus is based on what I thought data mining was in 2009. Perhaps it's changed too little; nonetheless, I'm feeling OK with it at the moment*. I am sure the thoughtful and constructive suggestions of the Internet will only reinforce this satisfaction.

--- Seriously, suggestions are welcome, except for suggesting that I teach about neural networks, which I deliberately omitted because I am an out-of-date stick-in-the-mud reasons**.

*: Though I am not done selecting readings from the textbook, the recommended books, and sundry articles --- those will however come before the respective classes. I have been teaching long enough to realize that most students, particularly in a class like this, will read just enough of the most emphatically required material to think they know how to do the assignments, but there are exceptions, and anecdotally even some of the majority come back to the material later, and benefit from pointers.

**: On the one hand, CMU (now) has plenty of well-attended classes on neural networks and deep learning, so what would one more add? On the other, my admittedly cranky opinion is that we have no idea why the new crop works better than the 1990s version, and it's not always clear that they do work better than good old-fashioned machine learning, so there.

Corrupting the Young; Enigmas of Chance

Thanks for reading!