My Data Science Blogs

September 15, 2019

What's new on arXiv – Complete List

Distributionally Robust Language Modeling
Performance Analysis and Comparison of Distributed Machine Learning Systems
Accelerated Information Gradient flow
Meta Learning with Relational Information for Short Sequences
Quantum Natural Gradient
Theory of high-dimensional outliers
TabFact: A Large-scale Dataset for Table-based Fact Verification
Detecting Deep Neural Network Defects with Data Flow Analysis
Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations
An Experiment on Network Density and Sequential Learning
Effective Domain Knowledge Transfer with Soft Fine-tuning
A new reproducing kernel based nonlinear dimension reduction method for survival data
Powerset Convolutional Neural Networks
Table-to-Text Generation with Effective Hierarchical Encoder on Three Dimensions (Row, Column and Time)
Human-AI Collaboration in Data Science: Exploring Data Scientists’ Perceptions of Automated AI
Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis
The application of Convolutional Neural Networks to Detect Slow, Sustained Deformation in InSAR Timeseries
Informative and Controllable Opinion Summarization
Understanding ML driven HPC: Applications and Infrastructure
Best Practices for Scientific Research on Neural Architecture Search
Minibatch Processing in Spiking Neural Networks
Regression-clustering for Improved Accuracy and Training Cost with Molecular-Orbital-Based Machine Learning
The smallest matroids with no large independent flat
Mixed spectra and partially extended states in a two-dimensional quasiperiodic model
TIGEr: Text-to-Image Grounding for Image Caption Evaluation
ModiPick: SLA-aware Accuracy Optimization For Mobile Deep Inference
Large deviations and gradient flows for the Brownian one-dimensional hard-rod system
Failed power domination on graphs
Bayesian Inference of Networks Across Multiple Sample Groups and Data Types
An Entity-Driven Framework for Abstractive Summarization
DCGANs for Realistic Breast Mass Augmentation in X-ray Mammography
Diversity Breeds Innovation With Discounted Impact and Recognition
Leverage Implicit Feedback for Context-aware Product Search
Correct-by-construction: a contract-based semi-automated requirement decomposition process
Conversational Product Search Based on Negative Feedback
Large-scale Tag-based Font Retrieval with Generative Feature Learning
Jointly Learning to Align and Translate with Transformer Models
Weakly Supervised Universal Fracture Detection in Pelvic X-rays
Phase retrieval of complex and vector-valued functions
Thermoplastic Fiber Reinforced Composite Material Characterization and Precise Finite Element Analysis for 4D Printing
On Least Squares Estimation under Heteroscedastic and Heavy-Tailed Errors
An algebraic inverse theorem for the quadratic Littlewood-Offord problem, and an application to Ramsey graphs
Correct, Fast Remote Persistence
Gradients of Generative Models for Improved Discriminative Analysis of Tandem Mass Spectra
Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering
Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems
Stochastic Linear Optimization with Adversarial Corruption
Program-Guided Image Manipulators
Learning Dynamic Context Augmentation for Global Entity Linking
Inductive Bias-driven Reinforcement Learning For Efficient Schedules in Heterogeneous Clusters
A Critical Domain For the First Normalized Nontrivial Steklov Eigenvalue Among Planar Annular Domains
Finding the dimension of a non-empty orthogonal array polytope
PFLOTRAN-SIP: A PFLOTRAN Module for Simulating Spectral-Induced Polarization of Electrical Impedance Data
Reporting the Unreported: Event Extraction for Analyzing the Local Representation of Hate Crimes
Fast BFS-Based Triangle Counting on GPUs
No Press Diplomacy: Modeling Multi-Agent Gameplay
Towards Precise Robotic Grasping by Probabilistic Post-grasp Displacement Estimation
PaLM: A Hybrid Parser and Language Model
Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra
Vibration Analysis of Geometrically Nonlinear and Fractional Viscoelastic Cantilever Beams
Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning
Drone-Assisted Communications for Remote Areas and Disaster Relief
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
Feasibility criteria for high-multiplicity partitioning problems
Bidding Strategies with Gender Nondiscrimination Constraints for Online Ad Auctions
Atypical Facial Landmark Localisation with Stacked Hourglass Networks: A Study on 3D Facial Modelling for Medical Diagnosis
Free resolutions of function classes via order complexes
Poly-GAN: Multi-Conditioned GAN for Fashion Synthesis
Future Frame Prediction Using Convolutional VRNN for Anomaly Detection
Estimating a novel stochastic model for within-field disease dynamics of banana bunchy top virus via approximate Bayesian computation
Neural Rule Grounding for Low-Resource Relation Extraction
Learning from Label Proportions with Generative Adversarial Networks
Machine Learning in Least-Squares Monte Carlo Proxy Modeling of Life Insurance Companies
More Adaptive Algorithms for Tracking the Best Expert
A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding
Non-asymptotic Closed-Loop System Identification using Autoregressive Processes and Hankel Model Reduction
Analysis and Optimization of Outage Probability in Multi-Intelligent Reflecting Surface-Assisted Systems
An Outage Probability Analysis of Full-Duplex NOMA in UAV Communications
Automated Let’s Play Commentary
Investigating Multilingual NMT Representations at Scale
Efficient Optimal Planning in non-FIFO Time-Dependent Flow Fields
Towards a general model for psychopathology
Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach
Low-rank representation of tensor network operators with long-range pairwise interactions
Semantics-aware BERT for Language Understanding
Gravity as a Reference for Estimating a Person’s Height from Video
Author Growth Outstrips Publication Growth in Computer Science and Publication Quality Correlates with Collaboration
Training Compact Neural Networks via Auxiliary Overparameterization
Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks
REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning
A Better Way to Attend: Attention with Trees for Video Question Answering
Super-resolved Chromatic Mapping of Snapshot Mosaic Image Sensors via a Texture Sensitive Residual Network
Multi-Granularity Self-Attention for Neural Machine Translation
Examining Gender Bias in Languages with Grammatical Gender
POD: Practical Object Detection with Scale-Sensitive Network
On the compatibility between the adiabatic and the rotating wave approximations in quantum control
Lyapunov modal analysis and participation factors with applications to small-signal stability of power systems
Optimal UCB Adjustments for Large Arm Sizes
Technical note: Hybrid Loewner Data Driven Control
A Fourth-Order Compact ADI Scheme for Two-Dimensional Riesz Space Fractional Nonlinear Reaction-Diffusion Equation
Cross-Lingual Dependency Parsing Using Code-Mixed TreeBank
Adaptive Graph Representation Learning for Video Person Re-identification
Robust Navigation with Language Pretraining and Stochastic Sampling
On Validity of Reed Conjecture for Classes of Graphs with Two Forbidden Subgraphs
Generative Machine Learning for Robust Free-Space Communication
Nested Named Entity Recognition via Second-best Sequence Learning and Decoding
An Arm-wise Randomization Approach to Combinatorial Linear Semi-bandits
Scalable Double Regularization for 3D Nano-CT Reconstruction
Gradient Descent based Weight Learning for Grouping Problems: Application on Graph Coloring and Equitable Graph Coloring
How effective is machine learning to detect long transient gravitational waves from neutron stars in a real search?
Towards Task-Oriented Dialogue in Mixed Domains
Semi-conical eigenvalue intersections and the ensemble controllability problem for quantum systems
Source Dependency-Aware Transformer with Supervised Self-Attention
Finding optimal hull shapes for fast vertical penetration into water
Integrability approach to Feher-Nemethi-Rimanyi-Guo-Sun type identities for factorial Grothendieck polynomials
Accelerating Transformer Decoding via a Hybrid of Self-attention and Recurrent Neural Network
Convex semigroups on Banach lattices
Reduced-bias estimation of spatial econometric models with incompletely geocoded data
Analysis of switching strategies for the optimization of periodic chemical reactions with controlled flow-rate
Multiple Lattice Rules for Multivariate $L_\infty$ Approximation in the Worst-Case Setting
Learning Action-Transferable Policy with Action Embedding
Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection
Depth Map Estimation for Free-Viewpoint Television
Beyond integrated information: A taxonomy of information dynamics phenomena
Detector With Focus: Normalizing Gradient In Image Pyramid
Topological recursion for monotone orbifold Hurwitz numbers: a proof of the Do-Karev conjecture
Further study on inferential aspects of log-Lindley distribution with an application of stress-strength reliability in insurance
State sums for some super quantum link invariants
Fusing Vector Space Models for Domain-Specific Applications
A non-$P$-stable class of degree sequences for which the swap Markov chain is rapidly mixing
New expressions for order polynomials and chromatic polynomials
McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds
Frieze patterns with coefficients
Occ-Traj120: Occupancy Maps with Associated Trajectories
A simple parallelizable method for the approximate solution of a quadratic transportation problem of large dimension with additional constraints
The Nonlocal Ramsey Model for an Interacting Economy
Informing Unsupervised Pretraining with External Linguistic Knowledge
Kernel absolute summability is only sufficient for RKHS stability
An Active Learning Approach for Reducing Annotation Cost in Skin Lesion Analysis
A complex network approach to political analysis: application to the Brazilian Chamber of Deputies
Competition Models for Plant Stems
A Nonlocal Spatial Ramsey Model with Endogenous Productivity Growth on Unbounded Spatial Domains
A Transfer Learning Approach for Network Intrusion Detection
Sticky matroids and convolution
Tensor Oriented No-Reference Light Field Image Quality Assessment
Hierarchical Federated Learning Across Heterogeneous Cellular Networks
A Simple Reduction for Full-Permuted Pattern Matching Problems on Multi-Track Strings
LSMI-Sinkhorn: Semi-supervised Squared-Loss Mutual Information Estimation with Optimal Transport
Vector-valued Generalised Ornstein-Uhlenbeck Processes
Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers
Machine-Learning-Driven New Geologic Discoveries at Mars Rover Landing Sites: Jezero and NE Syrtis
Rewarding Coreference Resolvers for Being Consistent with World Knowledge
Reading Comprehension Ability Test – A Turing Test for Reading Comprehension
On ultrametric $1$-median selection
Elastic interior transmission eigenvalues and their computation via the method of fundamental solutions
Predictive Claim Scores for Dynamic Multi-Product Risk Classification in Insurance
Utilizing Temporal Information in Deep Convolutional Network for Efficient Soccer Ball Detection and Tracking
Semantic-Aware Scene Recognition
PAPR Reduction with Mixed-Numerology OFDM
The phaseless rank of a matrix
Swap Stability in Schelling Games on Graphs
Remembering winter was coming
Regression Models Using Shapes of Functions as Predictors
Furstenberg sets in finite fields: Explaining and improving the Ellenberg-Erman proof
Self-organizing memristive nanowire networks with structural plasticity emulate biological neuronal circuits
Empirical Notes on the Interaction Between Continuous Kernel Fuzzing and Development
FreeAnchor: Learning to Match Anchors for Visual Object Detection
Intrinsic Dynamic Shape Prior for Fast, Sequential and Dense Non-Rigid Structure from Motion with Detection of Temporally-Disjoint Rigidity
Free flags over local rings and powering of high dimensional expanders
Lower bound performances for average consensus in open multi-agent systems (extended version)
A Discussion on Influence of Newspaper Headlines on Social Media
AFP-Net: Realtime Anchor-Free Polyp Detection in Colonoscopy
Playing Games with Multiple Access Channels
FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow
Gradient-Based STL Control with Application to Nonholonomic Systems
Enabling UAV communications with cell-free massive MIMO
Ab-Initio Solution of the Many-Electron Schrödinger Equation with Deep Neural Networks
Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation
Reply to ‘Issues arising from benchmarking single-cell RNA sequencing imputation methods’
Permutation Recovery from Multiple Measurement Vectors in Unlabeled Sensing
Predictive distributions that mimic frequencies over a restricted subdomain (expanded preprint version)
$\sqrt{n}$-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank
CT Data Curation for Liver Patients: Phase Recognition in Dynamic Contrast-Enhanced CT
Dual Frequency Comb Assisted Analog-to-Digital Conversion of Subcarrier Modulated Signals
Straggler Mitigation with Tiered Gradient Codes
Cosmological Polytopes and the Wavefunction of the Universe for Light States
Neural Style-Preserving Visual Dubbing
On the discriminative power of Hyper-parameters in Cross-Validation and how to choose them
Coherent Optical Communications Enhanced by Machine Intelligence
Dispersion Characterization and Pulse Prediction with Machine Learning
Analyzing Brain Circuits in Population Neuroscience: A Case to Be a Bayesian
Latent Multivariate Log-Gamma Models for High-Dimensional Multi-Type Responses with Application to Daily Fine Particulate Matter and Mortality Counts
C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion
Harnessing the Power of Deep Learning Methods in Healthcare: Neonatal Pain Assessment from Crying Sound
The distribution of Yule’s ‘nonsense correlation’
Quasi-optimal adaptive hybridized mixed finite element methods for linear elasticity
Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes
Number of Sign Changes: Segment of AR(1)
Adversarial Examples with Difficult Common Words for Paraphrase Identification


What's new on arXiv

Distributionally Robust Language Modeling

Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that training on text outside the test distribution can degrade test performance when using standard maximum likelihood (MLE) training. To remedy this without knowledge of the test distribution, we propose an approach that trains a model to perform well over a wide range of potential test distributions. In particular, we derive a new distributionally robust optimization (DRO) procedure which minimizes the loss of the model over the worst-case mixture of topics with sufficient overlap with the training distribution. Our approach, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews.
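
The worst-case-mixture idea can be sketched in a few lines. Below is a minimal illustration of a CVaR-over-topics objective, in which the adversary concentrates mixture weight on the highest-loss topics; this is not the paper's exact optimization procedure, and the function name and equal-topic-size simplification are assumptions for illustration.

```python
import numpy as np

def topic_cvar_loss(topic_losses, alpha=0.5):
    """Worst-case mixture loss over topics: the adversary puts its mixture
    weight on the highest-loss topics, capped so that it keeps sufficient
    overlap with the training distribution. With equal-sized topics this
    reduces to averaging the worst alpha-fraction of per-topic losses."""
    losses = np.sort(np.asarray(topic_losses, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * len(losses))))
    return losses[:k].mean()

per_topic = [2.0, 1.0, 4.0, 3.0]                     # held-out loss per topic (toy numbers)
worst_half = topic_cvar_loss(per_topic, alpha=0.5)   # mean of the two worst topics
```

Minimizing this quantity instead of the plain average forces the model to stay good on its weakest topics, which is the robustness property the abstract describes.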

Performance Analysis and Comparison of Distributed Machine Learning Systems

Deep learning has permeated many aspects of computing and processing systems in recent years. While distributed training architectures/frameworks are adopted for training large deep learning models quickly, there has not been a systematic study of the communication bottlenecks of these architectures and their effects on computation cycle time and scalability. In order to analyze this problem for synchronous Stochastic Gradient Descent (SGD) training of deep learning models, we developed a performance model of computation time and communication latency under three different system architectures: Parameter Server (PS), peer-to-peer (P2P), and Ring allreduce (RA). To complement and corroborate our analytical models with quantitative results, we evaluated the computation and communication performance of these system architectures via experiments performed with the Tensorflow and Horovod frameworks. We found that the system architecture has a very significant effect on the performance of training. RA-based systems achieve scalable performance as they successfully decouple network usage from the number of workers in the system. In contrast, 1PS systems suffer from low performance due to network congestion at the parameter server side. While P2P systems fare better than 1PS systems, they still suffer from a significant network bottleneck. Finally, RA systems also excel by virtue of overlapping computation time and communication time, which PS and P2P architectures fail to achieve.
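
The scalability argument can be made concrete with a back-of-the-envelope cost model. The sketch below uses the standard textbook communication volumes for parameter-server and ring-allreduce training, not figures from the paper's own measurements.

```python
def comm_bytes_per_worker(model_bytes, n_workers, arch):
    """Per-worker bytes transferred per synchronous SGD step under a
    simple, idealized cost model (illustrative, not the paper's model).
    'ps': each worker pushes gradients and pulls parameters, so the lone
          server moves n_workers times this volume and becomes the bottleneck.
    'ra': ring allreduce moves 2*(n-1)/n of the model per worker, which is
          bounded by twice the model size regardless of worker count."""
    if arch == "ps":
        return 2.0 * model_bytes
    if arch == "ra":
        return 2.0 * (n_workers - 1) / n_workers * model_bytes
    raise ValueError("arch must be 'ps' or 'ra'")

# A 100 MB model on 8 workers: ring allreduce stays under 2x the model size
ra = comm_bytes_per_worker(100e6, 8, "ra")
ps = comm_bytes_per_worker(100e6, 8, "ps")
```

The key point matches the abstract: the ring-allreduce term is independent of the worker count, while the parameter server's aggregate load grows linearly with it.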

Accelerated Information Gradient flow

We present a systematic framework for Nesterov's accelerated gradient flows in probability spaces equipped with information metrics. Two metrics are considered: the Fisher-Rao metric and the Wasserstein-2 metric. For the Wasserstein-2 metric case, we prove the convergence properties of the accelerated gradient flows and introduce their formulations in Gaussian families. Furthermore, we propose a practical discrete-time algorithm in particle implementations with an adaptive restart technique. We formulate a novel bandwidth selection method, which learns the Wasserstein-2 gradient direction from Brownian-motion samples. Experimental results, including Bayesian inference tasks, show the strength of the proposed method compared with the state of the art.
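
The adaptive restart technique has a well-known Euclidean ancestor. The sketch below is plain Nesterov acceleration with a gradient-based restart test (reset the momentum whenever the step moves against the gradient), in ordinary Euclidean space; the paper's Wasserstein/particle machinery and bandwidth selection are not reproduced here, and the step size and iteration budget are illustrative assumptions.

```python
import numpy as np

def agd_restart(grad, x0, step=0.1, iters=200):
    """Nesterov-style accelerated gradient descent with gradient-based
    adaptive restart: momentum is dropped whenever the latest step points
    against the descent direction, which suppresses oscillation."""
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()
    t = 1.0
    for _ in range(iters):
        y = x + (t - 1.0) / (t + 2.0) * (x - x_prev)   # momentum look-ahead
        g = grad(y)
        x_prev, x = x, y - step * g                    # gradient step from y
        if g @ (x - x_prev) > 0:                       # step opposes descent
            t = 1.0                                    # restart: kill momentum
            x_prev = x.copy()
        else:
            t += 1.0
    return x

# Minimize f(x) = ||x||^2 / 2 (so grad(x) = x); the iterate converges to 0
x_star = agd_restart(lambda x: x, np.array([5.0, -3.0]), step=0.5)
```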

Meta Learning with Relational Information for Short Sequences

This paper proposes a new meta-learning method — named HARMLESS (HAwkes Relational Meta LEarning method for Short Sequences) for learning heterogeneous point process models from short event sequence data along with a relational network. Specifically, we propose a hierarchical Bayesian mixture Hawkes process model, which naturally incorporates the relational information among sequences into point process modeling. Compared with existing methods, our model can capture the underlying mixed-community patterns of the relational network, which simultaneously encourages knowledge sharing among sequences and facilitates adaptive learning for each individual sequence. We further propose an efficient stochastic variational meta expectation maximization algorithm that can scale to large problems. Numerical experiments on both synthetic and real data show that HARMLESS outperforms existing methods in terms of predicting the future events.

Quantum Natural Gradient

A quantum generalization of Natural Gradient Descent is presented as part of a general-purpose optimization framework for variational quantum circuits. The optimization dynamics is interpreted as moving in the steepest descent direction with respect to the Quantum Information Geometry, corresponding to the real part of the Quantum Geometric Tensor (QGT), also known as the Fubini-Study metric tensor. An efficient algorithm is presented for computing a block-diagonal approximation to the Fubini-Study metric tensor for parametrized quantum circuits, which may be of independent interest.
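
The core update behind any natural-gradient scheme, classical or quantum, is a metric-preconditioned step. The sketch below shows that update with a block-diagonal metric approximation, which is the computational point of the abstract; the block structure, learning rate, and toy numbers are assumptions for illustration, not the paper's algorithm for evaluating the Fubini-Study tensor on hardware.

```python
import numpy as np

def natural_gradient_step(theta, grad, metric_blocks, lr=0.05):
    """One natural-gradient update, theta <- theta - lr * F^{-1} grad, with
    the metric F approximated block-diagonally: metric_blocks is a list of
    (parameter_indices, block_matrix) pairs covering the parameters.
    Solving each small block separately avoids inverting the full metric."""
    step = np.zeros_like(theta)
    for idx, F in metric_blocks:
        step[idx] = np.linalg.solve(F, grad[idx])
    return theta - lr * step

theta = np.array([1.0, 2.0, 3.0])
grad = np.array([2.0, 2.0, 4.0])
blocks = [(np.array([0, 1]), 2.0 * np.eye(2)),   # metric block for params 0-1
          (np.array([2]), 4.0 * np.eye(1))]      # metric block for param 2
updated = natural_gradient_step(theta, grad, blocks, lr=0.5)
```

With the identity metric this reduces to ordinary gradient descent; a non-trivial metric rescales the step to follow the steepest descent direction in the geometry the metric defines.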

Theory of high-dimensional outliers

This study concerns the issue of high dimensional outliers which are challenging to distinguish from inliers due to the special structure of high dimensional space. We introduce a new notion of high dimensional outliers that embraces various types and provides deep insights into understanding the behavior of these outliers based on several asymptotic regimes. Our study of geometrical properties of high dimensional outliers reveals an interesting transition phenomenon of outliers from near the surface of a high dimensional sphere to being distant from the sphere. Also, we study the PCA subspace consistency when data contain a limited number of outliers.
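
The "near the surface of a high dimensional sphere" geometry is easy to see numerically. This short demonstration (a generic concentration-of-measure fact, not the paper's experiment) shows that i.i.d. Gaussian samples in high dimension all sit close to a shell of radius sqrt(d), which is why outliers are hard to tell apart from inliers by distance alone.

```python
import numpy as np

# In high dimension, i.i.d. Gaussian samples concentrate near the surface of
# a sphere of radius sqrt(d): the norms cluster tightly around sqrt(d). This
# is the geometric backdrop for the transition the abstract describes, with
# outliers sitting either near that shell or far away from it.
rng = np.random.default_rng(0)
d = 1000
X = rng.standard_normal((500, d))
norms = np.linalg.norm(X, axis=1)
mean_ratio = norms.mean() / np.sqrt(d)    # close to 1
spread_ratio = norms.std() / np.sqrt(d)   # close to 0
```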

TabFact: A Large-scale Dataset for Table-based Fact Verification

The problem of verifying whether a textual hypothesis holds the truth based on the given evidence, also known as fact verification, plays an important role in the study of natural language understanding and semantic representation. However, existing studies are mainly restricted to dealing with unstructured evidence (e.g., natural language sentences and documents, news, etc.), while verification under structured evidence, such as tables, graphs, and databases, remains unexplored. This paper specifically aims to study fact verification given semi-structured data as evidence. To this end, we construct a large-scale dataset called \textsc{TabFact} with 16k Wikipedia tables as evidence for 118k human-annotated natural language statements, which are labeled as either {\tt ENTAILED} or {\tt REFUTED}. \textsc{TabFact} is more challenging since it involves both soft linguistic reasoning and hard symbolic reasoning. To address these reasoning challenges, we design two different models: Table-BERT and the Latent Program Algorithm (LPA). Table-BERT leverages the state-of-the-art pre-trained language model to encode the linearized tables and statements into continuous vectors for verification. LPA parses statements into LISP-like programs and executes them against the tables to obtain the returned binary value. Both methods achieve similar accuracy, yet remain far from human performance. We also perform comprehensive analysis and demonstrate great future opportunities. The data and code of the dataset are provided in \url{https://…/Table-Fact-Checking}.

Detecting Deep Neural Network Defects with Data Flow Analysis

Deep neural networks (DNNs) are shown to be promising solutions in many challenging artificial intelligence tasks, including object recognition, natural language processing, and even unmanned driving. A DNN model, generally based on statistical summarization of in-house training data, aims to predict correct output given an input encountered in the wild. In general, 100% precision is therefore impossible due to its probabilistic nature. For DNN practitioners, it is very hard, if not impossible, to figure out whether the low precision of a DNN model is an inevitable result, or caused by defects such as bad network design or improper training process. This paper aims at addressing this challenging problem. We approach it with a careful categorization of the root causes of low precision. We find that the internal data flow footprints of a DNN model can provide insights to locate the root cause effectively. We then develop a tool, namely, DeepMorph (DNN Tomography), to analyze the root cause, which can instantly guide a DNN developer to improve the model. Case studies on four popular datasets show the effectiveness of DeepMorph.

Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations

Researchers often use artificial data to assess the performance of new econometric methods. In many cases the data generating processes used in these Monte Carlo studies do not resemble real data sets and instead reflect many arbitrary decisions made by the researchers. As a result, potential users of the methods are rarely persuaded by these simulations that the new methods are as attractive as the simulations make them out to be. We discuss the use of Wasserstein Generative Adversarial Networks (WGANs) as a method for systematically generating artificial data that mimic closely any given real data set without the researcher having many degrees of freedom. We apply the methods to compare, in three different settings, twelve different estimators for average treatment effects under unconfoundedness. We conclude in this example that (i) no one estimator outperforms the others in all three settings, and (ii) systematic simulation studies can be helpful for selecting among competing methods.

An Experiment on Network Density and Sequential Learning

We conduct a sequential social learning experiment where subjects guess a hidden state after observing private signals and the guesses of a subset of their predecessors. A network determines the observable predecessors, and we compare subjects’ accuracy on sparse and dense networks. Later agents’ accuracy gains from social learning are twice as large in the sparse treatment compared to the dense treatment. Models of naive inference where agents ignore correlation between observations predict this comparative static in network density, while the result is difficult to reconcile with rational-learning models.

Effective Domain Knowledge Transfer with Soft Fine-tuning

Convolutional neural networks require numerous data for training. Considering the difficulties of data collection and labeling in some specific tasks, existing approaches generally use models pre-trained on a large source domain (e.g. ImageNet) and then fine-tune them on these tasks. However, the datasets from the source domain are simply discarded in the fine-tuning process. We argue that the source datasets could be better utilized to benefit fine-tuning. This paper first introduces the concept of general discrimination to describe the ability of a network to distinguish untrained patterns, and then experimentally demonstrates that general discrimination could potentially enhance the total discrimination ability on the target domain. Furthermore, we propose a novel and lightweight method, namely soft fine-tuning. Unlike traditional fine-tuning, which directly replaces the optimization objective with a loss function on the target domain, soft fine-tuning effectively keeps general discrimination by holding the previous loss and removing it softly. By doing so, soft fine-tuning improves the robustness of the network to data bias and meanwhile accelerates convergence. We evaluate our approach on several visual recognition tasks. Extensive experimental results show that soft fine-tuning provides consistent improvement on all evaluated tasks and outperforms the state of the art significantly. Codes will be made available to the public.
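
"Holding the previous loss and removing it softly" suggests a combined objective whose source-domain term decays over training rather than being dropped at once. The sketch below illustrates that idea with a hypothetical linear schedule; the paper's actual weighting scheme may differ.

```python
def soft_finetune_loss(source_loss, target_loss, step, decay_steps):
    """Combined objective for 'soft' fine-tuning (a sketch of the idea, not
    the paper's exact schedule): the source-domain loss stays in the
    objective with a weight that decays to zero over decay_steps, instead
    of being dropped abruptly as in standard fine-tuning."""
    w = max(0.0, 1.0 - step / decay_steps)   # hypothetical linear decay
    return w * source_loss + target_loss

start = soft_finetune_loss(2.0, 1.0, step=0, decay_steps=100)    # source loss fully kept
end = soft_finetune_loss(2.0, 1.0, step=100, decay_steps=100)    # pure target fine-tuning
```

Early in training the network is still anchored to what it learned on the source domain, which is how the method keeps the "general discrimination" the paragraph describes.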

A new reproducing kernel based nonlinear dimension reduction method for survival data

Based on the theories of sliced inverse regression (SIR) and reproducing kernel Hilbert space (RKHS), a new approach, RDSIR (RKHS-based Double SIR), to nonlinear dimension reduction for survival data is proposed and discussed. An isometric isomorphism is constructed based on the RKHS property; the nonlinear function in the RKHS can then be represented by the inner product of two elements that reside in the isomorphic feature space. Due to the censorship of survival data, double slicing is used to estimate the weight function or conditional survival function to adjust for the censoring bias. The sufficient dimension reduction (SDR) subspace is estimated by a generalized eigen-decomposition problem. Our method is computationally efficient, with fast calculation and a small computational burden. The asymptotic property and the convergence rate of the estimator are also discussed based on the perturbation theory. Finally, we illustrate the performance of RDSIR on simulated and real data to confirm that RDSIR is comparable with linear SDR methods. Most importantly, RDSIR can also effectively extract nonlinearity from survival data.
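
For readers unfamiliar with SIR, here is the classic linear version that RDSIR kernelizes and double-slices: slice on the response, average the predictors within each slice, and solve a generalized eigenproblem. This is the textbook building block only, with none of the censoring adjustment or RKHS machinery; the slice counts and toy data are illustrative.

```python
import numpy as np

def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Classic linear sliced inverse regression. Slice on y, average X
    within slices, then solve the generalized eigenproblem
    Cov(E[X | slice]) v = lambda * Cov(X) v via Sigma^{-1} M."""
    X = X - X.mean(axis=0)
    n, p = X.shape
    slices = np.array_split(np.argsort(y), n_slices)
    M = np.zeros((p, p))
    for s in slices:
        m = X[s].mean(axis=0)
        M += (len(s) / n) * np.outer(m, m)   # between-slice covariance
    Sigma = X.T @ X / n                      # overall covariance
    vals, vecs = np.linalg.eig(np.linalg.solve(Sigma, M))
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:n_dirs]]

# Toy check: y depends only on the first coordinate, and SIR recovers it
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 3))
y = X[:, 0] + 0.1 * rng.standard_normal(2000)
v = sir_directions(X, y).ravel()
alignment = abs(v[0]) / np.linalg.norm(v)   # close to 1: direction ~ e1
```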

Powerset Convolutional Neural Networks

We present a novel class of convolutional neural networks (CNNs) for set functions, i.e., data indexed with the powerset of a finite set. The convolutions are derived as linear, shift-equivariant functions for various notions of shifts on set functions. The framework is fundamentally different from graph convolutions based on the Laplacian, as it provides not one but several basic shifts, one for each element in the ground set. Prototypical experiments with several set function classification tasks on synthetic datasets and on datasets derived from real-world hypergraphs demonstrate the potential of our new powerset CNNs.
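
The notion of an element-indexed shift can be made concrete. The sketch below implements one natural choice of shift on set functions stored by bitmask, (T_e f)(S) = f(S ∪ {e}); whether this matches the paper's exact definitions is an assumption, since the framework admits several notions of shift.

```python
import numpy as np

def powerset_shift(f, e, n):
    """One natural shift for a set function f: 2^[n] -> R stored by bitmask:
    (T_e f)(S) = f(S union {e}). Unlike the single Laplacian shift used in
    graph convolutions, there is one such shift per ground-set element, and
    powerset convolutions are built as linear combinations of them."""
    g = np.empty_like(f)
    for S in range(2 ** n):
        g[S] = f[S | (1 << e)]   # union with element e via bit OR
    return g

# f over subsets of {0,1}: f({}) = 0, f({0}) = 1, f({1}) = 2, f({0,1}) = 3
f = np.array([0.0, 1.0, 2.0, 3.0])
shifted = powerset_shift(f, 0, 2)   # every subset is unioned with element 0
```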

Table-to-Text Generation with Effective Hierarchical Encoder on Three Dimensions (Row, Column and Time)

Although Seq2Seq models for table-to-text generation have achieved remarkable progress, modeling table representation in one dimension is inadequate. This is because (1) the table consists of multiple rows and columns, which means that encoding a table should not depend only on a one-dimensional sequence or set of records, and (2) most tables are time series data (e.g. NBA game data, stock market data), which means that the description of the current table may be affected by its historical data. To address the aforementioned problems, we not only model each table cell considering other records in the same row, but also enrich the table's representation by modeling each table cell in the context of other cells in the same column or with historical (time dimension) data, respectively. In addition, we develop a table cell fusion gate to combine representations from the row, column and time dimensions into one dense vector according to the saliency of each dimension's representation. We evaluated our methods on ROTOWIRE, a benchmark dataset of NBA basketball games. Both automatic and human evaluation results demonstrate the effectiveness of our model, with an improvement of 2.66 BLEU over a strong baseline, outperforming the state-of-the-art model.
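
A fusion gate of this kind can be sketched in a few lines: score each dimension's representation, normalize the scores into saliency weights, and mix. The scalar-scoring parameterization below is a hypothetical stand-in, not the paper's exact gate.

```python
import numpy as np

def fusion_gate(h_row, h_col, h_time, w_score):
    """Sketch of a cell fusion gate: score each dimension's representation,
    turn the scores into saliency weights with a softmax, and mix the three
    vectors into one dense cell representation. The scoring vector w_score
    is a hypothetical parameterization for illustration."""
    H = np.stack([h_row, h_col, h_time])      # shape (3, d)
    scores = H @ w_score                      # one saliency score per dimension
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()         # softmax over {row, col, time}
    return weights @ H                        # (d,) fused cell vector

# With zero scores, all three dimensions are weighted equally:
fused = fusion_gate(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                    np.array([1.0, 1.0]), np.zeros(2))
```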

Human-AI Collaboration in Data Science: Exploring Data Scientists’ Perceptions of Automated AI

The rapid advancement of artificial intelligence (AI) is changing our lives in many ways. One application domain is data science. New techniques in automating the creation of AI, known as AutoAI or AutoML, aim to automate the work practices of data scientists. AutoAI systems are capable of autonomously ingesting and pre-processing data, engineering new features, and creating and scoring models based on target objectives (e.g. accuracy or run-time efficiency). Though AutoAI is not yet widely adopted, we are interested in understanding how it will impact the practice of data science. We conducted interviews with 20 data scientists who work at a large, multinational technology company and practice data science in various business settings. Our goal is to understand their current work practices and how these practices might change with AutoAI. Reactions were mixed: while informants expressed concerns about the trend of automating their jobs, they also strongly felt it was inevitable. Despite these concerns, they remained optimistic about their future job security due to a view that the future of data science work will be a collaboration between humans and AI systems, in which both automation and human expertise are indispensable.

Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis

We describe a detailed analysis of a sample of a large benchmark of commonsense reasoning problems that has been automatically obtained from WordNet, SUMO and their mapping. The objective is to provide a better assessment of the quality of both the benchmark and the involved knowledge resources for advanced commonsense reasoning tasks. By means of this analysis, we are able to detect some knowledge misalignments, mapping errors, and a lack of knowledge and resources. Our final objective is the extraction of guidelines towards a better exploitation of this commonsense knowledge framework through the improvement of the included resources.

The application of Convolutional Neural Networks to Detect Slow, Sustained Deformation in InSAR Timeseries

Automated systems for detecting deformation in satellite InSAR imagery could be used to develop a global monitoring system for volcanic and urban environments. Here we explore the limits of a CNN for detecting slow, sustained deformations in wrapped interferograms. Using synthetic data, we estimate a detection threshold of 3.9 cm for deformation signals alone, and 6.3 cm when atmospheric artefacts are considered. Over-wrapping reduces these to 1.8 cm and 5.0 cm respectively, as more fringes are generated without altering the SNR. We test the approach on time series of cumulative deformation from Campi Flegrei and Dallol, where over-wrapping improves classification performance by up to 15%. We propose a mean-filtering method for combining the results of different wrap parameters to flag deformation. At Campi Flegrei, deformation of 8.5 cm/yr was detected after 60 days, and at Dallol, deformation of 3.5 cm/yr was detected after 310 days. This corresponds to cumulative displacements of 3 cm and 4 cm, consistent with estimates based on synthetic data.

Informative and Controllable Opinion Summarization

Opinion summarization is the task of automatically generating summaries for a set of opinions about a specific target (e.g., a movie or a product). Since the number of input documents can be prohibitively large, neural network-based methods sacrifice end-to-end elegance and follow a two-stage approach where an extractive model first pre-selects a subset of salient opinions and an abstractive model creates the summary while conditioning on the extracted subset. However, the extractive stage leads to information loss and inflexible generation. In this paper we propose a summarization framework that eliminates the need to pre-select salient content. We view opinion summarization as an instance of multi-source transduction, and make use of all input documents by condensing them into multiple dense vectors which serve as input to an abstractive model. Beyond producing more informative summaries, we demonstrate that our approach can take user preferences into account through a simple zero-shot customization technique. Experimental results show that our model improves the state of the art on the Rotten Tomatoes dataset by a wide margin and generates customized summaries effectively.

Understanding ML driven HPC: Applications and Infrastructure

We recently outlined the vision of ‘Learning Everywhere’, which captures the possibility and impact of coupling learning methods with traditional HPC methods. A primary driver of such coupling is the promise that Machine Learning (ML) will give major performance improvements for traditional HPC simulations. Motivated by this potential, the ML around HPC class of integration is of particular significance. In a related follow-up paper, we provided an initial taxonomy for integrating learning around HPC methods. In this paper, which is part of the Learning Everywhere series, we discuss ‘how’ learning methods and HPC simulations are being integrated to enhance the effective performance of computations. This paper identifies several modes (substitution, assimilation, and control) in which learning methods integrate with HPC simulations, and provides representative applications in each mode. It also discusses some open research questions, which we hope will motivate and clear the ground for MLaroundHPC benchmarks.

Best Practices for Scientific Research on Neural Architecture Search

We describe a set of best practices for the young field of neural architecture search (NAS), which lead to the best practices checklist for NAS available at http://…/nas_checklist.pdf.

Minibatch Processing in Spiking Neural Networks

Spiking neural networks (SNNs) are a promising candidate for biologically-inspired and energy-efficient computation. However, their simulation is notoriously time consuming, and may be seen as a bottleneck in developing competitive training methods with potential deployment on neuromorphic hardware platforms. To address this issue, we provide an implementation of mini-batch processing applied to clock-based SNN simulation, leading to drastically increased data throughput. To our knowledge, this is the first general-purpose implementation of mini-batch processing in a spiking neural network simulator that works with arbitrary neuron and synapse models. We demonstrate nearly constant-time scaling with batch size on a simulation setup (up to GPU memory limits), and showcase the effectiveness of large batch sizes in two SNN application domains, resulting in roughly 880× and 24× reductions in wall-clock time, respectively. Different parameter reduction techniques are shown to produce different learning outcomes in a simulation of networks trained with spike-timing-dependent plasticity. Machine learning practitioners and biological modelers alike may benefit from the drastically reduced simulation time and increased iteration speed this method enables. Code to reproduce the benchmarks and experimental findings in this paper can be found at https://…/snn-minibatch.
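The core idea, advancing a whole batch of inputs through the same clock-based neuron update at each tick, can be sketched in a few lines. Below is a hedged illustration with a simple leaky integrate-and-fire model; all names and parameter values are invented here and are not from the paper's implementation.

```python
# Minimal sketch of minibatch clock-based SNN simulation: B independent
# inputs share one leaky integrate-and-fire (LIF) update per time step.

def lif_step(v, i_in, v_rest=0.0, v_thresh=1.0, decay=0.9):
    """Advance a batch of LIF membrane voltages by one time step.

    v, i_in: lists of length B (one entry per batch element).
    Returns (new_voltages, spikes), where spikes holds 0/1 per element.
    """
    new_v, spikes = [], []
    for vm, cur in zip(v, i_in):
        vm = v_rest + decay * (vm - v_rest) + cur  # leak toward rest, integrate input
        if vm >= v_thresh:                         # threshold crossing -> spike
            spikes.append(1)
            vm = v_rest                            # reset after spiking
        else:
            spikes.append(0)
        new_v.append(vm)
    return new_v, spikes

# Simulate 100 ticks for a batch of 3 constant input currents at once.
batch_currents = [0.0, 0.3, 0.6]
v = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
for _ in range(100):
    v, s = lif_step(v, batch_currents)
    counts = [c + si for c, si in zip(counts, s)]
# Stronger inputs spike more often; zero input never spikes.
```

In a real simulator the inner loop over batch elements would be a single vectorized tensor operation on the GPU, which is where the near constant-time scaling with batch size comes from.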

Continue Reading…


Read More

“Boston Globe Columnist Suspended During Investigation Of Marathon Bombing Stories That Don’t Add Up”

I came across this news article by Samer Kalaf and it made me think of some problems we’ve been seeing in recent years involving cargo-cult science.

Here’s the story:

The Boston Globe has placed columnist Kevin Cullen on “administrative leave” while it conducts a review of his work, after WEEI radio host Kirk Minihane scrutinized Cullen’s April 14 column about the five-year anniversary of the Boston Marathon bombings, and found several inconsistencies. . . .

Here’s an excerpt of the column:

I happened upon a house fire recently, in Mattapan, and the smell reminded me of Boylston Street five years ago, when so many lost their lives and their limbs and their sense of security.

I can smell Patriots Day, 2013. I can hear it. God, can I hear it, whenever multiple fire engines or ambulances are racing to a scene.

I can taste it, when I’m around a campfire and embers create a certain sensation.

I can see it, when I bump into survivors, which happens with more regularity than I could ever have imagined. And I can touch it, when I grab those survivors’ hands or their shoulders.

Cullen, who was part of the paper’s 2003 Pulitzer-winning Spotlight team that broke the stories on the Catholic Church sex abuse scandal, had established in this column, and in prior reporting, that he was present for the bombings. . . .

But Cullen wasn’t really there. And his stories had lots of details that sounded good but were actually made up. Including, horrifyingly enough, made-up stories about a little girl who was missing her leg.

OK, so far, same old story. Mike Barnicle, Janet Cooke, Stephen Glass, . . . and now one more reporter who prefers to make things up than to do actual reporting. For one thing, making stuff up is easier; for another, if you make things up, you can make the story work better, as you’re not constrained by pesky details.

What’s the point of writing about this, then? What’s the connection to statistical modeling, causal inference, and social science?

Here’s the point:

Let’s think about journalism:

1. What’s the reason for journalism? To convey information, to give readers a different window into reality. To give a sense of what it was like to be there, for those who were not there. Or to help people who were there, to remember.

2. What does good journalism look like? It’s typically emotionally stirring and convincingly specific.

And here’s the problem.

The reason for journalism is 1, but some journalists decide to take a shortcut and go straight to the form of good journalism, that is, 2.

Indeed, I suspect that many journalists think that 2 is the goal, and that 1 is just some old-fashioned traditional attitude.

Now, to connect to statistical modeling, causal inference, and social science . . . let’s think about science:

1. What’s the reason for science? To learn about reality, to learn new facts, to encompass facts into existing and new theories, to find flaws in our models of the world.

2. And what does good science look like? It typically has an air of rigor.

And here’s the problem.

The reason for science is 1, but some scientists decide to take a shortcut and go straight to the form of good science, that is, 2.

The problem is not that scientists don't care about the goal of learning about reality; the problem is that they think that if they follow various formal expressions of science (randomized experiments, p-values, peer review, publication in journals, association with authority figures, etc.), they'll get the discovery for free.

It’s a natural mistake, given statistical training with its focus on randomization and p-values, an attitude that statistical methods can yield effective certainty from noisy data (true for Las Vegas casinos where the probability model is known; not so true for messy real-world science experiments), and scientific training that’s focused on getting papers published.


What struck me about the above-quoted Boston Globe article (“I happened upon a house fire recently . . . I can smell Patriots Day, 2013. I can hear it. God, can I hear it . . . I can taste it . . .”) was how it looks like good journalism. Not great journalism—it’s too clichéd and trope-y for that—but what’s generally considered good reporting, the kind that sometimes wins awards.

Similarly, if you look at a bunch of the fatally flawed articles we’ve seen in science journals in the past few years, they look like solid science. It’s only when you examine the details that you start seeing all the problems, and these papers disintegrate like a sock whose thread has been pulled.

Ok, yeah yeah sure, you’re saying: Once again I’m reminded of bad science. Who cares? I care, because bad science Greshams good science in so many ways: in scientists’ decision of what to work on and publish (why do a slow careful study if you can get a better publication with something flashy?), in who gets promoted and honored and who decides to quit the field in disgust (not always, but sometimes), and in what gets publicized. The above Boston marathon story struck me because it had that same flavor.

P.S. Tomorrow’s post: Harking, Sharking, Tharking.

Continue Reading…


Read More

pinp 0.0.9: Real Fix and Polish

[This article was first published on Thinking inside the box , and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Another pinp package release! pinp allows for snazzier one or two column Markdown-based pdf vignettes, and is now used by a few packages. A screenshot of the package vignette can be seen below. Additional screenshots are at the pinp page.

pinp vignette

This release comes exactly one week (i.e. the minimal time to not earn a NOTE) after the hot-fix release 0.0.8, which addressed breakage on CRAN tickled by changes in TeX Live. After updating the PNAS-style LaTeX macros, and avoiding the issue with an (older) custom copy of titlesec, we now have the real fix, thanks to the eagle-eyed attention of Javier Bezos. The error, as so often, was simple and ours: we had left a stray \makeatother in pinp.cls, where it may have been hiding for a while. A very big Thank You! to Javier for spotting it, to Norbert for all his help, and to James for double-checking on PNAS.

The good news in all of this is that the package is now in better shape than ever. The newer PNAS style works really well, and I went over a few of our extensions (such as papersize support for a4 as well as letter), direct on/off control of a Draft watermark, a custom subtitle and more, and they all work consistently. So happy vignette or paper writing!

The NEWS entry for this release follows.

Changes in pinp version 0.0.9 (2019-09-15)

  • The processing error first addressed in release 0.0.8 is now fixed by removing one stray command; many thanks to Javier Bezos.

  • The hotfix of also installing titlesec.sty has been reverted.

  • Processing of the ‘papersize’ and ‘watermark’ options was updated.

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the pinp page. For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box. R-bloggers offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Continue Reading…


Read More

Magister Dixit

“Big Data is not about volume, size or velocity of data – neither of which are easily translated into financial results for most businesses. It is about the integration of external sources of information and unstructured data into a company’s IT infrastructure and business processes.” Gregory Yankelovich ( October 20, 2014 )

Continue Reading…


Read More

Finding out why

Paper: Unifying Causal Models with Trek Rules

In many scientific contexts, different investigators experiment with or observe different variables with data from a domain in which the distinct variable sets might well be related. This sort of fragmentation sometimes occurs in molecular biology, whether in studies of RNA expression or studies of protein interaction, and it is common in the social sciences. Models are built on the diverse data sets, but combining them can provide a more unified account of the causal processes in the domain. On the other hand, this problem is made challenging by the fact that a variable in one data set may influence variables in another although neither data set contains all of the variables involved. Several authors have proposed using conditional independence properties of fragmentary (marginal) data collections to form unified causal explanations when it is assumed that the data have a common causal explanation but cannot be merged to form a unified dataset. These methods typically return a large number of alternative causal models. The first part of the thesis shows that marginal datasets contain extra information that can be used to reduce the number of possible models, in some cases yielding a unique model.

Paper: It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution

This paper treats gender bias latent in word embeddings. Previous mitigation attempts rely on the operationalisation of gender bias as a projection over a linear subspace. An alternative approach is Counterfactual Data Augmentation (CDA), in which a corpus is duplicated and augmented to remove bias, e.g. by swapping all inherently-gendered words in the copy. We perform an empirical comparison of these approaches on the English Gigaword and Wikipedia, and find that whilst both successfully reduce direct bias and perform well in tasks which quantify embedding quality, CDA variants outperform projection-based methods at the task of drawing non-biased gender analogies by an average of 19% across both corpora. We propose two improvements to CDA: Counterfactual Data Substitution (CDS), a variant of CDA in which potentially biased text is randomly substituted to avoid duplication, and the Names Intervention, a novel name-pairing technique that vastly increases the number of words being treated. CDA/S with the Names Intervention is the only approach which is able to mitigate indirect gender bias: following debiasing, previously biased words are significantly less clustered according to gender (cluster purity is reduced by 49%), thus improving on the state-of-the-art for bias mitigation.

Paper: Counterfactual Depth from a Single RGB Image

We describe a method that predicts, from a single RGB image, a depth map describing the scene when a masked object is removed; we call this ‘counterfactual depth’, which models hidden scene geometry together with the observations. Our method works for the same reason that scene completion works: the spatial structure of objects is simple. But we offer a much higher resolution representation of space than current scene completion methods, as we operate at pixel-level precision and do not rely on a voxel representation. Furthermore, we do not require RGBD inputs. Our method uses a standard encoder-decoder architecture, with the decoder modified to accept an object mask. We describe a small evaluation dataset that we have collected, which allows inference about what factors affect reconstruction most strongly. Using this dataset, we show that our depth predictions for masked objects are better than other baselines.

Paper: A Turvey-Shapley Value Method for Distribution Network Cost Allocation

This paper proposes a novel cost-reflective and computationally efficient method for allocating distribution network costs to residential customers. First, the method estimates the growth in peak demand with a 50% probability of exceedance (50POE) and the associated network augmentation costs using a probabilistic long-run marginal cost computation based on the Turvey perturbation method. Second, it allocates these costs to customers on a cost-causal basis using the Shapley value solution concept. To overcome the intractability of the exact Shapley value computation for real-world applications, we implement a fast, scalable and efficient clustering technique based on customers’ peak demand contribution, which drastically reduces the Shapley value computation time. Using customer load traces from an Australian smart grid trial (Solar Home Electricity Data), we demonstrate the efficacy of our method by comparing it with established energy- and peak demand-based cost allocation approaches.

Paper: Multilevel latent class (MLC) modelling of healthcare provider causal effects on patient outcomes: Evaluation via simulation

Where performance comparison of healthcare providers is of interest, characteristics of both patients and the health condition of interest must be balanced across providers for a fair comparison. This is unlikely to be feasible within observational data, as patient population characteristics may vary geographically and patient care may vary by characteristics of the health condition. We simulated data for patients and providers, based on a previously utilized real-world dataset, and separately considered both binary and continuous covariate-effects at the upper level. Multilevel latent class (MLC) modelling is proposed to partition a prediction focus at the patient level (accommodating casemix) and a causal inference focus at the provider level. The MLC model recovered a range of simulated Trust-level effects. Median recovered values were almost identical to simulated values for the binary Trust-level covariate, and we observed successful recovery of the continuous Trust-level covariate with at least 3 latent Trust classes. Credible intervals widen as the error variance increases. The MLC approach successfully partitioned modelling for prediction and for causal inference, addressing the potential conflict between these two distinct analytical strategies. This improves upon strategies which only adjust for differential selection. Patient-level variation and measurement uncertainty are accommodated within the latent classes.

Paper: Optimal Causal Rate-Constrained Sampling of the Wiener Process

We consider the following communication scenario. An encoder causally observes the Wiener process and decides when and what to transmit about it. A decoder makes real-time estimation of the process using causally received codewords. We determine the causal encoding and decoding policies that jointly minimize the mean-square estimation error, under the long-term communication rate constraint of $R$ bits per second. We show that an optimal encoding policy can be implemented as a causal sampling policy followed by a causal compressing policy. We prove that the optimal encoding policy samples the Wiener process once the innovation passes either $\sqrt{\frac{1}{R}}$ or $-\sqrt{\frac{1}{R}}$, and compresses the sign of the innovation (SOI) using a 1-bit codeword. The SOI coding scheme achieves the operational distortion-rate function, which is equal to $D^{\mathrm{op}}(R)=\frac{1}{6R}$. Surprisingly, this is significantly better than the distortion-rate tradeoff achieved in the limit of infinite delay by the best non-causal code. This is because the SOI coding scheme leverages the free timing information supplied by the zero-delay channel between the encoder and the decoder. The key to unlock that gain is the event-triggered nature of the SOI sampling policy. In contrast, the distortion-rate tradeoffs achieved with deterministic sampling policies are much worse: we prove that the causal informational distortion-rate function in that scenario is as high as $D_{\mathrm{DET}}(R) = \frac{5}{6R}$. It is achieved by the uniform sampling policy with the sampling interval $\frac{1}{R}$. In either case, the optimal strategy is to sample the process as fast as possible and to transmit 1-bit codewords to the decoder without delay.
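The event-triggered sampling policy the abstract describes is easy to simulate: sample the Wiener process whenever the innovation crosses ±√(1/R), transmit only the sign, and let the decoder add ±√(1/R) to its running estimate. Below is a hedged numerical sketch; the discretization step and horizon are illustrative assumptions, not values from the paper.

```python
# Sign-of-innovation (SOI) sampling of a discretized Wiener process.
import math
import random

random.seed(0)
R = 10.0                      # long-term rate budget, bits per second
delta = math.sqrt(1.0 / R)    # sampling threshold from the abstract
dt = 1e-4                     # discretization step (illustrative)
T = 10.0                      # simulation horizon in seconds

w, w_hat = 0.0, 0.0           # true process and decoder's estimate
bits_sent = 0
sq_err_sum, steps = 0.0, 0
for _ in range(int(T / dt)):
    w += random.gauss(0.0, math.sqrt(dt))   # Wiener increment
    innovation = w - w_hat
    if abs(innovation) >= delta:            # event-triggered sample
        # Encoder sends 1 bit (the sign); decoder steps by +/- delta.
        w_hat += delta if innovation > 0 else -delta
        bits_sent += 1
    sq_err_sum += (w - w_hat) ** 2
    steps += 1

mse = sq_err_sum / steps
rate = bits_sent / T
# The abstract's operational distortion-rate function is D(R) = 1/(6R);
# the empirical MSE should be in that ballpark for the realized rate.
```

Since the expected hitting time of Brownian motion to ±δ is δ² = 1/R, the realized bit rate comes out near the budget R, which is how the scheme meets the rate constraint.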

Paper: Learning Physics from Data: a Thermodynamic Interpretation

Experimental databases are typically very large and high dimensional. Learning from them requires recognizing important features (a pattern), often present at scales different from that of the recorded data. Following the experience collected in statistical mechanics and thermodynamics, the process of recognizing the pattern (the learning process) can be seen as a dissipative time evolution driven by entropy. This is how thermodynamics enters machine learning. Learning to handle free-surface liquids serves as an illustration.

Python Library: causal-tree-learn

Python implementation of causal trees with validation

Continue Reading…


Read More

Develop Performance Benchmark with GRNN

[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

It has been mentioned previously that GRNN is an ideal approach for developing performance benchmarks for a variety of risk models. People might wonder what the purpose of a performance benchmark is and why we would even need one at all. Sometimes, a model developer has to answer questions about how well the model would perform even before completing it. Likewise, a model validator might wonder whether the model being validated has a reasonable performance given the data used and the effort spent. As a result, a performance benchmark, which could be built with the same data sample but an alternative methodology, is called for to address the aforementioned questions.

While the performance benchmark can take various forms, including but not limited to business expectations, industry practices, or vendor products, a model-based approach should possess the following characteristics:

– Quick prototype with reasonable efforts
– Comparable baseline with acceptable outcomes
– Flexible framework without strict assumptions
– Practical application to broad domains

With both empirical and conceptual advantages, GRNN accommodates each of the above-mentioned requirements and can thus be considered an appropriate candidate for developing performance benchmarks for a wide variety of models.

Below is an example illustrating how to use GRNN to develop a benchmark model for a previously discussed logistic regression. The function grnn.margin() was also employed to explore the marginal effect of each attribute in a GRNN.
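To see why a GRNN makes a quick, assumption-light benchmark, here is a minimal sketch of GRNN prediction in the sense of Specht (1991): the fitted value is just a Gaussian-kernel-weighted average of the training responses. This is an illustration only, not the blog's R implementation (whose grnn.margin() API differs), and the toy data are invented.

```python
# A General Regression Neural Network reduces to kernel regression:
# y_hat(x) = sum_i y_i * K(x, x_i) / sum_i K(x, x_i), K Gaussian.
import math

def grnn_predict(x, train_x, train_y, sigma=0.5):
    """Predict y at point x (a list of features) from training pairs."""
    weights = []
    for xi in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Toy benchmark target: noiseless y = x1 + x2 on a small grid.
train_x = [[i / 4.0, j / 4.0] for i in range(5) for j in range(5)]
train_y = [a + b for a, b in train_x]
pred = grnn_predict([0.5, 0.5], train_x, train_y, sigma=0.1)
# pred is close to 1.0, the true value of x1 + x2 at (0.5, 0.5)
```

There is no iterative training at all, only the smoothing parameter sigma to choose, which is exactly the "quick prototype with reasonable efforts" property the checklist above asks for.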


Continue Reading…


Read More

Document worth reading: “The History of Digital Spam”

Spam!: that’s what Lorrie Faith Cranor and Brian LaMacchia exclaimed in the title of a popular call-to-action article that appeared twenty years ago on Communications of the ACM. And yet, despite the tremendous efforts of the research community over the last two decades to mitigate this problem, the sense of urgency remains unchanged, as emerging technologies have brought new dangerous forms of digital spam under the spotlight. Furthermore, when spam is carried out with the intent to deceive or influence at scale, it can alter the very fabric of society and our behavior. In this article, I will briefly review the history of digital spam: starting from its quintessential incarnation, spam emails, to modern-days forms of spam affecting the Web and social media, the survey will close by depicting future risks associated with spam and abuse of new technologies, including Artificial Intelligence (e.g., Digital Humans). After providing a taxonomy of spam, and its most popular applications emerged throughout the last two decades, I will review technological and regulatory approaches proposed in the literature, and suggest some possible solutions to tackle this ubiquitous digital epidemic moving forward. The History of Digital Spam

Continue Reading…


Read More



Distilled News

Machine learning requires a fundamentally different deployment approach

The biggest issue facing machine learning (ML) isn’t whether we will discover better algorithms (we probably will), whether we’ll create a general AI (we probably won’t), or whether we’ll be able to deal with a flood of smart fakes (that’s a long-term, escalating battle). The biggest issue is how we’ll put ML systems into production. Getting an experiment to work on a laptop, even an experiment that runs ‘in the cloud,’ is one thing. Putting that experiment into production is another matter. Production has to deal with reality, and reality doesn’t live on our laptops. Most of our understanding of ‘production’ has come from the web world and learning how to run ecommerce and social media applications at scale. The latest advances in web operations – containerization and container orchestration – make it easier to package applications that can be deployed reliably and maintained consistently. It’s still not easy, but the tools are there. That’s a good start. ML applications differ from traditional software in two important ways. First, they’re not deterministic. Second, the application’s behavior isn’t determined by the code, but by the data used for training. These two differences are closely related.

Visualizing Association Rules for Text Mining

Association rule mining is a powerful data analysis technique that appears frequently in the data mining literature. An association rule is an implication of the form X ⇒ Y, where X is a set of antecedent items and Y is the consequent item. An example association rule from a supermarket database is: 80% of the people who buy diapers and baby powder also buy baby oil. The analysis of association rules is used in a variety of ways, including merchandise stocking, insurance fraud investigation, and climate prediction. For years, scientists and engineers have developed many visualization techniques to support the analysis of association rules. Many of these visualizations, however, have come up short in dealing with large numbers of rules or rules with multiple antecedents. This limitation creates serious challenges for analysts who need to understand the association information of large databases.
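The quantities behind a rule like the diapers example are simple to compute: for X ⇒ Y, support is the fraction of transactions containing X ∪ Y, and confidence is support(X ∪ Y) / support(X). A small self-contained sketch, with invented transactions:

```python
# Support and confidence for an association rule X => Y.
transactions = [
    {"diapers", "baby powder", "baby oil"},
    {"diapers", "baby powder", "baby oil"},
    {"diapers", "baby powder"},
    {"diapers", "baby powder", "baby oil", "milk"},
    {"milk", "bread"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent), estimated from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

rule_conf = confidence({"diapers", "baby powder"}, {"baby oil"})
# 3 of the 4 transactions with diapers + powder also contain baby oil -> 0.75
```

Visualization becomes hard precisely because the number of candidate (X, Y) pairs, and therefore of rules to display, explodes combinatorially with the number of items.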

Data Engineering – How to Set Dependencies Between Data Pipelines in Apache Airflow

Use sensors to set effective dependencies between data pipelines and build a solid foundation for the data team. Greetings, my fellow readers, it's me again. That guy who writes about his life experiences and a little tad bit about data. Just a little bit. Article after article, I always start with how important data is to a strong organisation: how large companies are using data to impact their business, impact our society and, in turn, make them profits. Data can be used to save lives; just take a look at this article on how data is used to predict cancer cells. So please, let's skip the small talk about how important data is and how it's always right, just like your wife.
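The "sensor" pattern the article is about is a task that repeatedly pokes for a condition (for example, an upstream pipeline's completion marker) before letting downstream work run. Here is a hedged, framework-free sketch of that poke loop; in real Airflow you would reach for a built-in sensor such as ExternalTaskSensor rather than writing this yourself, and the names below are illustrative.

```python
# A minimal poke loop: poll a condition until it holds or a timeout elapses.
import time

def wait_for(condition, poke_interval=0.01, timeout=1.0):
    """Poll `condition` every poke_interval seconds until it returns True
    or the timeout elapses; mirrors a sensor's poke/timeout behaviour."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poke_interval)
    return False

# Simulated upstream pipeline that "finishes" by setting a flag.
state = {"upstream_done": False}

def mark_done():
    state["upstream_done"] = True

mark_done()  # upstream completes
ready = wait_for(lambda: state["upstream_done"])
# ready is True, so the downstream pipeline may start
```

The value of the pattern is that the dependency lives in the orchestrator, not in ad-hoc sleeps inside pipeline code, so a late upstream run delays (or fails) the downstream run explicitly instead of silently feeding it stale data.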

Torchvision & Transfer Learning

This article will probably be of most interest to individuals just starting out in deep learning or people who are relatively new to leveraging PyTorch. It is a summary of my experience with recent attempts to modify the torchvision package's CNNs that have been pre-trained on data from ImageNet, with the aim of making a multiple-architecture classifier a little easier to program.

Productionizing NLP Models

Lately, I have been consolidating my experiences of working on different ML projects. I will tell this story through the lens of my recent project. Our task was to classify certain phrases into categories: a multiclass, single-label problem.

Unsupervised Learning to Market Behavior Forecasting

This article describes a technique for forecasting market behavior; the second part demonstrates the application of the approach in a trading strategy. Market data is a sequence called a time series. Usually, researchers use only price data (or asset returns) to create a model that forecasts the next price value, movement direction, or other output. I think a better way is to use more data than that. The idea is to combine versatile market conditions (volatility, volumes, price changes, etc.).

The proper way to use Machine Learning metrics

It’s hard to select the right measure of accuracy for a given problem, and having a standardised approach is something every data scientist should do. Plan of this article:
• Motivation
• First considerations
• Must-know measures of accuracy for ML models
• An approach to using these measures to select the right ML model for your problem
Note: I focus on binary classification problems in this article, but the approach would be similar for multiclass classification and regression problems.
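For binary classification, the must-know measures all derive from the four confusion-matrix counts, which can be computed in a few standardised lines. The labels below are invented for illustration:

```python
# Precision, recall, F1 and accuracy from a binary confusion matrix.
def binary_metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

m = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
# tp=2, fn=1, fp=1, tn=2 -> precision=2/3, recall=2/3, accuracy=4/6
```

Which of these to optimise is exactly the model-selection question the article addresses: accuracy alone is misleading on imbalanced data, which is why precision and recall (and their harmonic mean, F1) are reported separately.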

Why do we use word embeddings in NLP?

Natural language processing (NLP) is a sub-field of machine learning (ML) that deals with natural language, often in the form of text, which is itself composed of smaller units like words and characters. Dealing with text data is problematic, since our computers, scripts and machine learning models can’t read and understand text in any human sense. When I read the word ‘cat’, many different associations are invoked – it’s a small furry animal that’s cute, eats fish, my landlord doesn’t allow, etc. But these linguistic associations are a result of quite complex neurological computations honed over millions of years of evolution, whereas our ML models must start from scratch with no pre-built understanding of word meaning.
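The point of word embeddings is that words become vectors whose geometry encodes those associations: related words end up closer together than unrelated ones, typically measured by cosine similarity. The tiny 3-dimensional vectors below are hand-made for illustration, not learned embeddings:

```python
# Cosine similarity between toy word vectors.
import math

embeddings = {
    "cat":  [0.9, 0.8, 0.1],
    "dog":  [0.8, 0.9, 0.15],
    "car":  [0.05, 0.1, 0.95],
}

def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

cat_dog = cosine(embeddings["cat"], embeddings["dog"])
cat_car = cosine(embeddings["cat"], embeddings["car"])
# cat is far more similar to dog than to car
```

Real embeddings (word2vec, GloVe, etc.) learn hundreds of dimensions from co-occurrence statistics, but the downstream use, comparing and combining vectors, looks just like this.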

Risks and Caution on applying PCA for Supervised Learning Problems

The curse of dimensionality is a crucial problem when dealing with real-life datasets, which are generally high-dimensional. As the dimensionality of the feature space increases, the number of possible configurations grows exponentially, and the fraction of configurations covered by any fixed number of observations decreases. In such a scenario, Principal Component Analysis plays a major part in efficiently reducing the dimensionality of the data while retaining as much of the variation present in the data set as possible. Let us give a very brief introduction to Principal Component Analysis before delving into the actual problem.
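A brief sketch of PCA via the singular value decomposition, assuming NumPy is available; the synthetic data (200 samples in 5 dimensions with an underlying rank of 2) is invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 5 observed features driven by 2 latent factors.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
X += 0.05 * rng.normal(size=X.shape)   # small measurement noise
Xc = X - X.mean(axis=0)                # center before PCA

# SVD of the centered data: rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)        # variance ratio per component
X2 = Xc @ Vt[:2].T                     # project onto the top 2 PCs

print(explained.round(3))
print(X2.shape)
```

Because the data is essentially rank 2, the first two components capture nearly all the variance, which is the situation in which PCA compresses aggressively with little loss.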

Recursive Sketches for Modular Deep Learning

Much of classical machine learning (ML) focuses on utilizing available data to make more accurate predictions. More recently, researchers have considered other important objectives, such as how to design algorithms that are small, efficient, and robust. With these goals in mind, a natural research objective is the design of a system on top of neural networks that efficiently stores the information encoded within them – in other words, a mechanism to compute a succinct summary (a ‘sketch’) of how a complex deep network processes its inputs. Sketching is a rich field of study that dates back to the foundational work of Alon, Matias, and Szegedy, and it can enable neural networks to efficiently summarize information about their inputs.

For example, imagine stepping into a room and briefly viewing the objects within. Modern machine learning is excellent at answering immediate questions, known at training time, about this scene: ‘Is there a cat? How big is said cat?’ Now, suppose we view this room every day over the course of a year. People can reminisce about the times they saw the room: ‘How often did the room contain a cat? Was it usually morning or night when we saw the room?’ But can one design systems that are also capable of efficiently answering such memory-based questions, even if they are unknown at training time?

In ‘Recursive Sketches for Modular Deep Learning’, recently presented at ICML 2019, we explore how to succinctly summarize how a machine learning model understands its input. We do this by augmenting an existing (already trained) machine learning model with ‘sketches’ of its computation, using them to efficiently answer memory-based questions – for example, image-to-image similarity and summary statistics – despite the fact that they take up much less memory than storing the entire original computation.

Scikit-Learn vs mlr for Machine Learning

Scikit-Learn is known for its easily understandable API for Python users, while mlr became an alternative to the popular caret package, with a larger suite of available algorithms and an easy way of tuning hyperparameters. The two packages are somewhat in competition because of the ongoing debate in analytics between using Python for machine learning and R for statistical analysis. One reason to prefer Python could be that current R packages for machine learning are provided via other packages that contain the algorithms: the packages are called through mlr but still require extra installation, and even external feature-selection libraries are needed, each with further external dependencies to satisfy. Scikit-Learn, by contrast, offers a unified API to a number of machine learning algorithms and does not require the user to call any further libraries. This by no means discredits R. R is still a major component of the data science world, regardless of what an online poll would say, and anyone with a background in statistics or mathematics will know why you should use R (whether or not they use it themselves, they recognize its appeal). Now we will take a look at how a user would go through a typical machine learning workflow. We will proceed with logistic regression in Scikit-Learn and a decision tree in mlr.
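The "unified API" the post refers to is the convention that every Scikit-Learn estimator exposes the same fit/predict/score methods. As a self-contained sketch of that interface, here is a made-up baseline classifier (it simply predicts the majority class) written in that style; it is not a real Scikit-Learn estimator:

```python
from collections import Counter

class MajorityClassifier:
    """Toy estimator following the fit/predict/score convention."""

    def fit(self, X, y):
        # Remember the most common label seen during training.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Predict the majority label for every input row.
        return [self.majority_ for _ in X]

    def score(self, X, y):
        # Fraction of correct predictions (accuracy).
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

X = [[0.1], [0.4], [0.35], [0.8], [0.9]]   # hypothetical features
y = [0, 0, 0, 1, 1]                        # hypothetical labels
clf = MajorityClassifier().fit(X, y)
print(clf.predict([[0.5]]))   # [0]
print(clf.score(X, y))        # 0.6
```

Because every estimator shares this shape, swapping a logistic regression for a decision tree is a one-line change in Scikit-Learn, which is exactly the convenience being contrasted with the per-package installs on the R side.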

A Primer to Recommendation Engines

Recommendation engines are everywhere now. Almost any app you use incorporates some sort of recommendation system to either push new content or drive sales. Recommendation engines are Netflix telling you what you should watch next, the ads on Facebook pushing products that you just happened to look at once, or even Slack suggesting which organization channels you should join. The advent of big data and machine learning has made recommendation engines one of the most directly applicable aspects of Data Science.

A Robustly Optimized BERT Pretraining Approach

BERT (Devlin et al., 2018) is a method of pre-training language representations, meaning that we train a general-purpose ‘language understanding’ model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP.

Continue Reading…



September 14, 2019

If you did not already know

I-Optimality google
The generalized linear model plays an important role in statistical analysis and the related design issues are undoubtedly challenging. The state-of-the-art works mostly apply to design criteria on the estimates of regression coefficients. It is of importance to study optimal designs for generalized linear models, especially on the prediction aspects. In this work, we propose a prediction-oriented design criterion, I-optimality, and develop an efficient sequential algorithm of constructing I-optimal designs for generalized linear models. Through establishing the General Equivalence Theorem of the I-optimality for generalized linear models, we obtain an insightful understanding for the proposed algorithm on how to sequentially choose the support points and update the weights of support points of the design. The proposed algorithm is computationally efficient with guaranteed convergence property. Numerical examples are conducted to evaluate the feasibility and computational efficiency of the proposed algorithm. …

Regularized Determinantal Point Process (R-DPP) google
Given a fixed $n\times d$ matrix $\mathbf{X}$, where $n\gg d$, we study the complexity of sampling from a distribution over all subsets of rows where the probability of a subset is proportional to the squared volume of the parallelepiped spanned by the rows (a.k.a. a determinantal point process). In this task, it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). To that end, we propose a new determinantal point process algorithm which has the following two properties, both of which are novel: (1) a preprocessing step which runs in time $O(\text{number-of-non-zeros}(\mathbf{X})\cdot\log n)+\text{poly}(d)$, and (2) a sampling step which runs in $\text{poly}(d)$ time, independent of the number of rows $n$. We achieve this by introducing a new regularized determinantal point process (R-DPP), which serves as an intermediate distribution in the sampling procedure by reducing the number of rows from $n$ to $\text{poly}(d)$. Crucially, this intermediate distribution does not distort the probabilities of the target sample. Our key novelty in defining the R-DPP is the use of a Poisson random variable for controlling the probabilities of different subset sizes, leading to new determinantal formulas such as the normalization constant for this distribution. Our algorithm has applications in many diverse areas where determinantal point processes have been used, such as machine learning, stochastic optimization, data summarization and low-rank matrix reconstruction. …

Feature-Label Memory Network google
Deep learning typically requires training a very capable architecture using large datasets. However, many important learning problems demand an ability to draw valid inferences from small size datasets, and such problems pose a particular challenge for deep learning. In this regard, research on ‘meta-learning’ is being actively conducted. Recent work has suggested a Memory Augmented Neural Network (MANN) for meta-learning. MANN is an implementation of a Neural Turing Machine (NTM) with the ability to rapidly assimilate new data in its memory, and use this data to make accurate predictions. In models such as MANN, the input data samples and their appropriate labels from the previous step are bound together in the same memory locations. This often leads to memory interference when performing a task, as these models have to retrieve a feature of an input from a certain memory location and read only the label information bound to that location. In this paper, we tried to address this issue by presenting a more robust MANN. We revisited the idea of meta-learning and proposed a new memory augmented neural network by explicitly splitting the external memory into feature and label memories. The feature memory is used to store the features of input data samples and the label memory stores their labels. Hence, when predicting the label of a given input, our model uses its feature memory unit as a reference to extract the stored feature of the input, and based on that feature, it retrieves the label information of the input from the label memory unit. In order for the network to function in this framework, a new memory-writing module to encode label information into the label memory in accordance with the meta-learning task structure is designed. Here, we demonstrate that our model outperforms MANN by a large margin in supervised one-shot classification tasks using Omniglot and MNIST datasets. …

Decision Tree Based Missing Value Imputation Technique (DMI) google
‘Decision Tree Based Missing Value Imputation Technique’ (DMI) makes use of an EM algorithm and a decision tree (DT) algorithm. …

Continue Reading…



Facebook Research at Interspeech 2019

The post Facebook Research at Interspeech 2019 appeared first on Facebook Research.

Continue Reading…



Cartoon: Unsupervised Machine Learning?

New KDnuggets Cartoon looks at one of the hottest directions in Machine Learning and asks "Can Machine Learning be too unsupervised?"

Continue Reading…



Science and Technology links (September 14th 2019)

  1. Streaming music makes up 80% of the revenue of the music industry. Revenue is up 18% for the first six months of 2019. This follows a record year in 2018, when the music industry reached its highest revenues in 10 years. Though it should be good news for musicians, it is also the case that record labels often take the lion's share of the revenues.
  2. We have seen massive progress in the last five years in artificial intelligence. Yet we do not see obvious signs of economic gains from this progress.
  3. A widely cited study contradicting the existence of free will was fatally flawed.
  4. A common diabetes drug might spur neurogenesis and repair in the brains of (female) mice.
  5. Facebook is working on creating real-time models of your face for use in virtual reality.
  6. A simple and inexpensive eye test might be sufficient to detect early signs of Alzheimer’s.

Continue Reading…



Speeding up independent binary searches by interleaving them

Given a long list of sorted values, how do you find the location of a particular value? A simple strategy is to first look at the middle of the list. If your value is larger than the middle value, look at the last half of the list; if not, look at the first half. Then repeat with the selected half, looking again at the middle. This algorithm, the binary search, is one of the first algorithms that computer science students learn, and I suspect that many people figure it out as kids.
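The procedure just described can be sketched in a few lines (the post's actual benchmarks are in C/C++; this Python version only illustrates the algorithm):

```python
def binary_search(values, target):
    """Return the index of target in the sorted list values, or -1."""
    lo, hi = 0, len(values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if values[mid] == target:
            return mid
        if values[mid] < target:
            lo = mid + 1          # target must be in the last half
        else:
            hi = mid - 1          # target must be in the first half
    return -1

data = [2, 3, 5, 7, 11, 13, 17, 19]
print(binary_search(data, 11))  # 4
print(binary_search(data, 4))   # -1
```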

It is hard to drastically improve on the binary search if you only need to do one.

But what if you need to execute multiple binary searches over distinct lists? Perhaps surprisingly, in such cases you can multiply the speed.

Let us first reason about how a binary search works. The processor needs to retrieve the middle value and compare it with your target value. Before the middle value is loaded, the processor cannot tell whether it will need to access the last or first half of the list. It might speculate. Most processors have branch predictors. In the case of a binary search, the branch is hard to predict so we might expect that the branch predictor will get it wrong half the time. Still: when it speculates properly, it is a net win. Importantly, we are using the fact that processors can do multiple memory requests at once.

What else could the processor do? If there are many binary searches to do, it might initiate the second one. And then maybe initiate the third one and so forth. This might be a lot more beneficial than speculating wastefully on a single binary search.

How can it start the second or third search before it finishes the current one? Again, thanks to speculation. If it is far enough along in the first search to predict its end, it might see the next search coming and start it even though it is still waiting for data from the current binary search. This should be especially easy if your sorted arrays have a predictable size.

There is a limit to how many instructions your processor can reorder in this manner. Let us say, for example, that the limit is 200 instructions. If each binary search takes 100 instructions, and this value is reliable (maybe because the arrays have fixed sizes), then your processor might be able to do up to two binary searches at once. So it might go twice as fast. But it probably cannot easily go much further (to 3 or 4).

But the programmer can help. We can manually tell the processor to initiate four binary searches right away:

given arrays a1, a2, a3, a4
given target t1, t2, t3, t4

compare t1 with middle of a1
compare t2 with middle of a2
compare t3 with middle of a3
compare t4 with middle of a4

cut a1 in two based on above comparison
cut a2 in two based on above comparison
cut a3 in two based on above comparison
cut a4 in two based on above comparison

compare t1 with middle of a1
compare t2 with middle of a2
compare t3 with middle of a3
compare t4 with middle of a4


You can go far higher than 4 interleaved searches. You can do 8, 16, 32… I suspect that there is no practical need to go beyond 32.
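The interleaving pattern above can be expressed in executable form. Note that the speedup in this post comes from hardware memory-level parallelism and branchless machine code; a Python sketch can only illustrate the lock-step structure, not the performance:

```python
def interleaved_searches(arrays, targets):
    """Advance several binary searches in lock step, one round at a time.

    Each round narrows every still-active search by one comparison,
    mirroring the interleaving shown in the pseudocode above. Returns
    the lower-bound index (first position not less than the target)
    for each array."""
    lo = [0] * len(arrays)
    hi = [len(a) for a in arrays]
    while any(l < h for l, h in zip(lo, hi)):
        for i, (a, t) in enumerate(zip(arrays, targets)):
            if lo[i] < hi[i]:
                mid = (lo[i] + hi[i]) // 2
                # Branchless-style update: select the half in one expression.
                lo[i], hi[i] = (mid + 1, hi[i]) if a[mid] < t else (lo[i], mid)
    return lo

arrays = [[1, 3, 5, 7], [2, 4, 6, 8], [0, 10, 20, 30], [5, 5, 5, 9]]
targets = [5, 8, 15, 5]
print(interleaved_searches(arrays, targets))  # one lower bound per array
```

In C or C++, each iteration of the inner loop would issue an independent memory load, which is what lets the memory system service many searches concurrently.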

How well does it work? Let us take 1024 sorted arrays containing a total of 64 million integers. In each array, I want to do one and just one binary search. Between each test, I access all of the data in all of the arrays a couple of times to fool the cache.

By default, if you code a binary search, the resulting assembly will be made of comparisons and jumps, and hence your processor will execute this code in a speculative manner. At least with GNU GCC, we can write the C/C++ code in such a way that the branches are implemented as “conditional move” instructions, which prevents the processor from speculating.

My results on a recent Intel processor (Cannon Lake) with GNU GCC 8 are as follows:

algorithm              time per search       relative speed
1-wise (independent)   2100 cycles/search    1.0 ×
1-wise (speculative)    900 cycles/search    2.3 ×
1-wise (branchless)    1100 cycles/search    2.0 ×
2-wise (branchless)     800 cycles/search    2.5 ×
4-wise (branchless)     600 cycles/search    3.5 ×
8-wise (branchless)     350 cycles/search    6.0 ×
16-wise (branchless)    280 cycles/search    7.5 ×
32-wise (branchless)    230 cycles/search    9.0 ×

So we are able to go 3 to 4 times faster, on what is effectively a memory bound problem, by interleaving 32 binary searches together. Interleaving merely 16 searches might also be preferable on some systems.

But why is this not higher than 4 times faster? Surely the processor can issue more than 4 memory loads at once?

That is because the 1-wise search, even without speculation, already benefits from the fact that we are streaming multiple binary searches, so that more than one is ongoing at any one time. Indeed, I can prevent the processor from usefully executing more than one search at a time either by inserting memory fences between the searches, or by making the target of one search dependent on the index found in the previous search. When I do so, the time goes up to 2100 cycles/search, which is approximately 9 times longer than the 32-wise approach. The exact ratio (9) varies depending on the processor: I get a factor of 7 on the older Skylake architecture and a factor of 5 on an ARM Skylark processor.

My source code is available.

Implementation-wise, I code my algorithms in pure C/C++. There is no need for fancy, esoteric instructions: the conditional move instructions are pretty much standard and old at this point. Sadly, I only know how to convince one compiler (GNU GCC) to reliably produce conditional move instructions. And I have no clue how to control how Java, Swift, Rust or any other language deals with branches.

Could you do better than my implementation? Maybe, but there are arguments suggesting that you can’t beat it by much in general. Each data access is done using fewer than 10 instructions in my implementation, which is far below the number of cycles and small compared to the size of the instruction buffers, so finding ways to reduce the instruction count should not help. Furthermore, it looks like I am already nearly maxing out the amount of memory-level parallelism, at least on some hardware (9 on Cannon Lake, 7 on Skylake, 5 on Skylark). On Cannon Lake, however, you should be able to do better.

Credit: This work is the result of a collaboration with Travis Downs and Nathan Kurz, though all of the mistakes are mine.

Continue Reading…



Twitter “Account Analysis” in R

[This article was first published on R –, and kindly contributed to R-bloggers].

This past week @propublica linked to a really spiffy resource for getting an overview of a Twitter user’s profile and activity called accountanalysis. It has a beautiful interface that works as well on mobile as it does in a real browser. It also is fully interactive and supports cross-filtering (zoom in on the timeline and the other graphs change). It’s especially great if you’re not a coder, but if you are, @kearneymw’s {rtweet} can get you all this info and more, putting the power of R behind data frames full of tweet inanity.

While we covered quite a bit of {rtweet} ground in the 21 Recipes book, summarizing an account to the degree that accountanalysis does is not in there. To rectify this oversight, I threw together a static clone of accountanalysis that can make standalone HTML reports like this one.

It’s a fully parameterized R markdown document, meaning you can run it as just a function call (or change the parameter and knit it by hand):

rmarkdown::render(
  input = "account-analysis.Rmd",
  params = list(
    username = "propublica"
  ),
  output_file = "~/Documents/propublica-analysis.html"
)
It will also, by default, save a date-stamped copy of the user info and retrieved timeline into the directory you generate the report from (add a prefix path to the save portion in the Rmd to store it in a better place).

With all the data available, you can dig in and extract all the information you want/need.


You can get the Rmd at your favorite social coding service:


Continue Reading…



Whats new on arXiv – Complete List

Do Cross Modal Systems Leverage Semantic Relationships?
On perfectness in Gaussian graphical models
Meta Relational Learning for Few-Shot Link Prediction in Knowledge Graphs
Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis
Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set
Deep Morphological Neural Networks
What can the brain teach us about building artificial intelligence?
Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment
Answers Unite! Unsupervised Metrics for Reinforced Summarization Models
Metric Learning from Imbalanced Data
Empirical Analysis of Knowledge Distillation Technique for Optimization of Quantized Deep Neural Networks
Subset Multivariate Collective And Point Anomaly Detection
Deep Convolutional Networks in System Identification
Augmented Memory Networks for Streaming-Based Active One-Shot Learning
Mogrifier LSTM
Matching Component Analysis for Transfer Learning
Quasi-Newton Optimization Methods For Deep Learning Applications
Differentially Private SQL with Bounded User Contribution
Modelling transport provision in a polycentric mega city region
Artificial Neural Networks and Adaptive Neuro-fuzzy Models for Prediction of Remaining Useful Life
ApproxNet: Content and Contention Aware Video Analytics System for the Edge
Towards a reliable approach on scaling in data acquisition
Reduced-order modeling for nonlinear Bayesian statistical inverse problems
Evaluating Conformance Measures in Process Mining using Conformance Propositions (Extended version)
High-Fidelity State-of-Charge Estimation of Li-Ion Batteries Using Machine Learning
Online Dissolved Gas Analysis (DGA) Monitoring System
On Distribution Patterns of Power Flow Solutions
Data-driven simulation for general purpose multibody dynamics using deep neural networks
High-order partitioned spectral deferred correction solvers for multiphysics problems
Riemannian batch normalization for SPD neural networks
A Communication-Efficient Algorithm for Exponentially Fast Non-Bayesian Learning in Networks
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense
Generalized Neyman-Pearson lemma for convex expectations on $L^{\infty}(μ)$-spaces
Parameter Estimation with the Ordered $\ell_{2}$ Regularization via an Alternating Direction Method of Multipliers
Fundamental Tradeoffs in Uplink Grant-Free Multiple Access with Protected CSI
Accurate Esophageal Gross Tumor Volume Segmentation in PET/CT using Two-Stream Chained 3D Deep Network Fusion
Likelihood-Free Overcomplete ICA and Applications in Causal Discovery
Deep Esophageal Clinical Target Volume Delineation using Encoded 3D Spatial Context of Tumors, Lymph Nodes, and Organs At Risk
Referring Expression Generation Using Entity Profiles
Spectral Norm and Nuclear Norm of a Third Order Tensor
What Happens on the Edge, Stays on the Edge: Toward Compressive Deep Learning
Network Transfer Learning via Adversarial Domain Adaptation with Graph Convolution
Snowball: Iterative Model Evolution and Confident Sample Discovery for Semi-Supervised Learning on Very Small Labeled Datasets
Towards Automatic Detection of Misinformation in Online Medical Videos
Q-DATA: Enhanced Traffic Flow Monitoring in Software-Defined Networks applying Q-learning
Using Weaker Consistency Models with Monitoring and Recovery for Improving Performance of Key-Value Stores
Aerial multi-object tracking by detection using deep association networks
Counting acyclic and strong digraphs by descents
Engineering Boolean Matrix Multiplication for Multiple-Accelerator Shared-Memory Architectures
Rigidity and non-rigidity for uniform perturbed lattice
Simpler and Faster Learning of Adaptive Policies for Simultaneous Translation
Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons
Classes of graphs with low complexity: the case of classes with bounded linear rankwidth
A Non-commutative Bilinear Model for Answering Path Queries in Knowledge Graphs
AMR Normalization for Fairer Evaluation
Simulated Annealing In $\mathbf{R}^d$ With Slowly Growing Potentials
Reservoir Computing based on Quenched Chaos
Learning sparse representations in reinforcement learning
Empirical Hypothesis Space Reduction
Rate of convergence of uniform transport processes to Brownian sheet
Stability phenomena for Martin boundaries of relatively hyperbolic groups
High-order gas-kinetic scheme with three-dimensional WENO reconstruction for the Euler and Navier-Stokes solutions
Gerrymandering: A Briber’s Perspective
Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity
Functional Asplund’s metrics for pattern matching robust to variable lighting conditions
Topological Coding and Topological Matrices Toward Network Overall Security
HinDom: A Robust Malicious Domain Detection System based on Heterogeneous Information Network with Transductive Classification
Bidirectional One-Shot Unsupervised Domain Mapping
Shortest Paths in a Hybrid Network Model
A Cost-Scaling Algorithm for Minimum-Cost Node-Capacitated Multiflow Problem
A Compositional Model of Multi-faceted Trust for Personalized Item Recommendation
Online Optimization of Wireless Powered Mobile-Edge Computing for Heterogeneous Industrial Internet of Things
SQuAP-Ont: an Ontology of Software Quality Relational Factors from Financial Systems
Multi-DoF Time Domain Passivity Approach based Drift Compensation for Telemanipulation
Multifractal Description of Streamflow and Suspended Sediment Concentration Data from Indian River Basins
Latent Gaussian process with composite likelihoods for data-driven disease stratification
SSAP: Single-Shot Instance Segmentation With Affinity Pyramid
Exponential and Laplace approximation for occupation statistics of branching random walk
Scaling agile on large enterprise level — systematic bundling and application of state of the art approaches for lasting agile transitions
On the k-synchronizability for mailbox systems
The polarization process of ferroelectric materials analyzed in the framework of variational inequalities
Learning Sensor Placement from Demonstration for UAV networks
Competing risks joint models using R-INLA
Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?
Defeating Opaque Predicates Statically through Machine Learning and Binary Analysis
ParaQG: A System for Generating Questions and Answers from Paragraphs
PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud
Reachable states and holomorphic function spaces for the 1-D heat equation
Heterogeneous Proxytypes Extended: Integrating Theory-like Representations and Mechanisms with Prototypes and Exemplars
LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games
3D landmark detection for augmented reality based otologic procedures
Regression-based sparse polynomial chaos for uncertainty quantification of subsurface flow models
Control issues and linear projection constraints on the control and on the controlled trajectory
Zeta functions of graphs, their symmetries and extended Catalan numbers
Minimising the Levelised Cost of Electricity for Bifacial Solar Panel Arrays using Bayesian Optimisation
Stochastic perturbations and fisheries management
Complexity of controlled bad sequences over finite powersets of $\mathbb{N}^k$
Hierarchical Model Reduction Techniques for Flow Modeling in a Parametrized Setting
Distance transform regression for spatially-aware deep semantic segmentation
Testing nonparametric shape restrictions
Parameterized Intractability of Even Set and Shortest Vector Problem
Modelling the Behavior Classification of Social News Aggregations Users
Social Networks as a Tool for a Higher Education Institution Image Creation
Predicting Software Tests Traces
Under the Conditions of Non-Agenda Ownership: Social Media Users in the 2019 Ukrainian Presidential Elections Campaign
Deep Learning-Aided Tabu Search Detection for Large MIMO Systems
A Specialized Evolutionary Strategy Using Mean Absolute Error Random Sampling to Design Recurrent Neural Networks
DurIAN: Duration Informed Attention Network For Multimodal Synthesis
SAO WMT19 Test Suite: Machine Translation of Audit Reports
A note on the optimal rubbling in ladders and prisms
Simulation and computational analysis of multiscale graph agent-based tumor model
A scalable algorithm for identifying multiple sensor faults using disentangled RNNs
ScisummNet: A Large Annotated Dataset and Content-Impact Models for Scientific Paper Summarization with Citation Networks
Different Absorption from the Same Sharing: Sifted Multi-task Learning for Fake News Detection
On a Conjecture of Lovász on Circle-Representations of Simple 4-Regular Planar Graphs
Chase-escape with death on trees
Transforming Gaussian correlations. Applications to generating long-range power-law correlated time series with arbitrary distribution
Joint Radar-Communications Strategies for Autonomous Vehicles
Outage Analysis of Cooperative NOMA Using Maximum Ratio Combining at Intersections
Tensor decompositions on simplicial complexes with invariance
Minimax Isometry Method
Büchi automata for distributed temporal logic
Proof-Based Synthesis of Sorting Algorithms Using Multisets in Theorema
An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector
Optimal uniform continuity bound for conditional entropy of classical–quantum states
Rate-Memory Trade-off for Multi-access Coded Caching with Uncoded Placement
Value Iteration Algorithm for Mean-field Games
Extending the Scope of Robust Quadratic Optimization
Complexity of Computing the Shapley Value in Games with Externalities
Spiking Neural Networks for Inference and Learning: A Memristor-based Design Perspective
Affect Enriched Word Embeddings for News Information Retrieval
Sequential Convex Restriction and its Applications in Robust Optimization
Inference in Differences-in-Differences: How Much Should We Trust in Independent Clusters?
GPU-based parallelism for ASP-solving
Fractals2019: Combinatorial Optimisation with Dynamic Constraint Annealing
Extracting Aspects Hierarchies using Rhetorical Structure Theory
Epistemological Issues in Educational Data Mining
ICDM 2019 Knowledge Graph Contest: Team UWA
ALIME: Autoencoder Based Approach for Local Interpretability
Learning Distributions Generated by One-Layer ReLU Networks
Linear robust adaptive model predictive control: Computational complexity and conservatism
PISEP^2: Pseudo Image Sequence Evolution based 3D Pose Prediction
Status in flux: Unequal alliances can create power vacuums
New insights for setting up contractual options for demand side flexibility
Using Mw dependence of surface dynamics of glassy polymers to probe the length scale of free surface mobility
Multifidelity Computer Model Emulation with High-Dimensional Output
‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking
Are Adversarial Robustness and Common Perturbation Robustness Independant Attributes ?
Mapping Spiking Neural Networks to Neuromorphic Hardware
Deep kernel learning for integral measurements
Semiparametric Inference for Non-monotone Missing-Not-at-Random Data: the No Self-Censoring Model
An equilibrated a posteriori error estimator for arbitrary-order Nédélec elements for magnetostatic problems
Dynamic Boundary Guarding Against Radially Incoming Targets
Rethinking the Number of Channels for the Convolutional Neural Network
Empirical Study of Diachronic Word Embeddings for Scarce Data
Optimal Wireless Resource Allocation with Random Edge Graph Neural Networks
Deep learning networks for selection of persistent scatterer pixels in multi-temporal SAR interferometric processing
Generalized Integrated Gradients: A practical method for explaining diverse ensembles
Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
Projectively self-concordant barriers on convex sets
Does the ratio of Laplace transforms of powers of a function identify the function?
Agent-based model for tumour-analysis using Python+Mesa
Optimal translational-rotational invariant dictionaries for images
Let’s agree to disagree: learning highly debatable multirater labelling
About Fibonacci trees III: multiple Fibonacci trees
A Microscopic Theory of Intrinsic Timescales in Spiking Neural Networks
ICSrange: A Simulation-based Cyber Range Platform for Industrial Control Systems
Cultural diversity and the measurement of functional impairment: A cross-cultural validation of the Amsterdam IADL Questionnaire
On Orthogonal Vector Edge Coloring
Mape_Maker: A Scenario Creator
Ramsey numbers of path-matchings, covering designs and 1-cores
Moderate deviations for the range of a transient random walk. II
The spectral properties of Vandermonde matrices with clustered nodes
Simultaneous Estimation of Number of Clusters and Feature Sparsity in Clustering High-Dimensional Data
Efron-Stein PAC-Bayesian Inequalities
Gaps of Summands of the Zeckendorf Lattice
The Fibonacci Quilt Game
Two-Way Coding and Attack Decoupling in Control Systems Under Injection Attacks
Regularized Linear Inversion with Randomized Singular Value Decomposition
Mixture Content Selection for Diverse Sequence Generation
Tensor Analysis with n-Mode Generalized Difference Subspace
Dense Extreme Inception Network: Towards a Robust CNN Model for Edge Detection
Dispersion of Mobile Robots in the Global Communication Model
From ‘F’ to ‘A’ on the N.Y. Regents Science Exams: An Overview of the Aristo Project
A Note on Data-Driven Control for SISO Feedback Linearizable Systems Without Persistency of Excitation
Beyond Photo Realism for Domain Adaptation from Synthetic Data
A Constructive Approach for Data-Driven Randomized Learning of Feedforward Neural Networks
Self-Attentive Adversarial Stain Normalization
A greedoid and a matroid inspired by Bhargava’s $p$-orderings
The ML-EM algorithm in continuum: sparse measure solutions
ACES — Automatic Configuration of Energy Harvesting Sensors with Reinforcement Learning
Level-set percolation of the Gaussian free field on regular graphs II: Finite expanders
Level-set percolation of the Gaussian free field on regular graphs I: Regular trees
Mining for Dark Matter Substructure: Inferring subhalo population properties from strong lenses with machine learning
On Arithmetical Structures on Complete Graphs
Deep Transfer Learning for Star Cluster Classification: I. Application to the PHANGS-HST Survey
An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction
Online Regularization by Denoising with Applications to Phase Retrieval


What’s new on arXiv

Do Cross Modal Systems Leverage Semantic Relationships?

Current cross-modal retrieval systems are evaluated using the R@K measure, which does not leverage semantic relationships but strictly follows the manually marked image-text query pairs. As a result, current systems do not generalize well to unseen data in the wild. To handle this, we propose a new measure, SemanticMap, to evaluate the performance of cross-modal systems. Our proposed measure evaluates the semantic similarity between the image and text representations in the latent embedding space. We also propose a novel cross-modal retrieval system using a single-stream network for bidirectional retrieval. The proposed system is based on a deep neural network trained using an extended center loss, minimizing the distance of image and text descriptions in the latent space from their class centers. In our system, the text descriptions are also encoded as images, which enables us to use a single-stream network for both text and images. To the best of our knowledge, our work is the first to employ a single-stream network for cross-modal retrieval. The proposed system is evaluated on two publicly available datasets, MSCOCO and Flickr30K, and shows results comparable to the current state-of-the-art methods.
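The extended center loss described above pulls each embedding toward its class center. A minimal sketch of the idea follows; the update rule, interpolation factor, and toy data are illustrative assumptions, not the authors’ implementation:

```python
import numpy as np

def center_loss(embeddings, labels, centers):
    """Mean squared distance of each embedding to its class center."""
    diffs = embeddings - centers[labels]            # (N, D)
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

def update_centers(embeddings, labels, centers, alpha=0.5):
    """Move each class center toward the mean of its assigned embeddings."""
    new_centers = centers.copy()
    for c in range(centers.shape[0]):
        mask = labels == c
        if mask.any():
            new_centers[c] += alpha * (embeddings[mask].mean(axis=0) - centers[c])
    return new_centers

# Toy example: two classes in a 2-D embedding space.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
lab = np.array([0, 0, 1])
cen = np.zeros((2, 2))
cen = update_centers(emb, lab, cen, alpha=1.0)   # centers jump to class means
loss = center_loss(emb, lab, cen)
```

In a full system this loss would be minimized jointly with a classification loss over both the image and text branches of the single-stream network.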

On perfectness in Gaussian graphical models

Knowing when a graphical model is perfect to a distribution is essential in order to relate separation in the graph to conditional independence in the distribution, and this is particularly important when performing inference from data. When the model is perfect, there is a one-to-one correspondence between conditional independence statements in the distribution and separation statements in the graph. Previous work has shown that almost all models based on linear directed acyclic graphs as well as Gaussian chain graphs are perfect, the latter of which subsumes Gaussian graphical models (i.e., the undirected Gaussian models) as a special case. However, the complexity of chain graph models leads to a proof of this result which is indirect and mired in the complications of parameterizing this general class. In this paper, we directly approach the problem of perfectness for the Gaussian graphical models, and provide a new proof, via a more transparent parametrization, that almost all such models are perfect. Our approach is based on, and substantially extends, a construction of Lněnička and Matúš showing the existence of a perfect Gaussian distribution for any graph.

Meta Relational Learning for Few-Shot Link Prediction in Knowledge Graphs

Link prediction is an important way to complete knowledge graphs (KGs), but embedding-based methods, though effective for link prediction in KGs, perform poorly on relations that have only a few associative triples. In this work, we propose a Meta Relational Learning (MetaR) framework for the common but challenging task of few-shot link prediction in KGs, namely predicting new triples about a relation after observing only a few associative triples. We solve few-shot link prediction by focusing on transferring relation-specific meta information to make the model learn the most important knowledge and learn faster; these correspond to relation meta and gradient meta, respectively, in MetaR. Empirically, our model achieves state-of-the-art results on few-shot link prediction KG benchmarks.

Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis

When a robot acquires new information, ideally it would immediately be capable of using that information to understand its environment. While deep neural networks are now widely used by robots for inferring semantic information, conventional neural networks suffer from catastrophic forgetting when they are incrementally updated, with new knowledge overwriting established representations. While a variety of approaches have been developed that attempt to mitigate catastrophic forgetting in the incremental batch learning scenario, in which an agent learns a large collection of labeled samples at once, streaming learning has been much less studied in the robotics and deep learning communities. In streaming learning, an agent learns instances one-by-one and can be tested at any time. Here, we revisit streaming linear discriminant analysis, which has been widely used in the data mining research community. By combining streaming linear discriminant analysis with deep learning, we are able to outperform both incremental batch learning and streaming learning algorithms on both ImageNet-1K and CORe50.
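The incremental update behind streaming LDA can be sketched in a few lines. This is a deliberately simplified illustration, with running class means, an identity shared covariance, and nearest-mean prediction, not the paper’s full deep streaming LDA:

```python
import numpy as np

class StreamingLDA:
    """Minimal streaming-LDA sketch: running class means with a fixed
    (identity) shared covariance; predicts the nearest class mean."""
    def __init__(self, n_classes, dim):
        self.means = np.zeros((n_classes, dim))
        self.counts = np.zeros(n_classes)

    def fit_one(self, x, y):
        # Incremental mean update for class y, one labeled sample at a time.
        self.counts[y] += 1
        self.means[y] += (x - self.means[y]) / self.counts[y]

    def predict(self, x):
        d = np.sum((self.means - x) ** 2, axis=1)
        return int(np.argmin(d))

clf = StreamingLDA(n_classes=2, dim=2)
for x, y in [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1)]:
    clf.fit_one(np.array(x), y)
pred = clf.predict(np.array([0.9, 0.9]))
```

In the paper’s setting, the inputs would be deep features from a frozen CNN backbone rather than raw vectors, and a shared covariance estimate replaces the identity assumed here.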

Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set

Development sets are impractical to obtain for real low-resource languages, since using all available data for training is often more effective. However, development sets are widely used in research papers that purport to deal with low-resource natural language processing (NLP). Here, we aim to answer the following questions: Does using a development set for early stopping in the low-resource setting influence results as compared to a more realistic alternative, where the number of training epochs is tuned on development languages? And does it lead to overestimation or underestimation of performance? We repeat multiple experiments from recent work on neural models for low-resource NLP and compare results for models obtained by training with and without development sets. On average over languages, absolute accuracy differs by up to 1.4%. However, for some languages and tasks, differences are as big as 18.0% accuracy. Our results highlight the importance of realistic experimental setups in the publication of low-resource NLP research results.

Deep Morphological Neural Networks

Mathematical morphology is a theory and technique for extracting features such as geometric and topological structures from digital images. Given a target image, determining suitable morphological operations and structuring elements is a cumbersome and time-consuming task. In this paper, a morphological neural network is proposed to address this problem. Serving as a nonlinear feature-extracting layer in deep learning frameworks, the efficiency of the proposed morphological layer is confirmed analytically and empirically. With a known target, a single-filter morphological layer learns the structuring element correctly, and an adaptive layer can automatically select appropriate morphological operations. For practical applications, the proposed morphological neural networks are tested on several classification datasets related to shape or geometric image features, and the experimental results confirm their high computational efficiency and accuracy.
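Grayscale dilation and its dual erosion, the basic operations such a morphological layer would learn, can be sketched in 1-D (an illustrative sketch with a flat structuring element, not the paper’s layer):

```python
import numpy as np

def dilate1d(signal, se):
    """Grayscale dilation of a 1-D signal by structuring element `se`
    (a max-plus correlation), with -inf padding at the borders."""
    k = len(se)
    pad = k // 2
    padded = np.concatenate([np.full(pad, -np.inf), signal, np.full(pad, -np.inf)])
    return np.array([np.max(padded[i:i + k] + se) for i in range(len(signal))])

def erode1d(signal, se):
    """Erosion is the dual min-plus operation, with +inf padding."""
    k = len(se)
    pad = k // 2
    padded = np.concatenate([np.full(pad, np.inf), signal, np.full(pad, np.inf)])
    return np.array([np.min(padded[i:i + k] - se) for i in range(len(signal))])

sig = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
flat = np.zeros(3)          # flat structuring element of width 3
dil = dilate1d(sig, flat)   # the spike spreads to its neighbours
ero = erode1d(dil, flat)    # erosion after dilation (a closing) shrinks it back
```

In the learnable setting, the structuring element `se` would be a trainable parameter, and the max/min would be replaced by smooth approximations so gradients can flow.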

What can the brain teach us about building artificial intelligence?

This paper is the preprint of an invited commentary on Lake et al.’s Behavioral and Brain Sciences article titled ‘Building machines that learn and think like people’. Lake et al.’s paper offers a timely critique of the recent accomplishments in artificial intelligence from the vantage point of human intelligence, and provides insightful suggestions about research directions for building more human-like intelligence. Since we agree with most of the points raised in that paper, we will offer a few points that are complementary.

Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment

A recent trend in traditionally challenging fields such as computer vision and natural language processing has been the significant performance gains delivered by deep learning (DL). In many research fields, DL models have evolved rapidly and become ubiquitous. Despite researchers’ excitement, unfortunately, most software developers are not DL experts and often have a difficult time following the booming DL research output. As a result, it usually takes a significant amount of time for the latest superior DL models to prevail in industry. This issue is further exacerbated by the common use of sundry incompatible DL programming frameworks, such as TensorFlow, PyTorch, Theano, etc. To address this issue, we propose a system, called Model Asset Exchange (MAX), that gives developers easy access to state-of-the-art DL models. Regardless of the underlying DL programming framework, it provides an open-source Python library (called the MAX framework) that wraps DL models and unifies their programming interfaces with our standardized RESTful APIs. These RESTful APIs enable developers to use the wrapped DL models for inference tasks without needing to fully understand the different DL programming frameworks. Using MAX, we have wrapped and open-sourced more than 30 state-of-the-art DL models from various research fields, including computer vision, natural language processing, and signal processing. Finally, we demonstrate two web applications built on top of MAX, as well as the process of adding a DL model to MAX.
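The unified-interface idea can be illustrated with a minimal sketch. The wrapper class, model names, and toy backends below are hypothetical, not MAX’s actual API; a real deployment would expose `predict` behind a standardized RESTful endpoint:

```python
class ModelWrapper:
    """Hypothetical sketch of a MAX-style wrapper: any backend model is
    exposed through one standardized predict() interface, so callers never
    touch the underlying framework."""
    def __init__(self, name, backend_fn):
        self.name = name
        self._backend_fn = backend_fn   # framework-specific inference call

    def predict(self, payload):
        # A real system would serve this via e.g. POST /model/predict;
        # here we simply delegate to the wrapped backend.
        return {"model": self.name, "predictions": self._backend_fn(payload)}

# Two toy "models" (standing in for different frameworks) behind one interface.
sentiment = ModelWrapper("sentiment",
                         lambda text: ["positive" if "good" in text else "negative"])
length = ModelWrapper("length", lambda text: [len(text)])

out1 = sentiment.predict("a good movie")
out2 = length.predict("abc")
```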

Answers Unite! Unsupervised Metrics for Reinforced Summarization Models

Abstractive summarization approaches based on Reinforcement Learning (RL) have recently been proposed to overcome classical likelihood maximization. RL makes it possible to consider complex, possibly non-differentiable, metrics that globally assess the quality and relevance of the generated outputs. ROUGE, the most widely used summarization metric, is known to suffer from a bias towards lexical similarity as well as from suboptimal accounting for the fluency and readability of the generated abstracts. We thus explore and propose alternative evaluation measures: the reported human-evaluation analysis shows that the proposed metrics, based on Question Answering, compare favorably to ROUGE, with the additional property of not requiring reference summaries. Training an RL-based model on these metrics leads to improvements (in terms of both human and automated metrics) over current approaches that use ROUGE as a reward.

Metric Learning from Imbalanced Data

A key element of any machine learning algorithm is the use of a function that measures the dis/similarity between data points. Given a task, such a function can be optimized with a metric learning algorithm. Although this research field has received a lot of attention during the past decade, very few approaches have focused on learning a metric in an imbalanced scenario where the number of positive examples is much smaller than the number of negatives. Here, we address this challenging task by designing a new Mahalanobis metric learning algorithm (IML) that deals with class imbalance. The empirical study performed shows the efficiency of IML.
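A Mahalanobis metric of the kind IML learns can be sketched as follows. The parameterization M = L^T L is a standard construction that keeps the metric positive semi-definite; the matrices here are illustrative, not IML’s learned values:

```python
import numpy as np

def mahalanobis_sq(x, y, L):
    """Squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y),
    parameterized as M = L^T L so M is always positive semi-definite."""
    d = x - y
    Ld = L @ d
    return float(Ld @ Ld)

x = np.array([1.0, 0.0])
y = np.array([0.0, 0.0])

# Identity L recovers the squared Euclidean distance.
d_euc = mahalanobis_sq(x, y, np.eye(2))

# A learned L can stretch the axis that best separates the rare positives.
L = np.array([[3.0, 0.0], [0.0, 1.0]])
d_learned = mahalanobis_sq(x, y, L)
```

A metric learner would optimize the entries of `L` so that same-class pairs end up close and different-class pairs far apart under this distance.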

Empirical Analysis of Knowledge Distillation Technique for Optimization of Quantized Deep Neural Networks

Knowledge distillation (KD) is a very popular method for model size reduction. Recently, the technique has been exploited for quantized deep neural network (QDNN) training as a way to restore the performance sacrificed by word-length reduction. KD, however, employs additional hyper-parameters, such as the temperature, the coefficient, and the size of the teacher network, for QDNN training. We analyze the effect of these hyper-parameters on QDNN optimization with KD. We find that these hyper-parameters are inter-related, and also introduce a simple and effective technique that reduces the coefficient during training. With KD employing the proposed hyper-parameters, we achieve test accuracies of 92.7% and 67.0% on ResNet20 with 2-bit ternary weights for the CIFAR-10 and CIFAR-100 data sets, respectively.
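The distillation loss with a decaying coefficient can be sketched as follows. The Hinton-style loss form is standard; the linear decay schedule is an illustrative assumption, not necessarily the schedule used in the paper:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, true_label, T, coeff):
    """Distillation loss: cross-entropy with the hard label, plus `coeff`
    times the soft cross-entropy against the teacher at temperature T
    (scaled by T^2, as is conventional)."""
    p_student = softmax(student_logits)
    hard = -np.log(p_student[true_label])
    q_teacher = softmax(teacher_logits, T)
    q_student = softmax(student_logits, T)
    soft = -np.sum(q_teacher * np.log(q_student))
    return hard + coeff * (T ** 2) * soft

def decayed_coeff(initial, epoch, total_epochs):
    # One possible schedule for shrinking the KD coefficient over training.
    return initial * (1.0 - epoch / total_epochs)

loss_early = kd_loss([2.0, 0.0], [1.5, 0.5], 0, T=4.0,
                     coeff=decayed_coeff(0.9, 0, 10))
loss_late = kd_loss([2.0, 0.0], [1.5, 0.5], 0, T=4.0,
                    coeff=decayed_coeff(0.9, 9, 10))
```

Since the soft term is always positive, shrinking the coefficient shifts the training signal from the teacher toward the hard labels as training progresses.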

Subset Multivariate Collective And Point Anomaly Detection

In recent years, there has been a growing interest in identifying anomalous structure within multivariate data streams. We consider the problem of detecting collective anomalies, corresponding to intervals where one or more of the data streams behaves anomalously. We first develop a test for a single collective anomaly that has power to simultaneously detect anomalies that are either rare, that is, affecting only a few data streams, or common. We then show how to detect multiple anomalies in a way that is computationally efficient but avoids the approximations inherent in binary segmentation-like approaches. This approach, which we call MVCAPA, is shown to consistently estimate the number and location of the collective anomalies, a property that has not previously been shown for competing methods. MVCAPA can be made robust to point anomalies and can allow for the anomalies to be imperfectly aligned. We show the practical usefulness of allowing for imperfect alignments through a resulting increase in power to detect regions of copy number variation.

Deep Convolutional Networks in System Identification

Recent developments within deep learning are relevant for nonlinear system identification problems. In this paper, we establish connections between the deep learning and the system identification communities. It has recently been shown that convolutional architectures are at least as capable as recurrent architectures when it comes to sequence modeling tasks. Inspired by these results, we explore the explicit relationships between the recently proposed temporal convolutional network (TCN) and two classic system identification model structures: Volterra series and block-oriented models. We end the paper with an experimental study where we provide results on two real-world problems, the well-known Silverbox dataset and a newer dataset originating from ground vibration experiments on an F-16 fighter aircraft.
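The basic TCN building block, a dilated causal convolution, can be sketched as follows (an illustrative single-channel sketch; a full TCN stacks many such layers with nonlinearities and residual connections):

```python
import numpy as np

def causal_conv(x, w, dilation=1):
    """Dilated causal convolution: y[t] depends only on x[t], x[t-d],
    x[t-2d], ... (implicitly zero-padded on the left)."""
    k = len(w)
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        for j in range(k):
            idx = t - j * dilation
            if idx >= 0:
                y[t] += w[j] * x[idx]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
y1 = causal_conv(x, np.array([1.0, 1.0]), dilation=1)  # x[t] + x[t-1]
y2 = causal_conv(x, np.array([1.0, 1.0]), dilation=2)  # x[t] + x[t-2]
```

The causal structure is exactly what makes the connection to system identification natural: each output depends only on past inputs, like a finite impulse response model, and stacking dilations grows the memory exponentially.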

Augmented Memory Networks for Streaming-Based Active One-Shot Learning

One of the major challenges in training deep architectures for predictive tasks is the scarcity and cost of labeled training data. Active Learning (AL) is one way of addressing this challenge. In stream-based AL, observations are continuously made available to the learner, which has to decide whether to request a label or to make a prediction. The goal is to reduce the request rate while at the same time maximizing prediction performance. In previous research, reinforcement learning has been used to learn the AL request/prediction strategy. In our work, we propose to equip the reinforcement learning process with memory-augmented neural networks to enhance its one-shot capabilities. Moreover, we introduce Class Margin Sampling (CMS) as an extension of standard margin sampling to the reinforcement learning setting. This strategy aims to reduce training time and improve sample efficiency in the training process. We evaluate the proposed method on a classification task using the empirical accuracy of label predictions and the percentage of label requests. The results indicate that the proposed method, by making use of memory-augmented networks and CMS in the training process, outperforms existing baselines.
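Standard margin sampling, which CMS extends, reduces to a simple uncertainty test on the model’s class probabilities; the threshold below is an illustrative choice:

```python
def margin_request(probs, threshold=0.2):
    """Stream-based margin sampling: request a label when the gap between
    the two most confident classes is small (i.e., the model is unsure)."""
    top2 = sorted(probs, reverse=True)[:2]
    margin = top2[0] - top2[1]
    return margin < threshold

# Confident prediction: no label request; uncertain one: request a label.
ask_confident = margin_request([0.90, 0.05, 0.05])
ask_uncertain = margin_request([0.45, 0.40, 0.15])
```

In the streaming setting, this decision is taken once per arriving observation, trading label cost against expected prediction improvement.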

Mogrifier LSTM

Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
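The mutual gating can be sketched as follows; the round count and the zero-initialized matrices are illustrative (zero weights make each gate exactly 1, a handy sanity check that the construction reduces to a plain LSTM input):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mogrify(x, h, Q, R, rounds=4):
    """Mogrifier gating: before the LSTM update, the input x and previous
    state h alternately gate each other:
        odd rounds:  x <- 2*sigmoid(Q h) * x
        even rounds: h <- 2*sigmoid(R x) * h"""
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            x = 2.0 * sigmoid(Q @ h) * x
        else:
            h = 2.0 * sigmoid(R @ x) * h
    return x, h

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
h = rng.standard_normal(3)
Q = np.zeros((3, 3))   # zero weights: 2*sigmoid(0) = 1, so the gates are no-ops
R = np.zeros((3, 3))
x2, h2 = mogrify(x.copy(), h.copy(), Q, R)
```

With trained `Q` and `R`, the gated `x2` and `h2` would then be fed into an ordinary LSTM cell, making its transition effectively context-dependent.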

Matching Component Analysis for Transfer Learning

We introduce a new Procrustes-type method called matching component analysis to isolate components in data for transfer learning. Our theoretical results describe the sample complexity of this method, and we demonstrate through numerical experiments that our approach is indeed well suited for transfer learning.
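The Procrustes step underlying such a method can be sketched with the classical orthogonal Procrustes solution (a standard SVD construction; the data here are illustrative, not the paper’s algorithm):

```python
import numpy as np

def procrustes_align(A, B):
    """Orthogonal Procrustes: find the orthogonal W minimizing ||A W - B||_F,
    given by W = U V^T from the SVD of A^T B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# B is A rotated by 90 degrees; alignment should recover that rotation.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Rot = np.array([[0.0, -1.0], [1.0, 0.0]])
B = A @ Rot
W = procrustes_align(A, B)
```

In a transfer-learning setting, `A` and `B` would be feature matrices from the source and target domains, and the recovered `W` aligns their shared components.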

Quasi-Newton Optimization Methods For Deep Learning Applications

Deep learning algorithms often require solving a highly non-linear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement learning (RL), are generally restricted to the class of first-order algorithms, like stochastic gradient descent (SGD). While SGD iterates are inexpensive to compute, they have slow theoretical convergence rates. Furthermore, they require exhaustive trial-and-error to fine-tune many learning parameters. Using second-order curvature information to find search directions can help with more robust convergence for non-convex optimization problems. However, computing Hessian matrices for large-scale problems is not computationally practical. Alternatively, quasi-Newton methods construct an approximation of the Hessian matrix to build a quadratic model of the objective function. Quasi-Newton methods, like SGD, require only first-order gradient information, but they can result in superlinear convergence, which makes them attractive alternatives to SGD. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive definite Hessian approximations. In this chapter, we propose efficient optimization methods based on L-BFGS quasi-Newton methods using line search and trust-region strategies. Our methods bridge the disparity between first- and second-order methods by using gradient information to calculate low-rank updates to Hessian approximations. We provide formal convergence analysis of these methods as well as empirical results on deep learning applications, such as image classification tasks and deep reinforcement learning on a set of ATARI 2600 video games. Our results show a robust convergence with preferred generalization characteristics as well as fast training time.
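The core of L-BFGS is the two-loop recursion, which applies an approximate inverse Hessian to the gradient without ever forming the matrix. Below is a textbook sketch of that recursion (not the chapter’s line-search or trust-region variants):

```python
import numpy as np

def two_loop(grad, s_list, y_list):
    """L-BFGS two-loop recursion: compute the search direction -H*grad from
    the stored curvature pairs (s_k, y_k), oldest first in the lists."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # Initial Hessian scaling gamma = s^T y / y^T y (a common choice).
    gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    r = gamma * q
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return -r

# Quadratic f(x) = 0.5 * x^T D x with D = diag(1, 10); gradient is D * x.
D = np.array([1.0, 10.0])
x0, x1 = np.array([1.0, 1.0]), np.array([0.5, 0.2])
g0, g1 = D * x0, D * x1
direction = two_loop(g1, [x1 - x0], [g1 - g0])
```

A full optimizer would take a step along `direction` chosen by a line search (or constrain it inside a trust region), then push the new `(s, y)` pair into the memory, evicting the oldest when the memory is full.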

Differentially Private SQL with Bounded User Contribution

Differential privacy (DP) provides formal guarantees that the output of a database query does not reveal too much information about any individual present in the database. While many differentially private algorithms have been proposed in the scientific literature, there are only a few end-to-end implementations of differentially private query engines. Crucially, existing systems assume that each individual is associated with at most one database record, which is unrealistic in practice. We propose a generic and scalable method to perform differentially private aggregations on databases, even when individuals can each be associated with arbitrarily many rows. We express this method as an operator in relational algebra, and implement it in an SQL engine. To validate this system, we test the utility of typical queries on industry benchmarks, and verify its correctness with a stochastic test framework we developed. We highlight the promises and pitfalls learned when deploying such a system in practice, and we publish its core components as open-source software.
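The key idea, bounding each user’s contribution before adding noise scaled to that bound, can be sketched as follows. The function names and clamping scheme are illustrative, not the paper’s SQL operator:

```python
import math
import random

def bounded_sum(rows_by_user, clamp, max_rows):
    """Limit each user's influence: keep at most `max_rows` rows per user
    and clamp each value to [-clamp, clamp]."""
    total = 0.0
    for rows in rows_by_user.values():
        for v in rows[:max_rows]:
            total += max(-clamp, min(clamp, v))
    return total

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of the Laplace distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_sum(rows_by_user, clamp, max_rows, epsilon, rng):
    """Noisy sum whose per-user sensitivity is clamp * max_rows."""
    sensitivity = clamp * max_rows
    return bounded_sum(rows_by_user, clamp, max_rows) + laplace_noise(sensitivity / epsilon, rng)

data = {"alice": [1.0, 5.0, 2.0], "bob": [100.0]}
exact = bounded_sum(data, clamp=3.0, max_rows=2)   # alice: 1 + 3, bob: 3
noisy = dp_sum(data, clamp=3.0, max_rows=2, epsilon=1.0, rng=random.Random(0))
```

Without the per-user bound, a single user with many rows (or one extreme value) would have unbounded influence on the sum, and no finite amount of noise could make the query differentially private.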


Document worth reading: “High-Performance Support Vector Machines and Its Applications”

The support vector machines (SVM) algorithm is a popular classification technique in data mining and machine learning. In this paper, we propose a distributed SVM algorithm and demonstrate its use in a number of applications. The algorithm is named high-performance support vector machines (HPSVM). The major contribution of HPSVM is two-fold. First, HPSVM provides a new way to distribute computations to the machines in the cloud without shuffling the data. Second, HPSVM minimizes the inter-machine communications in order to maximize the performance. We apply HPSVM to some real-world classification problems and compare it with the state-of-the-art SVM technique implemented in R on several public data sets. HPSVM achieves similar or better results. High-Performance Support Vector Machines and Its Applications


If you did not already know

NestDNN google
Mobile vision systems such as smartphones, drones, and augmented-reality headsets are revolutionizing our lives. These systems usually run multiple applications concurrently and their available resources at runtime are dynamic due to events such as starting new applications, closing existing applications, and application priority changes. In this paper, we present NestDNN, a framework that takes the dynamics of runtime resources into account to enable resource-aware multi-tenant on-device deep learning for mobile vision systems. NestDNN enables each deep learning model to offer flexible resource-accuracy trade-offs. At runtime, it dynamically selects the optimal resource-accuracy trade-off for each deep learning model to fit the model’s resource demand to the system’s available runtime resources. In doing so, NestDNN efficiently utilizes the limited resources in mobile vision systems to jointly maximize the performance of all the concurrently running applications. Our experiments show that compared to the resource-agnostic status quo approach, NestDNN achieves as much as 4.2% increase in inference accuracy, 2.0x increase in video frame processing rate and 1.7x reduction on energy consumption. …
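The resource-accuracy trade-off selection can be sketched as a greedy budget allocation; the variant format and greedy rule below are illustrative assumptions, not NestDNN’s actual scheduler:

```python
def pick_variants(models, budget):
    """Greedy sketch of resource-aware model selection: each model offers
    (cost, accuracy) variants, cheapest first. Start every model at its
    cheapest variant, then spend the remaining budget on the upgrade with
    the best accuracy gain per unit of extra cost."""
    choice = {m: 0 for m in models}                  # index of chosen variant
    spent = sum(models[m][0][0] for m in models)
    while True:
        best, best_ratio = None, 0.0
        for m, variants in models.items():
            i = choice[m]
            if i + 1 < len(variants):
                dc = variants[i + 1][0] - variants[i][0]   # extra cost
                da = variants[i + 1][1] - variants[i][1]   # accuracy gain
                if spent + dc <= budget and da / dc > best_ratio:
                    best, best_ratio = m, da / dc
        if best is None:
            return choice
        i = choice[best]
        spent += models[best][i + 1][0] - models[best][i][0]
        choice[best] += 1

# Variants are (resource cost, accuracy), cheapest first.
models = {
    "detector":   [(1, 0.60), (3, 0.80), (6, 0.85)],
    "classifier": [(1, 0.70), (2, 0.90)],
}
chosen = pick_variants(models, budget=5)
```

At runtime, this selection would be re-run whenever the available resources change, e.g. when another application starts or stops.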

Attention Branch Network (ABN) google
Visual explanation enables humans to understand the decision making of a Deep Convolutional Neural Network (CNN), but on its own it is insufficient to improve performance. In this paper, we focus on the attention map for visual explanation, which represents high response values as the important regions in image recognition. Such a region can significantly improve the performance of a CNN when an attention mechanism focuses on it. In this work, we propose the Attention Branch Network (ABN), which extends the top-down visual explanation model by introducing a branch structure with an attention mechanism. ABN is applicable to several image recognition tasks by introducing a branch for the attention mechanism, and it is trainable for visual explanation and image recognition in an end-to-end manner. We evaluate ABN on several image recognition tasks such as image classification, fine-grained recognition, and multiple facial attribute recognition. Experimental results show that ABN can outperform the accuracy of baseline models on these image recognition tasks while generating an attention map for visual explanation.
Embedding Human Knowledge in Deep Neural Network via Attention Map

Payoff Dynamical Model (PDM) google
We consider that at every instant each member of a population, which we refer to as an agent, selects one strategy out of a finite set. The agents are nondescript, and their strategy choices are described by the so-called population state vector, whose entries are the portions of the population selecting each strategy. Likewise, each entry constituting the so-called payoff vector is the reward attributed to a strategy. We consider that a general finite-dimensional nonlinear dynamical system, denoted as payoff dynamical model (PDM), describes a mechanism that determines the payoff as a causal map of the population state. A bounded-rationality protocol, inspired primarily by evolutionary biology principles, governs how each agent revises its strategy repeatedly based on complete or partial knowledge of the population state and payoff. The population is protocol-homogeneous but is otherwise strategy-heterogeneous considering that the agents are allowed to select distinct strategies concurrently. A stochastic mechanism determines the instants when agents revise their strategies, but we consider that the population is large enough that, with high probability, the population state can be approximated with arbitrary accuracy uniformly over any finite horizon by a so-called (deterministic) mean population state. We propose an approach that takes advantage of passivity principles to obtain sufficient conditions determining, for a given protocol and PDM, when the mean population state is guaranteed to converge to a meaningful set of equilibria, which could be either an appropriately defined extension of Nash’s for the PDM or a perturbed version of it. By generalizing and unifying previous work, our framework also provides a foundation for future work. …
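A concrete instance of such population dynamics is the classical replicator dynamic under a static payoff map, which can be seen as a special case of a PDM; the payoff matrix below is illustrative:

```python
import numpy as np

def replicator_step(x, payoff, dt=0.01):
    """One Euler step of the replicator dynamic: strategies earning more
    than the population-average payoff grow, the others shrink; the
    population state x stays on the probability simplex."""
    f = payoff(x)
    avg = x @ f
    return x + dt * x * (f - avg)

# A simple static payoff map: strategy 0 weakly dominates strategy 1.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
payoff = lambda x: A @ x

x = np.array([0.5, 0.5])
for _ in range(1000):
    x = replicator_step(x, payoff)
```

Under this payoff matrix the share of strategy 0 follows logistic growth toward 1, while the entries of `x` continue to sum to one at every step, so the state never leaves the simplex.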

GP-DRF google
Deep Gaussian processes (DGP) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this using a deep random feature (DRF) expansion model, which makes inference tractable by approximating DGPs. However, DRF is not suitable for variable-sized input data such as trees, graphs, and sequences. We introduce the GP-DRF, a novel Bayesian model with an input layer of GPs, followed by DRF layers. The key advantage is that the combination of GP and DRF leads to a tractable model that can both handle a variable-sized input as well as learn deep long-range dependency structures of the data. We provide a novel efficient method to simultaneously infer the posterior of GP’s latent vectors and infer the posterior of DRF’s internal weights and random frequencies. Our experiments show that GP-DRF outperforms the standard GP model and DRF model across many datasets. Furthermore, they demonstrate that GP-DRF enables improved uncertainty quantification compared to GP and DRF alone, with respect to a Bhattacharyya distance assessment. Source code is available at https://…/GP_DRF.


Magister Dixit

“Statistics is one of the most important skills required by a data scientist.” G S Praneeth Reddy (12.11.2018)


Distilled News

Learning in Graphs with Python (Part 3)

Concepts, applications, and examples with Python. Graphs are becoming central to machine learning these days, whether you’d like to understand the structure of a social network, predict potential connections, detect fraud, understand the behavior of a car rental service’s customers, or make real-time recommendations, for example.

Technical Deep Dive: Random Forests

Random Forests are one of the most popular machine learning models used by data scientists today. How they are actually implemented and the variety of use cases they can be applied to are often overlooked. While this article will focus on the inner workings of Random Forests, we’ll start off by exploring the main problems this model solves.

Early stopping in polynomial regression

Using a deep learning technique to fight overfitting in a simple linear regression model. I was testing an example from the scikit-learn site that demonstrates the problems of underfitting and overfitting and shows how, according to the article, we can use linear regression with polynomial features to approximate nonlinear functions. Below is a modified version of this code.
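Early stopping on a polynomial regression can be sketched as follows. This is a from-scratch gradient-descent version with illustrative hyper-parameters, not the scikit-learn example the post modifies:

```python
import numpy as np

def fit_poly_early_stop(x_tr, y_tr, x_val, y_val, degree=9, lr=0.1, epochs=5000):
    """Gradient descent on polynomial regression; keep the coefficients
    from the epoch with the lowest validation error (early stopping)."""
    def feats(x):
        return np.vander(x, degree + 1, increasing=True)
    A_tr, A_val = feats(x_tr), feats(x_val)
    w = np.zeros(degree + 1)
    best_w, best_val = w.copy(), np.inf
    for _ in range(epochs):
        grad = A_tr.T @ (A_tr @ w - y_tr) / len(y_tr)
        w -= lr * grad
        val = np.mean((A_val @ w - y_val) ** 2)
        if val < best_val:              # early stopping: remember best epoch
            best_val, best_w = val, w.copy()
    return best_w, best_val

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y = np.cos(1.5 * np.pi * x) + 0.1 * rng.standard_normal(30)
w, val_mse = fit_poly_early_stop(x[::2], y[::2], x[1::2], y[1::2])
```

The deep-learning connection is exactly this: instead of regularizing the degree-9 model explicitly, training is halted (here, the best iterate is kept) before the validation error starts climbing.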

5 Weird Ways to Use Data Science

So, I decided to collect five cases where data science is used for nothing but fun. Let’s get started, shall we?
1. Game Of Throne Deaths in Season 8, Data Science’s Angle
2. Predicting the outcome of sports game: Syracuse over Michigan State
3. Taylor Swift detector developed with Swift
4. Game of Wines – the ML and data science based detector of wine quality
5. Who’s killing the Academy Awards Game? – predicted by data science

Enabling developers and organizations to use differential privacy

Whether you’re a city planner, a small business owner, or a software developer, gaining useful insights from data can help make services work better and answer important questions. But, without strong privacy protections, you risk losing the trust of your citizens, customers, and users. Differentially-private data analysis is a principled approach that enables organizations to learn from the majority of their data while simultaneously ensuring that those results do not allow any individual’s data to be distinguished or re-identified. This type of analysis can be implemented in a wide variety of ways and for many different purposes. For example, if you are a health researcher, you may want to compare the average amount of time patients remain admitted across various hospitals in order to determine if there are differences in care. Differential privacy is a high-assurance, analytic means of ensuring that use cases like this are addressed in a privacy-preserving manner. Today, we’re rolling out the open-source version of the differential privacy library that helps power some of Google’s core products. To make the library easy for developers to use, we’re focusing on features that can be particularly difficult to execute from scratch, like automatically calculating bounds on user contributions. It is now freely available to any organization or developer that wants to use it.

Enhancing Static Plots with Animations

This post aims to introduce you to animating ggplot2 visualisations in R using the gganimate package by Thomas Lin Pedersen. The post will visualise the theoretical winnings I would’ve had, had I followed the simple model to predict (or tip as it’s known in Australia) winners in the AFL that I explained in this post. The data used in the analysis was collected from the AFL Tables website as part of a larger series I wrote about on AFL crowds.

Model Evaluation in the Land of Deep Learning

Applications for machine learning and deep learning have become increasingly accessible. For example, Keras provides APIs with TensorFlow backend that enable users to build neural networks without being fluent with TensorFlow. Despite the ease of building and testing models, deep learning has suffered from a lack of interpretability; deep learning models are considered black boxes to many users. In a talk at ODSC West in 2018, Pramit Choudhary explained the importance of model evaluation and interpretability in deep learning and some cutting edge techniques for addressing it.

Comparison of Lightweight Document Classification Models

Document Classification: The task of assigning labels to large bodies of text. In this case the task is to classify news articles into different labels, such as sport or politics. The data set used wasn’t ideally suited for deep learning, having only low thousands of examples, but this is far from an unrealistic case outside larger firms.
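A lightweight baseline of the kind the post compares can be sketched with a bag-of-words model (the toy labels and sentences below are made up for illustration; the post's actual dataset is news articles):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data standing in for labelled news articles.
texts = [
    "the striker scored a late goal in the cup final",
    "the midfielder was transferred for a record fee",
    "parliament passed the budget after a long debate",
    "the minister resigned over the policy dispute",
]
labels = ["sport", "sport", "politics", "politics"]

# TF-IDF features plus a linear classifier: cheap to train, and often a
# strong baseline when only a few thousand examples are available.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

prediction = model.predict(["the striker scored a goal in the final"])[0]
```

On small datasets like the one described, a linear model over sparse features is frequently competitive with deep models at a fraction of the cost.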

Building a Natural Language Processing Pipeline

Copenhagen is the capital and most populous city of Denmark and sits on the coastal islands of Zealand and Amager. It’s linked to Malmo in southern Sweden by the Oresund Bridge. Indre By, the city’s historic centre, contains Frederiksstaden, an 18th-century rococo district, home to the royal family’s Amalienborg Palace. Nearby is Christiansborg Palace and the Renaissance-era Rosenborg Castle, surrounded by gardens and home to the crown jewels.
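A paragraph like the one above is the typical input to such a pipeline. As a minimal, dependency-free sketch of the first pipeline stages (sentence segmentation, tokenization, and a naive capitalization heuristic standing in for a trained named-entity recognizer):

```python
import re

text = (
    "Copenhagen is the capital and most populous city of Denmark. "
    "It is linked to Malmo in southern Sweden by the Oresund Bridge."
)

# Stage 1: sentence segmentation (naive split on sentence-final punctuation).
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# Stage 2: tokenization (keep alphabetic runs, drop punctuation).
tokens = [re.findall(r"[A-Za-z]+", s) for s in sentences]

# Stage 3: naive entity candidates -- capitalized tokens that do not
# start a sentence (a crude stand-in for a trained NER component).
entities = sorted({
    tok for sent in tokens for tok in sent[1:] if tok[0].isupper()
})
```

Real pipelines replace each stage with a trained component, but the staged structure (segment, tokenize, tag/extract) stays the same.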

Heart Disease Classification with Apache Beam and TensorFlow Transform

Machine learning pipelines include a preprocessing or feature-engineering step before the data is actually trainable. Feature engineering includes normalizing and scaling data, encoding categorical values as numerical values, forming vocabularies, and binning continuous numerical values. Distributed frameworks like Google Cloud Dataflow or Apache Spark are widely used for large-scale data preprocessing. To remove the inconsistency between training and serving ML models in different environments, Google has come up with tf.Transform, a library for TensorFlow that ensures consistency of the feature engineering steps during model training and serving.
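The consistency problem tf.Transform solves can be illustrated without the library itself (a pure-Python sketch, not tf.Transform's API): the statistics used for feature engineering must be computed once over the training data and then replayed, unchanged, at serving time.

```python
def fit_scaler(training_values):
    """Compute normalization statistics over the full training set."""
    mean = sum(training_values) / len(training_values)
    var = sum((v - mean) ** 2 for v in training_values) / len(training_values)
    return {"mean": mean, "std": var ** 0.5 or 1.0}

def transform(value, stats):
    """Apply the *same* statistics at training and at serving time."""
    return (value - stats["mean"]) / stats["std"]

# Training: stats come from the training data and are saved with the model.
train = [120.0, 140.0, 160.0, 180.0]
stats = fit_scaler(train)
train_features = [transform(v, stats) for v in train]

# Serving: a new value is scaled with the saved stats -- recomputing stats
# on serving data is exactly the training/serving skew tf.Transform prevents.
serving_feature = transform(150.0, stats)
```

tf.Transform automates this pattern at scale: the analysis pass runs as a distributed Beam job, and the resulting transform is exported as part of the TensorFlow graph so serving cannot drift from training.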

Philosophy and the practice of Bayesian statistics

A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.


DeepPrivacy is a fully automatic anonymization technique for images. This repository contains the source code for the paper ‘DeepPrivacy: A Generative Adversarial Network for Face Anonymization’, published at ISVC 2019.


RSwitch 1.5.0 Release Now Also Corrals RStudio Server Connections

[This article was first published on R –, and kindly contributed to R-bloggers].

RSwitch is a macOS menubar application that works on macOS 10.14+ and provides handy shortcuts for developing with R on macOS. Version 1.5.0 brings a reorganized menu system and the ability to manage and make connections to RStudio Server instances. Here’s a quick peek at the new setup:

All books, links, and other reference resources are under a single submenu system:

If there’s a resource you’d like added, follow the links on the main RSwitch site to file PRs where you’re most comfortable.

You can also set up automatic checks and notifications for when new RStudio Dailies are available (you can still always check manually, and this check feature is off by default):

But, the biggest new feature is the ability to manage and launch RStudio Server connections right from RSwitch:


These RStudio Server browser connections are kept separate from your internet browsing and are one menu selection away. RSwitch also remembers the size and position of your RStudio Server session windows, so everything should be where you want/need/expect. This is somewhat of an experimental feature so definitely file issues if you run into any problems or would like things to work differently.


Kick the tyres, file issues or requests and, if so inclined, let me know how you’re liking RSwitch!



I think that science is mostly “Brezhnevs.” It’s rare to see a “Gorbachev” who will abandon a paradigm just because it doesn’t do the job. Also, moving beyond naive falsificationism

Sandro Ambuehl writes:

I’ve been following your blog and the discussion of replications and replicability across different fields daily, for years. I’m an experimental economist. The following question arose from a discussion I recently had with Anna Dreber, George Loewenstein, and others.

You’ve previously written about the importance of sound theories (and the dangers of anything-goes theories), and I was wondering whether there’s any formal treatment of that, or any empirical evidence on whether empirical investigations based on precise theories that simultaneously test multiple predictions are more likely to replicate than those without theoretical underpinnings, or those that test only isolated predictions.

Specifically: Many of the proposed solutions to the replicability issue (such as preregistration) seem to implicitly assume one-dimensional hypotheses such as “Does X increase Y?” In experimental economics, by contrast, we often test theories. The value of a theory is precisely that it makes multiple predictions. (In economics, theories that explain just one single phenomenon, or make one single prediction are generally viewed as useless and are highly discouraged.) Theories typically also specify how its various predictions relate to each other, often even regarding magnitudes. They are formulated as mathematical models, and their predictions are correspondingly precise. Let’s call a within-subjects experiment that tests a set of predictions of a theory a “multi-dimensional experiment”.

My conjecture is that all the statistical skulduggery that leads to non-replicable results is much harder to do in a theory-based, multi-dimensional experiment. If so, multi-dimensional experiments should lead to better replicability even absent safeguards such as preregistration.

The intuition is the following. Suppose an unscrupulous researcher attempts to “prove” a single prediction that X increases Y. He can do that by selectively excluding subjects with low X and high Y (or high X and low Y) from the sample. Compare that to a researcher who attempts to “prove”, in a within-subject experiment, that X increases Y and A increases B. The latter researcher must exclude many more subjects until his “preferred” sample includes only subjects that conform to the joint hypothesis. The exclusions become harder to justify, and more subjects must be run.

A similar intuition applies to the case of an unscrupulous researcher who tries to “prove” a hypothesis by messing with the measurements of variables (e.g. by using log(X) instead of X). Here, an example is a theory that predicts that X increases both Y and Z. Suppose the researcher finds a Null if he regresses X on Y, but finds a positive correlation between f(X) on Y for some selected transformation f. If the researcher only “tested” the relation between X and Y (a one-dimensional experiment), the researcher could now declare “success”. In a multi-dimensional experiment, however, the researcher will have to dig for an f that doesn’t only generate a positive correlation between f(X) and Y, but also between f(X) and Z, which is harder. A similar point applies if the researcher measures X in different ways (e.g. through a variety of related survey questions) and attempts to select the measurement that best helps “prove” the hypothesis. (Moreover, such a theory would typically also specify something like “If X increases Y by magnitude alpha, then it should increase Z by magnitude beta”. The relation between Y and Z would then present an additional prediction to be tested, yet again increasing the difficulty of “proving” the result through nefarious manipulations.)
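The intuition that joint predictions are harder to fake has a simple statistical core, which a quick simulation makes concrete (this illustration is mine, not from the correspondence): when a theory's predictions are tested jointly, the chance that pure noise "confirms" all of them falls multiplicatively.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_sims = 50, 10000
# |r| threshold giving roughly a 5% false-positive rate under the null.
r_crit = 1.96 / np.sqrt(n_subjects)

single_hits = joint_hits = 0
for _ in range(n_sims):
    # Under the null: no real effects anywhere.
    x, y, a, b = rng.standard_normal((4, n_subjects))
    r_xy = np.corrcoef(x, y)[0, 1]
    r_ab = np.corrcoef(a, b)[0, 1]
    single_hits += abs(r_xy) > r_crit                    # one-prediction "success"
    joint_hits += (abs(r_xy) > r_crit) and (abs(r_ab) > r_crit)

single_rate = single_hits / n_sims   # close to 0.05
joint_rate = joint_hits / n_sims     # close to 0.05 ** 2 = 0.0025
```

Of course, this only captures honest noise; as the reply below the letter notes, a determined researcher still has other degrees of freedom with which to declare success.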

So if there is any formal treatment relating to the above intuitions, or any empirical evidence on what kind of research tends to be more or less likely to replicate (depending on factors other than preregistration), I would much appreciate if you could point me to it.

My reply:

I have two answers for you.

First, some colleagues and I recently published a preregistered replication of one of our own studies; see here. This might be interesting to you because our original study did not test a single thing, so our evaluation was necessarily holistic. In our case, the study was descriptive, not theoretically-motivated, so it’s not quite what you’re talking about—but it’s like your study in that the outcomes of interest were complex and multidimensional.

This was one of the problems I’ve had with recent mass replication studies, that they treat a scientific paper as if it has a single conclusion, even though real papers—theoretically-based or not—typically have many conclusions.

My second response is that I fear you are being too optimistic. Yes, when a theory makes multiple predictions, it may be difficult to select data to make all the predictions work out. But on the other hand you have many degrees of freedom with which to declare success.

This has been one of my problems with a lot of social science research. Just about any pattern in data can be given a theoretical explanation, and just about any pattern in data can be said to be the result of a theoretical prediction. Remember that claim that women were three times more likely to wear red or pink clothing during a certain time of the month? The authors of that study did a replication which failed, but they declared it a success after adding an interaction with outdoor air temperature. Or there was this political science study where the data went in the opposite direction of the preregistration but were retroactively declared to be consistent with the theory. It’s my impression that a lot of economics is like this too: If it goes the wrong way, the result can be explained. That’s fine—it’s one reason why economics is often a useful framework for modeling the world—but I think the idea that statistical studies and p-values and replication are some sort of testing ground for models, the idea that economists are a group of hard-headed Popperians, regularly subjecting their theories to the hard test of reality—I’m skeptical of that take. I think it’s much more that individual economists, and schools of economists, are devoted to their theories and only rarely abandon them on their own. That is, I have a much more Kuhnian take on the whole process. Or, to put it another way, I try to be Popperian in my own research, I think that’s the ideal, but I think the Kuhnian model better describes the general process of science. Or, to put it another way, I think that science is mostly “Brezhnevs.” It’s rare to see a “Gorbachev” who will abandon a paradigm just because it doesn’t do the job.

Ambuehl responded:

Anna did have a similar reaction to you—and I think that reaction depends much on what passes as a “theory”. For instance, you won’t find anything in a social psychology textbook that an economic theorist would call a “theory”. You’re certainly right about the issues pertaining to hand-wavy ex-post explanations as with the clothes and ovulation study, or “anything-goes theories” such as the Himicanes that might well have turned out the other way.

By contrast, the theories I had in mind when asking the question are mathematically formulated theories that precisely specify their domain of applicability. An example of the kind of theory I have in mind would be Expected Utility theory, tested in countless papers, e.g., here. Another example of such a theory is the Shannon model of choice under limited attention (tested, e.g., here). These theories are in an entirely different ballpark than vague ideas like, e.g., self-perception theory or social comparison theory that are so loosely specified that one cannot even begin to test them unless one is willing to make assumptions on each of the countless researcher degrees of freedom they leave open.

In fact, economic theorists tend to regard the following characteristics virtues, or even necessities, of any model: precision (can be tested without requiring additional assumptions), parsimony (and hence, makes it hard to explain “uncomfortable” results by interactions etc.), generality (in the sense that they make multiple predictions, across several domains). And they very much frown upon ex post theorizing, ad-hoc assumptions, and imprecision. For theories that satisfy these properties, it would seem much harder to fudge empirical research in a way that doesn’t replicate, wouldn’t it? (Whether the community will accept the results or not seems orthogonal to the question of replicability, no?)

Finally, to the extent that theories in the form of precise, mathematical models are often based on wide bodies of empirical research (economic theorists often try to capture “stylized facts”), wouldn’t one also expect higher rates of replicability because such theories essentially correspond to well-informed priors?

So my overall point is, doesn’t (good) theory have a potentially important role to play regarding replicability? (Many current suggestions for solving the replication crisis, in particular formulaic ones such as pre-registration, or p<0.005, don't seem to recognize those potential benefits of sound theory.)

I replied:

Well, sure, but expected utility theory is flat-out false. Much has been written on the way that utilities only exist after the choices are given. This can even be seen in simple classroom demonstrations, as in section 5 of this paper from 1998. No statistics are needed at all to demonstrate the problems with that theory!

Ambuehl responded with some examples of more sophisticated, but still testable, theories such as reference-dependent preferences, various theories of decision making under ambiguity, and perception-based theories, and I responded with my view that all these theories are either vague enough to be adaptable to any data or precise enough to be evidently false with no data collection needed. This was what Lakatos noted: any theory is either so brittle that it can be destroyed by collecting enough data, or flexible enough to fit anything. This does not mean we can’t do science, it just means we have to move beyond naive falsificationism.

P.S. Tomorrow’s post: “Boston Globe Columnist Suspended During Investigation Of Marathon Bombing Stories That Don’t Add Up.”


What’s going on on PyPI

Scanning all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here the ones that might be worth a look, worth following, or that might inspire you in some way.

Data Science Tools. pychamp is a data science tool intended to ease data science practices.
• configparser : It can be used for parsing configuration file format `json` and `ini`.
• connection : It can be used for creating `connection URL`, `connection engine` and for `executing` any type of sql `queries`.
• features_selection : It can be used for selecting features using `Backward Elimination`, `VIF` and `Features Importance`.
• sampling : It can be used for different types of sampling operations such as `SMOTE`, `SMOTENC` and `ADASYN` for both categorical and numerical features.
• stats : It can be used for `Confidence and Prediction Interval`, `IQR outlier removal` and `Summary Statistics`.
• viz : It can be used for visualization.
• net : It can be used for sending mail currently.
• model : Different types of regression, classification and clustering models can be used.
• eda : It handles data types and missing values.

Python Package for exploratory data analysis in Data Science

PYthon Simple Neural Network – PYSNN is a Python 3 library for machine learning

The Python document processor

Toolkit for recommender systems. Toolkit for building recommender systems
• Provide CLI interface for running recommendation algorithms
• Contains abstractions you can leverage to build custom recommenders

TopicNet is a module for topic modelling using the ARTM algorithm

ATOM is an AutoML package. ATOM is a Python package for exploration of ML problems. With just a few lines of code, you can compare the performance of multiple machine learning models on a given dataset, providing quick insight into which algorithm performs best for the task at hand. Furthermore, ATOM contains a variety of plotting functions to help you analyze the models’ performances.

CAnonical Time-series Features, see description and license on GitHub.

Python implementation of causal trees with validation

Framework helping testing Google Cloud Dataflows. This framework aims to help test Google Cloud Platform dataflows in an end-to-end way.

A Python Toolbox for Algorithmic Fairness, Accountability and Transparency

OBA Sparql Manager

Modern Scientific Document Processing Framework. SciWING is a modern framework from WING-NUS to facilitate Scientific Document Processing. It is built on PyTorch and emphasizes modularity from the ground up and an easy-to-use interface. SciWING includes many pre-trained models for fundamental tasks in Scientific Document Processing for practitioners.

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm in Python

Gaussian & Binomial Distributions



If you did not already know

Apache Avro google
Apache Avro is a data serialization system. …

Feudal Multi-agent Hierarchies (FMH) google
We investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from human societies, in which successful coordination of many individuals is often facilitated by hierarchical organisation, we introduce Feudal Multi-agent Hierarchies (FMH). In this framework, a ‘manager’ agent, which is tasked with maximising the environmentally-determined reward function, learns to communicate subgoals to multiple, simultaneously-operating, ‘worker’ agents. Workers, which are rewarded for achieving managerial subgoals, take concurrent actions in the world. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We find that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use a shared reward function. …

Knowledge-Guided Generative Adversarial Network (KG-GAN) google
Generative adversarial networks (GANs) learn to mimic training data that represents the underlying true data distribution. However, GANs suffer when the training data lacks quantity or diversity and therefore cannot represent the underlying distribution well. To improve the performance of GANs trained on under-represented training data distributions, this paper proposes KG-GAN (Knowledge-Guided Generative Adversarial Network) to fuse domain knowledge with the GAN framework. KG-GAN trains two generators; one learns from data while the other learns from knowledge. To achieve KG-GAN, domain knowledge is formulated as a constraint function to guide the learning of the second generator. We validate our framework on two tasks: fine-grained image generation and hair recoloring. Experimental results demonstrate the effectiveness of KG-GAN. …

Relay google
Frameworks for writing, compiling, and optimizing deep learning (DL) models have recently enabled progress in areas like computer vision and natural language processing. Extending these frameworks to accommodate the rapidly diversifying landscape of DL models and hardware platforms presents challenging tradeoffs between expressiveness, composability, and portability. We present Relay, a new intermediate representation (IR) and compiler framework for DL models. The functional, statically-typed Relay IR unifies and generalizes existing DL IRs and can express state-of-the-art models. Relay’s expressive IR required careful design of the type system, automatic differentiation, and optimizations. Relay’s extensible compiler can eliminate abstraction overhead and target new hardware platforms. The design insights from Relay can be applied to existing frameworks to develop IRs that support extension without compromising on expressivity, composability, and portability. Our evaluation demonstrates that the Relay prototype can already provide competitive performance for a broad class of models running on CPUs, GPUs, and FPGAs. …

Continue Reading…


Read More

September 13, 2019

ttdo 0.0.3: New package

[This article was first published on Thinking inside the box, and kindly contributed to R-bloggers].

A new package of mine arrived on CRAN yesterday, having been uploaded a few days prior on the weekend. It extends the most excellent (and very minimal / zero depends) unit testing package tinytest by Mark van der Loo with the very clever and well-done diffobj package by Brodie Gaslam. Mark also tweeted about it.

ttdo screenshot

The package was written to address a fairly specific need. In teaching STAT 430 at Illinois, I am relying on the powerful PrairieLearn system (developed there) to provide tests, quizzes and homework. Alton and I have put together an autograder for R (which is work in progress, more on that maybe another day), and it uses this package to provide colorized differences between supplied and expected answers in case of an incorrect answer.

Now, the aspect of providing colorized diffs when tests do not evaluate to TRUE is both simple and general enough. As our approach works rather well, I decided to offer the package on CRAN as well. The small screenshot gives a simple idea; a larger screenshot is available as well.

The initial NEWS entries follow below.

Changes in ttdo version 0.0.3 (2019-09-08)

  • Added a simple demo to support initial CRAN upload.

Changes in ttdo version 0.0.2 (2019-08-31)

  • Updated defaults for format and mode to use the same options used by diffobj along with fallbacks.

Changes in ttdo version 0.0.1 (2019-08-26)

  • Initial version, with thanks to both Mark and Brodie.

Please use the GitHub repo and its issues for any questions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.



Distilled News

Document Embedding Techniques

Word embeddings – the mapping of words into numerical vector spaces – have proved to be an incredibly important method for natural language processing (NLP) tasks in recent years, enabling various machine learning models that rely on vector representation as input to enjoy richer representations of text input. These representations preserve more semantic and syntactic information on words, leading to improved performance in almost every imaginable NLP task.
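The "richer representations" in question come down to geometry: semantically related words get nearby vectors. A sketch with hypothetical 3-dimensional vectors (real embeddings such as word2vec or GloVe are hundreds of dimensions and learned from corpora):

```python
import numpy as np

# Hypothetical toy embeddings; real ones are learned, not hand-written.
vectors = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.8, 0.9, 0.1]),
    "banana": np.array([0.1, 0.0, 0.9]),
}

def cosine(u, v):
    """Cosine similarity: the standard closeness measure for embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

royal_sim = cosine(vectors["king"], vectors["queen"])
fruit_sim = cosine(vectors["king"], vectors["banana"])
```

Document embedding techniques, the subject of the post, extend this same idea from single words to whole sentences and documents.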

BERT is changing the NLP landscape

BERT is changing the NLP landscape and making chatbots much smarter by enabling computers to better understand speech and respond intelligently in real-time. What Makes BERT so Amazing?
• BERT is a contextual model.
• BERT enables transfer learning.
• BERT can be fine-tuned cheaply and quickly.

Introduction to Neural Networks and Their Key Elements (Part-B) – Hyper-Parameters

In the previous story (part A) we discussed the structure and three main building blocks of a neural network. This story will take you through the elements which really make neural networks a useful force and separate them from the rest of the machine learning algorithms. Previously we discussed Units/Neurons, Weights/Parameters & Biases; today we will discuss Hyper-Parameters.

Tutorial on Variational Graph Auto-Encoders

Graphs are applicable to many real-world datasets such as social networks, citation networks, chemical graphs, etc. The growing interest in graph-structured data increases the amount of research on graph neural networks. Variational autoencoders (VAEs) embodied the success of variational Bayesian methods in deep learning and have inspired a wide range of ongoing research. The variational graph autoencoder (VGAE) applies the idea of VAEs to graph-structured data, which significantly improves predictive performance on a number of citation network datasets such as Cora and CiteSeer. I searched on the internet and have yet to see a detailed tutorial on VGAE. In this article, I will briefly talk about traditional autoencoders and variational autoencoders. Furthermore, I will discuss the idea of applying VAEs to graph-structured data (VGAE).
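The decoder half of a (variational) graph autoencoder is simple enough to sketch directly (a NumPy illustration of the inner-product decoder from the VGAE paper; the encoder, a graph convolutional network that produces the latent matrix Z, is omitted here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in latent node embeddings Z, as the (omitted) GCN encoder would
# produce: 4 nodes embedded in 2 latent dimensions.
rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 2))

# Inner-product decoder: reconstructed edge probabilities
# A_hat[i, j] = sigmoid(z_i . z_j), i.e. A_hat = sigmoid(Z Z^T).
A_hat = sigmoid(Z @ Z.T)
```

Training then pushes A_hat toward the observed adjacency matrix (plus, in the variational case, a KL term on the latent distribution), so nodes that are linked end up with similar embeddings.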

Automate Hyperparameter Tuning for your models

When we create our machine learning models, a common task that falls on us is how to tune them. People end up taking different manual approaches. Some of them work, and some don’t, and a lot of time is spent in anticipation and running the code again and again. So that brings us to the quintessential question: Can we automate this process?
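The most common form of that automation is a grid or randomized search over hyperparameter candidates, scored by cross-validation. A minimal scikit-learn sketch (synthetic data, with a Ridge model chosen purely for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data: 200 samples, 5 features, known coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(0, 0.1, 200)

# Automate the tuning: try every alpha, score each by 5-fold CV, keep the best.
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
```

Randomized and Bayesian search (e.g. RandomizedSearchCV, or libraries like hyperopt) follow the same fit-and-score loop but explore the space more cheaply than an exhaustive grid.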

Is the pain worth it?: Can Rcpp speed up Passing Bablok Regression?

R dogma is that for loops are bad because they are slow but this is not the case in C++. I had never programmed a line of C++ as of last week but my beloved firstborn started university last week and is enrolled in a C++ intro course, so I thought I would try to learn some and see if it would speed up Passing Bablok regression.

What it is really like to develop a model for a real-world business case

Have you ever taken part in a Kaggle competition? If you are studying, or have studied, machine learning it is fairly likely that at some point you will have entered one. It is definitely a great way to put your model-building skills into practice and I spent quite a bit of time on Kaggle when I was studying.

A Breakthrough for A.I. Technology: Passing an 8th-Grade Science Test

On Wednesday, the Allen Institute for Artificial Intelligence, a prominent lab in Seattle, unveiled a new system that passed the test with room to spare. It correctly answered more than 90 percent of the questions on an eighth-grade science test and more than 80 percent on a 12th-grade exam.

The Anthropologist of Artificial Intelligence

How do new scientific disciplines get started? For Iyad Rahwan, a computational social scientist with self-described ‘maverick’ tendencies, it happened on a sunny afternoon in Cambridge, Massachusetts, in October 2017. Rahwan and Manuel Cebrian, a colleague from the MIT Media Lab, were sitting in Harvard Yard discussing how to best describe their preferred brand of multidisciplinary research. The rapid rise of artificial intelligence technology had generated new questions about the relationship between people and machines, which they had set out to explore. Rahwan, for example, had been exploring the question of ethical behavior for a self-driving car – should it swerve to avoid an oncoming SUV, even if it means hitting a cyclist? – in his Moral Machine experiment.

Getting Started With Text Preprocessing for Machine Learning & NLP

Based on some recent conversations, I realized that text preprocessing is a severely overlooked topic. A few people I spoke to mentioned inconsistent results from their NLP applications only to realize that they were not preprocessing their text or were using the wrong kind of text preprocessing for their project. With that in mind, I thought of shedding some light around what text preprocessing really is, the different techniques of text preprocessing and a way to estimate how much preprocessing you may need. For those interested, I’ve also made some text preprocessing code snippets in python for you to try. Now, let’s get started!
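As a small, concrete sketch of the kind of preprocessing the post discusses (the function and example are mine, not from the post): lowercasing, punctuation removal, and whitespace tokenization in plain Python.

```python
import string

def preprocess(text):
    """Minimal text preprocessing: lowercase, strip punctuation, tokenize."""
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    return text.split()

tokens = preprocess("Text preprocessing, for NLP, is overlooked!")
# → ['text', 'preprocessing', 'for', 'nlp', 'is', 'overlooked']
```

How much further to go (stemming, lemmatization, stop-word removal) depends on the project, which is exactly the post's point.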

Introducing Neural Structured Learning in TensorFlow

We are excited to introduce Neural Structured Learning in TensorFlow, an easy-to-use framework that both novice and advanced developers can use for training neural networks with structured signals. Neural Structured Learning (NSL) can be applied to construct accurate and robust models for vision, language understanding, and prediction in general.

2018 in Review: 10 AI Failures

• Chinese billionaire’s face identified as jaywalker
• Uber self-driving car kills a pedestrian
• IBM Watson comes up short in healthcare
• Amazon AI recruiting tool is gender biased
• DeepFakes reveals AI’s unseemly side
• Google Photo confuses skier and mountain
• LG robot Cloi gets stagefright at its unveiling
• Boston Dynamics robot blooper
• AI World Cup 2018 predictions almost all wrong
• Startup claims to predict IQ from faces

Uber has troves of data on how people navigate cities. Urban planners have begged, pleaded, and gone to court for access. Will they ever get it?

As the deputy director for technology, data, and analysis at the San Francisco County Transportation Authority, Castiglione spends his days manipulating models of the Bay Area and its 7 million residents. From wide-sweeping ridership and traffic data to deep dives into personal travel choices via surveys, his models are able to estimate the number of people who will disembark at a specific train platform at a certain time of day and predict how that might change if a new housing development is built nearby, or if train-frequency is increased. The models are exceedingly complex, because people are so complex. ‘Think about the travel choices you’ve made in the last week, or the last year,’ Castiglione says. ‘How do you time your trips? What tradeoffs do you make? What modes of transportation do you use? How do those choices change from day to day?’ He has the deep voice of an NPR host and the demeanor of a patient professor. ‘The models are complex but highly rational,’ he says.

Visualizing SVM with Python

In my previous article, I introduced the idea behind the classification algorithm Support Vector Machine. Here, I'm going to show you a practical application in Python of what I've been explaining, using the well-known Iris dataset. Following the same structure as that article, I will first deal with linearly separable data, then move on to non-linearly separable data, so that you can appreciate the power of SVM, which lies in the so-called Kernel Trick.

How to generate neural network confidence intervals with Keras

Whether we’re predicting water levels, queue lengths or bike rentals, at HAL24K we do a lot of regression, with everything from random forests to recurrent neural networks. And as good as our models are, we know they can never be perfect. Therefore, whenever we provide our customers with predictions, we also like to include a set of confidence intervals: what range around the prediction will the actual value fall within, with (e.g.) 80% confidence?
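One common way to get such intervals (not necessarily the method HAL24K uses) is quantile regression: train with the pinball loss at, say, the 10th and 90th percentiles, and the gap between the two predictions is an 80% interval. The loss itself, in plain Python:

```python
def pinball_loss(y_true, y_pred, quantile):
    """Average pinball (quantile) loss: minimized when y_pred sits at
    the `quantile`-th quantile of the target distribution."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        err = yt - yp
        # Underprediction costs `quantile * err`; overprediction costs
        # `(1 - quantile) * |err|`, so high quantiles punish underprediction.
        total += max(quantile * err, (quantile - 1) * err)
    return total / len(y_true)
```

Keras accepts custom loss callables in `model.compile`, so one model head (or one model) per quantile trained with this loss yields the lower and upper bounds.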

Continue Reading…


Read More

Tell Me a Story: How to Generate Textual Explanations for Predictive Models

[This article was first published on English –, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

TL;DR: If you are going to explain predictions for a black-box model, you should combine statistical charts with natural language descriptions. This combination is more powerful than SHAP/LIME/PDP/Break Down charts alone. This summer, Adam Izdebski implemented this feature for explanations generated in R with the DALEX library. How did he do it? Find out here:

Long version:
Amazing things were created during summer internships at MI2DataLab this year. One of them is a generator of natural language descriptions for DALEX explainers, developed by Adam Izdebski.

Is text better than charts for explanations?
Packages from the DrWhy.AI toolbox generate lots of graphical explanations for predictive models. The available statistical charts help us better understand how a model works in general (global perspective) or for a specific prediction (local perspective).
Yet for domain experts without training in mathematics or computer science, graphical explanations may be insufficient. Charts are great for exploration and discovery, but for explanations they introduce some ambiguity. Have I read everything? Maybe I missed something?
To address this problem we introduced the describe() function, which automatically generates textual explanations for predictive models. Right now these natural language descriptions are implemented in the R packages ingredients and iBreakDown.

Insufficient interpretability
Domain experts without formal training in mathematics or computer science often find statistical explanations hard to interpret. There are various reasons for this. First of all, explanations are often displayed as complex plots without instructions. Often there is no clear narration or interpretation visible. Plots use different scales, mixing probabilities with relative changes in the model's prediction. The order of variables may also be misleading. See for example a Break-Down plot.

The figure displays the prediction, generated with a Random Forest, that a selected passenger survived the Titanic sinking. The model's average response on the Titanic data set (the intercept) is 0.324. The model predicts that the selected passenger survived with probability 0.639. It also displays all the variables that contributed to that prediction. Once the plot is described, it is easy to interpret, as it possesses a very clear graphical layout.
However, interpreting it for the first time may be tricky.

Properties of a good description
Effective communication and argumentation is a difficult craft. For that reason, we borrow strategies from guides to winning debates to generate persuasive textual explanations. First of all, any description should be intelligible and persuasive. We achieve this by using:

  • Fixed structure: Effective communication requires a rigid structure. Thus we generate descriptions from a fixed template that always includes a proper introduction, an argumentation part, and a conclusion. This makes the description more predictable, hence intelligible.
  • Situation recognition: In order to make a description more trustworthy, we begin generating the text by identifying which scenario we are dealing with. Currently, the following scenarios are available:
    • The model prediction is significantly higher than the average model prediction. In this case, the description should convince the reader why the prediction is higher than the average.
    • The model prediction is significantly lower than the average model prediction. In this case, the description should convince the reader why the prediction is lower than the average.
    • The model prediction is close to the average. In this case the description should convince the reader that either: variables are contradicting each other or variables are insignificant.

Identifying what should be justified is a crucial step in generating persuasive descriptions.

Description’s template for persuasive argumentation
As noted before, to achieve clarity we generate descriptions with three separate components: an introduction, an argumentation part, and a summary.

An introduction should provide a claim. It is a basic point that an arguer wishes to make. In our case, it is the model’s prediction. Displaying the additional information about the predictions’ distribution helps to place it in a context — is it low, high or close to the average.

An argumentation part should provide evidence, along with reasons that connect the evidence to the claim. In a normal setting this works like so: this particular passenger survived the catastrophe (claim) because they were a child (evidence no. 1) and children were evacuated from the ship first, as in the phrase "women and children first" (reason no. 1). What is more, the child was traveling in first class (evidence no. 2) and first-class passengers had the best cabins, which were close to the rescue boats (reason no. 2).

The tricky part is that we are not able to make up a reason automatically, as that is a matter of context and interpretation. What we can do, however, is highlight the main evidence that made the model produce the claim. If a model is making its predictions for the right reasons, the evidence should make sense, and it should be easy for the reader to build a story connecting the evidence to the claim. If the model displays evidence that makes little sense, that should be a clear signal that the model may not be trustworthy.

A summary is just the rest of the justification. It states that the other pieces of evidence are of less importance, and thus may be omitted. A good rule of thumb is to display the three most important pieces of evidence, so as not to make the picture too complex. This scheme corresponds to building relational arguments, as in guides to winning debates.

The logic described above is implemented in ingredients and iBreakDown packages.
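The actual implementation lives in the R packages named above; purely as an illustration, the scenario-plus-top-evidence logic might be sketched like this in Python (all thresholds, names, and wording here are made up):

```python
def describe_prediction(prediction, average, contributions, top=3, eps=0.1):
    """Sketch of a template-based description: claim, evidence, summary."""
    # 1. Introduction: state the claim and situate it against the average.
    if prediction > average + eps:
        situation = "higher than"
    elif prediction < average - eps:
        situation = "lower than"
    else:
        situation = "close to"
    intro = (f"The model predicts {prediction:.3f}, "
             f"which is {situation} the average of {average:.3f}.")

    # 2. Argumentation: highlight the strongest evidence (largest |contribution|).
    ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    evidence = "; ".join(f"{name} contributed {value:+.3f}"
                         for name, value in ranked[:top])

    # 3. Summary: remaining variables are of less importance.
    rest = sum(value for _, value in ranked[top:])
    summary = f"The remaining variables contribute {rest:+.3f} in total."
    return " ".join([intro, evidence + ".", summary])

text = describe_prediction(
    0.639, 0.324,
    {"gender": 0.15, "fare": 0.12, "class": -0.07, "age": -0.03, "sibsp": -0.033})
```

The fixed template makes the output predictable, which is exactly the "fixed structure" property described above.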

For generating a description we should pass the explanation generated by ceteris_paribus() or break_down() or shap() to the describe() function.

describe(break_down_explanation)  # an explanation object returned by break_down()
# Random Forest predicts, that the prediction for the selected instance is 0.639 which is higher than the average.
# The most important variables that increase the prediction are gender, fare.
# The most important variable that decrease the prediction is class.
# Other variables are with less importance. The contribution of all other variables is -0.063 .

There are various parameters that control the display of the description, making it more flexible and thus suited to more applications. They include:

  • generating a short version of descriptions,
  • displaying predictions’ distribution details,
  • generating more detailed argumentation.

While explanations generated by iBreakDown are feature-attribution explanations that aim to provide interpretable reasons for the model's prediction, explanations generated by ingredients are rather speculative: they explain how the model's prediction would change if we perturbed the instance being explained. For example, the ceteris_paribus() explanation explores how the prediction would change if we changed the value of a single feature while keeping the other features unchanged.

describe(ceteris_paribus_explanation, variables = "age")
# For the selected instance Random Forest predicts that , prediction is equal to 0.699.
# The highest prediction occurs for (age = 16), while the lowest for (age = 74).
# Breakpoint is identified at (age = 40).
# Average model responses are *lower* for variable values *higher* than breakpoint.

Applications and future work

Generating natural language explanations is a sensitive task, as interpretability always depends on the end user's cognition. For this reason, experiments should be designed to assess the usefulness of the descriptions being generated. Furthermore, more vocabulary flexibility could be added to make the descriptions more human-like. Lastly, descriptions could be integrated with a chatbot that would explain predictions interactively, using the framework described here. Also, better discretization techniques could be used to generate better textual explanations for continuous ceteris paribus and aggregated profiles.

To leave a comment for the author, please follow the link and comment on their blog: English – offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Continue Reading…


Read More

Magister Dixit

“Distributed computation generally is hard, because it adds an additional layer of complexity and communication overhead. The ideal case is scaling linearly with the number of nodes; that’s rarely the case. Emerging evidence shows that very often, one big machine, or even a laptop, outperforms a cluster.” Zygmunt Z. ( 2015-04-27 )

Continue Reading…


Read More

NIMBLE short course at Bayes Comp 2020 conference

[This article was first published on R – NIMBLE, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

We’ll be giving a short course on NIMBLE on January 7, 2020 at the Bayes Comp 2020 conference being held January 7-10 in Gainesville, Florida, USA.

Bayes Comp is a popular biennial ISBA-sponsored conference focused on computational methods/algorithms/technologies for Bayesian inference.

The short course focuses on programming algorithms in NIMBLE and is titled:

“Developing, modifying, and sharing Bayesian algorithms (MCMC samplers, SMC, and more) using the NIMBLE platform in R.”

More details on the conference and the short course are available at the conference website.


Continue Reading…


Read More

Many Heads Are Better Than One: The Case For Ensemble Learning

While ensembling techniques are notoriously hard to set up, operate, and explain, with the latest modeling, explainability and monitoring tools, they can produce more accurate and stable predictions. And better predictions can be better for business.

Continue Reading…


Read More

AutoML: BigML Automated Machine Learning

Last year, BigML launched the OptiML resource for Automatic Model Optimization. Without a doubt, it has marked a milestone in our Machine Learning platform. Since then, many users have included OptiML in their Machine Learning toolboxes. However, some users are asking us to go further than model selection, so today we’re presenting BigML’s AutoML, an […]

Continue Reading…


Read More

Initializing an empty list

[This article was first published on woodpeckR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)


How do I initialize an empty list for use in a for-loop or function?


Sometimes I’m writing a for-loop (I know, I know, don’t use for-loops, but sometimes it’s just easier. I’m a little less good at apply functions than I’d like to be) and I know I’ll need to store the output in a list. Once in a while, the new list will be the same size as an existing one, but more often, I just need to start from scratch, knowing only the number of elements I want to include.

This isn’t a totally alien thing to need to do––it’s pretty familiar if you’re used to initializing empty vectors before for-loops. There’s a whole other debate to be had about whether or not it’s acceptable to start with a truly empty vector and append to it on every iteration of the loop or whether you should always know the length beforehand, but I’ll just focus on the latter case for now.

Anyway, initializing a vector of a given length is easy enough; I usually do it like this:

> desired_length <- 10 # or whatever length you want
> empty_vec <- rep(NA, desired_length)

I couldn’t immediately figure out how to replicate this for a list, though. The solution turns out to be relatively simple, but it’s just different enough that I can never seem to remember the syntax. This post is more for my records than anything, then.


Initializing an empty list turns out to have an added benefit over my rep(NA) method for vectors; namely, the list ends up actually empty, not filled with NA’s. Confusingly, the function to use is “vector,” not “list.”

> desired_length <- 10 # or whatever length you want
> empty_list <- vector(mode = "list", length = desired_length)

> str(empty_list)
List of 10
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL


Voilà, an empty list. No restrictions on the data type or structure of the individual list elements. Specify the length easily. Useful for loops, primarily, but may have other applications I haven’t come across yet.
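As an aside of mine (not from the post), and since other posts in this roundup use Python: the Python analogue of pre-allocating a fixed-length list is shorter, with the same "no restrictions on element type" property.

```python
desired_length = 10  # or whatever length you want
empty_list = [None] * desired_length  # ten None slots, ready for indexed assignment

empty_list[3] = "anything"  # elements can hold any type or structure
```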


Continue Reading…


Read More

Version Control for Data Science: Tracking Machine Learning Models and Datasets

I am a Git god, why do I need another version control system for Machine Learning Projects?

Continue Reading…


Read More

Deterministic thinking (“dichotomania”): a problem in how we think, not just in how we act

This has come up before:

Basketball Stats: Don’t model the probability of win, model the expected score differential.

Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable

Thinking like a statistician (continuously) rather than like a civilian (discretely)

Message to Booleans: It’s an additive world, we just live in it

And it came up again recently.

Epidemiologist Sander Greenland has written about “dichotomania: the compulsion to replace quantities with dichotomies (‘black-and-white thinking’), even when such dichotomization is unnecessary and misleading for inference.”

I’d avoid the misleadingly clinically-sounding term “compulsion,” and I’d similarly prefer a word that doesn’t include the pejorative suffix “mania,” hence I’d rather just speak of “deterministic thinking” or “discrete thinking”—but I agree with Greenland’s general point that this tendency to prematurely collapse the wave function contributes to many problems in statistics and science.

Often when the problem of deterministic thinking comes up in discussion, I hear people explain it away, arguing that decisions have to be made (FDA drug trials are often brought up here), or that all rules are essentially deterministic (the idea that confidence intervals are interpreted as whether they include zero), or that this is a problem with incentives or publication bias, or that, sure, everyone knows that thinking of hypotheses as “true” or “false” is wrong, and that statistical significance and other summaries are just convenient shorthands for expressions of uncertainty that are well understood.

But I’d argue, with Eric Loken, that inappropriate discretization is not just a problem with statistical practice; it’s also a problem with how people think, that the idea of things being on or off is “actually the internal working model for a lot of otherwise smart scientists and researchers.”

This came up in some of the recent discussions on abandoning statistical significance, and I want to use this space to emphasize one more time the problem of inappropriate discrete modeling.

The issue arose in my 2011 paper, Causality and Statistical Learning:

More generally, anything that plausibly could have an effect will not have an effect that is exactly zero. I can respect that some social scientists find it useful to frame their research in terms of conditional independence and the testing of null effects, but I don’t generally find this approach helpful—and I certainly don’t believe that it is necessary to think in terms of conditional independence in order to study causality. Without structural zeros, it is impossible to identify graphical structural equation models.

The most common exceptions to this rule, as I see it, are independences from design (as in a designed or natural experiment) or effects that are zero based on a plausible scientific hypothesis (as might arise, e.g., in genetics, where genes on different chromosomes might have essentially independent effects), or in a study of ESP. In such settings I can see the value of testing a null hypothesis of zero effect, either for its own sake or to rule out the possibility of a conditional correlation that is supposed not to be there.

Another sort of exception to the “no true zeros” rule comes from information restriction: a person’s decision should not be affected by knowledge that he or she does not have. For example, a consumer interested in buying apples cares about the total price he pays, not about how much of that goes to the seller and how much goes to the government in the form of taxes. So the restriction is that the utility depends on prices, not on the share of that going to taxes. That is the type of restriction that can help identify demand functions in economics.

I realize, however, that my perspective that there are no true zeros (information restrictions aside) is a minority view among social scientists and perhaps among people in general, on the evidence of [cognitive scientist Steven] Sloman’s book [Causal Models: How People Think About the World and Its Alternatives]. For example, from chapter 2: “A good politician will know who is motivated by greed and who is motivated by larger principles in order to discern how to solicit each one’s vote when it is needed” (p. 17). I can well believe that people think in this way but I don’t buy it: just about everyone is motivated by greed and by larger principles. This sort of discrete thinking doesn’t seem to me to be at all realistic about how people behave—although it might very well be a good model about how people characterize others.

In the next chapter, Sloman writes, “No matter how many times A and B occur together, mere co-occurrence cannot reveal whether A causes B, or B causes A, or something else causes both” (p. 25; emphasis added). Again, I am bothered by this sort of discrete thinking. I will return in a moment with an example, but just to speak generally, if A could cause B, and B could cause A, then I would think that, yes, they could cause each other. And if something else could cause them both, I imagine that could be happening along with the causation of A on B and of B on A.

To continue:

Let’s put this another way. Sloman’s book is called, “Causal Models: How People Think About the World and Its Alternatives,” and it makes sense to me that people think about the world discretely. My point, which I think aligns with those of Loken and Greenland, is that this discrete model of the world is typically inaccurate and misleading, so it’s worth fighting this tendency in ourselves. The point of the above example is that Sloman, who’s writing about how people think, is himself slipping into this error.

One more time

The above is just an example. My point is not to argue with Sloman—his book stands on its own, and his ideas can be valuable even if (or especially if!) his perspective on discrete modeling is different from mine. My point is that discrete thinking is not simply something people do because they have to make decisions, nor is it something that people do just because they have some incentive to take a stance of certainty. So when we’re talking about the problems of deterministic thinking, or premature collapse of the “wave function” of inferential uncertainty, we really are talking about a failure to incorporate enough of a continuous view of the world in our mental model.

P.S. Tomorrow’s post: I think that science is mostly “Brezhnevs.” It’s rare to see a “Gorbachev” who will abandon a paradigm just because it doesn’t do the job. Also, moving beyond naive falsificationism.

Continue Reading…


Read More

#FunDataFriday – The magick package in R

[This article was first published on #FunDataFriday - Little Miss Data, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)


Magick is a package available in R that allows you to very easily work with images and animations. Jeroen Ooms is the genius behind the package.


Example graph with an animation. Produced using the magick package in R.

It’s awesome because it comes in handy every time I want to work with images or animations within R. It allows for image and animation manipulation, creation, and combination.

The best part is that it’s so easy! I’ve put minimal effort into understanding this package and I’ve been able to manipulate images and gifs, add them to my graphs and also create custom gifs and animated graphs.


The package vignette has a great primer which I used to get started. I have also posted a tutorial showing how to create the graph above. Try the examples. Next, try bringing an image or animation into a graph of your own. Or, try saving your data visualization series into an animated format!


Continue Reading…


Read More

Should you use "dot notation" or "bracket notation" with pandas?

If you've ever used the pandas library in Python, you probably know that there are two ways to select a Series (meaning a column) from a DataFrame:

# dot notation
df.col_name

# bracket notation
df['col_name']

Which method should you use? I'll make the case for each, and then you can decide...

Why use bracket notation?

The case for bracket notation is simple: It always works.

Here are the specific cases in which you must use bracket notation, because dot notation would fail:

# column name includes a space
df['col name']

# column name matches a DataFrame method
df['shape']

# column name matches a Python keyword
df['class']

# column name is stored in a variable
var = 'col_name'
df[var]

# column name is an integer
df[0]

# new column is created through assignment
df['new'] = 0

In other words, bracket notation always works, whereas dot notation only works under certain circumstances. That's a pretty compelling case for bracket notation!

As stated in the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

Why use dot notation?

If you've watched any of my pandas videos, you may have noticed that I use dot notation. Here are four reasons why:

Reason 1: Dot notation is easier to type

Dot notation is three fewer characters to type than bracket notation. And in terms of finger movement, typing a single period is much more convenient than typing brackets and quotes.

This might sound like a trivial reason, but if you're selecting columns dozens (or hundreds) of times a day, it makes a real difference!

Reason 2: Dot notation is easier to read

Most of my pandas code is made up of chains of selections and methods. By using dot notation, my code is mostly adorned with periods and parentheses (plus an occasional quotation mark):

# dot notation
df.col_one.fillna('missing').value_counts()

If you instead use bracket notation, your code is adorned with periods and parentheses plus lots of brackets and quotation marks:

# bracket notation
df['col_one'].fillna('missing').value_counts()

I find the dot notation code easier to read, as well as more aesthetically pleasing.

Reason 3: Dot notation is easier to remember

With dot notation, every component in a chain is separated by a period on both sides. For example, this line of code has 4 components, and thus there are 3 periods separating the individual components:

# dot notation
df.col_one.head().sum()

If you instead use bracket notation, some of your components are separated by periods, and some are not:

# bracket notation
df['col_one'].head().sum()

With bracket notation, I often forget whether there's supposed to be a period before ['col_one'], after ['col_one'], or both before and after ['col_one'].

With dot notation, it's easier for me to remember the correct syntax.

Reason 4: Dot notation limits the usage of brackets

Brackets can be used for many purposes in pandas:

df[['col_one', 'col_two']]
df.iloc[4, 2]
df.loc['row_label', 'col_one':'col_three']
df[(df.col_one > 5) & (df.col_two == 'value')]

If you also use bracket notation for Series selection, you end up with even more brackets in your code:

df[(df['col_one'] > 5) & (df['col_two'] == 'value')]

As you use more brackets, each bracket becomes slightly more ambiguous as to its purpose, imposing a higher mental burden on the person reading the code. By using dot notation for Series selection, you reduce bracket usage to only the essential cases.


If you prefer bracket notation, then you can use it all of the time! However, you still have to be familiar with dot notation in order to read other people's code.

If you prefer dot notation, then you can use it most of the time, as long as you are diligent about renaming columns when they contain spaces or collide with DataFrame methods. However, you still have to use bracket notation when creating new columns.
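To make the trade-off concrete, here is a small illustrative sketch (column names invented for the example) showing both notations agreeing on an ordinary column, and the method-collision case where only brackets work:

```python
import pandas as pd

df = pd.DataFrame({'col_one': [1, 2, 3], 'shape': ['a', 'b', 'c']})

# For an ordinary column name, both notations return the same Series:
dot = df.col_one.sum()
bracket = df['col_one'].sum()

# 'shape' collides with the DataFrame.shape attribute, so dot notation
# gives the (rows, columns) tuple rather than the column:
dims = df.shape          # (3, 2)
col = list(df['shape'])  # ['a', 'b', 'c']

# Creating a new column requires bracket notation:
df['new'] = 0
```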

Which do you prefer? Let me know in the comments below, or vote in the Twitter poll:


There were some thoughtful comments about this issue on Twitter, mostly (but not exclusively) in favor of bracket notation:

Continue Reading…


Read More

The State of Transfer Learning in NLP

This post expands on the NAACL 2019 tutorial on Transfer Learning in NLP organized by Matthew Peters, Swabha Swayamdipta, Thomas Wolf, and Sebastian Ruder. This post highlights key insights and takeaways and provides updates based on recent work.

Continue Reading…


Read More

Taking a year to explain computer things

I’ve been working on explaining computer things I’m learning on this blog for 6 years. I wrote one of my first posts, what does a shell even do? on Sept 30, 2013. Since then, I’ve written 11 zines, 370,000 words on this blog, and given 20 or so talks. So it seems like I like explaining things a lot.

tl;dr: I’m going to work on explaining computer things for a year

Here’s the exciting news: I left my job a month ago and my plan is to spend the next year working on explaining computer things!

As for why I’m doing this – I was talking through some reasons with my friend Mat last night and he said “well, sometimes there are things you just feel compelled to do”. I think that’s all there is to it :)

what does “explain computer things” mean?

I’m planning to:

  1. write some more zines (maybe I can write 10 zines in a year? we’ll see! I want to tackle both general-interest and slightly more niche topics, we’ll see what happens).
  2. work on some more interactive ways to learn things. I learn things best by trying things out and breaking them, so I want to see if I can facilitate that a little bit for other people. I started a project around this in May which has been on the backburner for a bit but which I’m excited about. Hopefully I’ll release it soon and then you can try it out and tell me what you think!

I say “a year” because I think I have at least a year’s worth of ideas and I can’t predict how I’ll feel after doing this for a year.

how: run a business

I started a corporation almost exactly a year ago, and I’m planning to keep running my explaining-things efforts as a business. This business has been making more than I made in my first programming job (that is, definitely enough money to live on!), which has been really surprising and great (thank you!).

some parameters of the business:

  • I’m not planning to hire employees or anything, it’ll just be me and some (awesome) freelancers. The biggest change I have in mind is that I’m hoping to find a freelance editor to help me with editing.
  • I also don’t have any specific plans for world domination or to work 80-hour weeks. I’m just going to make zines & things that explain computer concepts and sell them on the internet, like I’ve been doing.
  • No commissions or consulting work, just building ideas I have

It’s been pretty interesting to learn more about running a small business and so far I like it more than I thought I would. (except for taxes, which I like exactly as much as I thought I would)

that’s all!

I’m excited to keep making explanations of computer things and to have more time to do it. This blog might change a bit away from “here’s what I’m learning at work these days” and towards “here are attempts at explaining things that I mostly already know”. It’ll be different! We’ll see how it goes!

Continue Reading…


Read More

Sleep Schedule, From the Inconsistent Teenage Years to Retirement

From the teenage years to college to adulthood through retirement, sleep is all over the place at first but then converges towards consistency. Read More

Continue Reading…


Read More

Reproducing the kidney cancer example from BDA

[This article was first published on R – Robin Ryder's blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is an attempt at reproducing the analysis of Section 2.7 of Bayesian Data Analysis, 3rd edition (Gelman et al.), on kidney cancer rates in the USA in the 1980s. I have done my best to clean the data from the original. Andrew wrote a blog post to “disillusion [us] about the reproducibility of textbook analysis”, in which he refers to this example. This might then be an attempt at reillusionment…

The cleaner data are on GitHub, as is the R Markdown source of this analysis.

library(usmap)    # provides plot_usmap(), used below
library(ggplot2)
d = read.csv("KidneyCancerClean.csv", skip=4)

In the data, the columns dc and dc.2 correspond (I think) to the death counts due to kidney cancer in each county of the USA, respectively in 1980-84 and 1985-89. The columns pop and pop.2 are some measure of the population in the counties. It is not clear to me what the other columns represent.

Simple model

Let n_j be the population on county j, and K_j the number of kidney cancer deaths in that county between 1980 and 1989. A simple model is K_j\sim Poisson(\theta_j n_j) where \theta_j is the unknown parameter of interest, representing the incidence of kidney cancer in that county. The maximum likelihood estimator is \hat\theta_j=\frac{K_j}{n_j}.

d$dct = d$dc + d$dc.2
d$popm = (d$pop + d$pop.2) / 2
d$thetahat = d$dct / d$popm

In particular, the original question is to understand these two maps, which show the counties in the first and last decile for kidney cancer deaths.

q = quantile(d$thetahat, c(.1, .9))
d$cancerlow = d$thetahat <= q[1]
d$cancerhigh = d$thetahat >= q[2]
plot_usmap("counties", data=d, values="cancerhigh") +
  scale_fill_discrete(h.start = 200, 
                      name = "Large rate of kidney cancer deaths") 


plot_usmap("counties", data=d, values="cancerlow") +
  scale_fill_discrete(h.start = 200, 
                      name = "Low rate of kidney cancer deaths") 


These maps are surprising, because the counties with the highest kidney cancer death rate, and those with the lowest, are somewhat similar: mostly counties in the middle of the map.

(Also, note that the data for Alaska are missing. You can hide Alaska on the maps by adding the parameter include = statepop$full[-2] to calls to plot_usmap.)

The reason for this pattern (as explained in BDA3) is that these are counties with a low population. Indeed, a typical value for \hat\theta_j is around 0.0001. Take a county with a population of 1000. It is likely to have no kidney cancer deaths, giving \hat\theta_j=0 and putting it in the first decile. But if it happens to have a single death, the estimated rate jumps to \hat\theta_j=0.001 (10 times the average rate), putting it in the last decile.
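The arithmetic of that jump is easy to check; a tiny sketch in Python (the county size and the typical rate are the figures from the paragraph above):

```python
n = 1000                       # population of a small county
typical_rate = 1e-4            # typical value of theta-hat

theta_zero = 0 / n             # no deaths: estimate 0, first decile
theta_one = 1 / n              # one death: estimate 0.001

print(theta_one / typical_rate)  # ~10: ten times the typical rate
```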

This is hinted at in this histogram of the (\hat\theta_j):

ggplot(data=d, aes(d$thetahat)) + 
  geom_histogram(bins=30, fill="lightblue") + 
  labs(x="Estimated kidney cancer death rate (maximum likelihood)", 
       y="Number of counties") +
  xlim(c(-1e-5, 5e-4))


Bayesian approach

If you have ever followed a Bayesian modelling course, you are probably screaming that this calls for a hierarchical model. I agree (and I’m pretty sure the authors of BDA do as well), but here is a more basic Bayesian approach. Take a common \Gamma(\alpha, \beta) distribution for all the (\theta_j); I’ll go for \alpha=15 and \beta = 200\ 000, which is slightly vaguer than the prior used in BDA. Obviously, you should try various values of the prior parameters to check their influence.

The prior is conjugate, so the posterior is \theta_j|K_j \sim \Gamma(\alpha + K_j, \beta + n_j). For small counties, the posterior will be extremely close to the prior; for larger counties, the likelihood will take over.
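The conjugate update can be sketched numerically; here is a toy check in Python, where the county figures are invented but the prior parameters are the ones chosen above:

```python
# Posterior mean of a Gamma(alpha, beta) prior with Poisson(theta * n) counts:
# E[theta | K] = (alpha + K) / (beta + n)
alpha, beta = 15, 2e5
prior_mean = alpha / beta                # 7.5e-5

def post_mean(K, n):
    return (alpha + K) / (beta + n)

# Small county: posterior mean stays close to the prior mean
small = post_mean(K=0, n=1000)           # ~7.46e-5
# Large county: the likelihood takes over, posterior mean ~ MLE K/n = 1.2e-4
large = post_mean(K=120, n=1_000_000)    # 1.125e-4
print(prior_mean, small, large)
```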

It is usually a shame to use only point estimates, but here it will be sufficient: let us compute the posterior mean of \theta_j. Because the prior has a strong impact on counties with low population, the histogram looks very different:

alpha = 15
beta = 2e5
d$thetabayes = (alpha + d$dct) / (beta + d$popm)
ggplot(data=d, aes(d$thetabayes)) + 
  geom_histogram(bins=30, fill="lightblue") + 
  labs(x="Estimated kidney cancer death rate (posterior mean)", 
       y="Number of counties") +
  xlim(c(-1e-5, 5e-4))


And the maps of counties in the first and last decile are now much easier to distinguish; for instance, Florida and New England are heavily represented in the last decile. The counties represented here are mostly populated counties: these are counties for which we have reason to believe that they are on the lower or higher end for kidney cancer death rates.

qb = quantile(d$thetabayes, c(.1, .9))
d$bayeslow = d$thetabayes <= qb[1]
d$bayeshigh = d$thetabayes >= qb[2]
plot_usmap("counties", data=d, values="bayeslow") +
  scale_fill_discrete(h.start = 200, 
                      name = "Low kidney cancer death rate (Bayesian inference)") 


plot_usmap("counties", data=d, values="bayeshigh") +
  scale_fill_discrete(h.start = 200, 
                      name = "High kidney cancer death rate (Bayesian inference)") 


An important caveat: I am not an expert on cancer rates (and I expect some of the vocabulary I used is ill-chosen), nor do I claim that the data here are correct (from what I understand, many adjustments need to be made, but they are not detailed in BDA, which explains why the maps are slightly different). I am merely posting this as a reproducible example where the naïve frequentist and Bayesian estimators differ appreciably, because they handle sample size in different ways. I have found this example to be useful in introductory Bayesian courses, as the difference is easy to grasp for students who are new to Bayesian inference.

To leave a comment for the author, please follow the link and comment on their blog: R – Robin Ryder's blog. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Continue Reading…


Read More

If you did not already know

semopy google
Structural equation modelling (SEM) is a multivariate statistical technique for estimating complex relationships between observed and latent variables. Although numerous SEM packages exist, each of them has limitations. Some packages are not free or open-source; the most popular package not having this disadvantage is $\textbf{lavaan}$, but it is written in R language, which is behind current mainstream tendencies that make it harder to be incorporated into developmental pipelines (i.e. bioinformatical ones). Thus we developed the Python package $\textbf{semopy}$ to satisfy those criteria. The paper provides detailed examples of package usage and explains its inner clockworks. Moreover, we developed the unique generator of SEM models to extensively test SEM packages and demonstrated that $\textbf{semopy}$ significantly outperforms $\textbf{lavaan}$ in execution time and accuracy. …

Causaltoolbox google
Estimating heterogeneous treatment effects has become extremely important in many fields and often life changing decisions for individuals are based on these estimates, for example choosing a medical treatment for a patient. In the recent years, a variety of techniques for estimating heterogeneous treatment effects, each making subtly different assumptions, have been suggested. Unfortunately, there are no compelling approaches that allow identification of the procedure that has assumptions that hew closest to the process generating the data set under study and researchers often select just one estimator. This approach risks making inferences based on incorrect assumptions and gives the experimenter too much scope for p-hacking. A single estimator will also tend to overlook patterns other estimators would have picked up. We believe that the conclusion of many published papers might change had a different estimator been chosen and we suggest that practitioners should evaluate many estimators and assess their similarity when investigating heterogeneous treatment effects. We demonstrate this by applying 32 different estimation procedures to an emulated observational data set; this analysis shows that different estimation procedures may give starkly different estimates. We also provide an extensible \texttt{R} package which makes it straightforward for practitioners to apply our analysis to their data. …

Gradient Episodic Memory google
One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art. …

AlignFlow google
Given unpaired data from multiple domains, a key challenge is to efficiently exploit these data sources for modeling a target domain. Variants of this problem have been studied in many contexts, such as cross-domain translation and domain adaptation. We propose AlignFlow, a generative modeling framework for learning from multiple domains via normalizing flows. The use of normalizing flows in AlignFlow allows for a) flexibility in specifying learning objectives via adversarial training, maximum likelihood estimation, or a hybrid of the two methods; and b) exact inference of the shared latent factors across domains at test time. We derive theoretical results for the conditions under which AlignFlow guarantees marginal consistency for the different learning objectives. Furthermore, we show that AlignFlow guarantees exact cycle consistency in mapping datapoints from one domain to another. Empirically, AlignFlow can be used for data-efficient density estimation given multiple data sources and shows significant improvements over relevant baselines on unsupervised domain adaptation. …

Continue Reading…


Read More

Magister Dixit

“Portability of code and environment is one of the challenge every data scientist faces. The code can be framework dependent or it can be machine dependent. The end result – A model that works like a charm on one machine might not do so on another.” AVBytes ( 23.04.2018 )

Continue Reading…


Read More

Getting on the meet-up bandwagon – our first meet up event

[This article was first published on R Blogs – Hutsons-hacks, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

My company, Draper and Dash, has tasked me with organising a wider meet-up event for anyone who is interested in AI / ML in healthcare. This wider working group consists of people from different sectors; however, they are all interested in how we can apply AI / ML methods in their organisations.

Why did we choose meet-up

Meet-up seems like a great way to engage like-minded people from across different communities.

What is our first event

Our first event is detailed below:

Using Supervised ML Methods to augment intelligent dashboards in Healthcare

Wednesday, Nov 6, 2019, 7:00 PM

One Canada Square
London, GB

13 Members Attending

This session will focus on how we use ML to augment our BI dashboards. The agenda will be updated shortly.


This will focus on how we use supervised ML methods in our BI solutions and how this adds to our BI stack.

Join now

So, this community is gathering members – we have over 90 now. If you are interested, and are based in the London, UK area, then please follow our events – as we plan to have a whole lot more in the future – with guest speakers as well.


Continue Reading…


Read More

Document worth reading: “Deconstructing Blockchains: A Comprehensive Survey on Consensus, Membership and Structure”

It is no exaggeration to say that since the introduction of Bitcoin, blockchains have become a disruptive technology that has shaken the world. However, the rising popularity of the paradigm has led to a flurry of proposals addressing variations and/or trying to solve problems stemming from the initial specification. This added considerable complexity to the current blockchain ecosystems, amplified by the absence of detail in many accompanying blockchain whitepapers. Through this paper, we set out to explain blockchains in a simple way, taming that complexity through the deconstruction of the blockchain into three simple, critical components common to all known systems: membership selection, consensus mechanism and structure. We propose an evaluation framework with insight into system models, desired properties and analysis criteria, using the decoupled components as criteria. We use this framework to provide clear and intuitive overviews of the design principles behind the analyzed systems and the properties achieved. We hope our effort will help clarifying the current state of blockchain proposals and provide directions to the analysis of future proposals. Deconstructing Blockchains: A Comprehensive Survey on Consensus, Membership and Structure

Continue Reading…


Read More

Regex Problem? Here’s an R package that will write Regex for you

[This article was first published on r-bloggers on Programming with R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

REGEX is that thing that scares everyone almost all the time. Hence, finding an alternative is always very helpful and peaceful too. Here's a nice R package that helps us do REGEX without knowing REGEX.


This is the REGEX pattern to test the validity of a URL:

^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$

A typical regular expression contains characters (like http) and meta-characters (like []). The combination of these two forms a meaningful regular expression for a particular task.
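As a quick sanity check of the URL pattern quoted above (using Python's re module here as an aside, since the regex semantics carry over):

```python
import re

# The URL-validation pattern from above, verbatim
pattern = r'^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$'

print(bool(re.match(pattern, 'https://www.example.com')))  # True
print(bool(re.match(pattern, 'http://example.com')))       # True
print(bool(re.match(pattern, 'ftp://example.com')))        # False
```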
So, What’s the problem?

Remembering the way in which characters and meta-characters are combined to create a meaningful regex is itself tedious, and it sometimes becomes a bigger task than the actual NLP problem that is the larger goal.

Solution at Hand

Some good soul on this planet has created an open-source JavaScript library, JSVerbalExpressions, to make regex creation easy. Then some other good soul (Tyler Littlefield) ported the JavaScript library to R: RVerbalExpressions. This is the beauty of the open-source world.


RVerbalExpressions is available on GitHub, so you can use devtools or remotes to install it from GitHub.

# install.packages("devtools")


Let’s create a pseudo-problem that we’d like to solve with regex through which we can understand this package to programmatically create regex.

A simpler one perhaps: we've got multiple strings like the ones below, and we'd like to extract the names from them. Here's what our input and output look like:

strings = c('123Problem233','233Solution434','223Tim Apple444')

Problem, Solution, Tim Apple

Once we solve this, we’ll move forward with slightly complicated problems.


Before we code, it's always good to write out pseudo-code on a napkin or a piece of paper. That is, we want to extract the names (which are composed of letters) and leave out the numbers (which are digits). We build a regex for one line and then iterate over all the elements in our vector.


Like any other R package, we can load RVerbalExpressions with the library() function.

library(RVerbalExpressions)

Constructing the Expression

Extract Strings

Like many other modern-day R packages, RVerbalExpressions support %>% pipe operator for better simplicity and readability of the code. But for this problem of extracting strings that are present between the numbers, we can simply use one function that is rx_alpha() to say that we need alphabets from the given string.

expr =  rx_alpha()
stringr::str_extract_all(strings, expr)  # extract the matches


[1] "P" "r" "o" "b" "l" "e" "m"

[1] "S" "o" "l" "u" "t" "i" "o" "n"

[1] "T" "i" "m" "A" "p" "p" "l" "e"

Extract Numbers

Similar to the text that we extracted, Extracting Numbers again is very English as we’ve to use the function rx_digit() to say that we need numbers from the given text.

expr =  rx_digit()
stringr::str_extract_all(strings, expr)  # extract the matches

[1] "1" "2" "3" "2" "3" "3"

[1] "2" "3" "3" "4" "3" "4"

[1] "2" "2" "3" "4" "4" "4"

Another Constructor to extract the name as a word

Here, we can use the function rx_word() to match it as word (rather than letters).

expr =  rx_alpha()  %>%  rx_word() %>% rx_alpha()
stringr::str_extract_all(strings, expr)  # extract the matches


[1] "Problem"

[1] "Solution"

[1] "Tim" "Apple"


What if we want to use the expression somewhere else, or simply need the regex pattern itself? It's simple: the expression is what we've constructed, and printing it reveals the underlying regex pattern.




Thus, we managed to build a regex pattern without knowing regex. Simply put, we programmatically generated a regex pattern using R (without requiring high-level knowledge of regex patterns) and accomplished a tiny task that we took up to demonstrate the potential. For more on regex, check out this course. The entire code is available here.


Continue Reading…


Read More

Laguerre-Samuelson Inequality

[This article was first published on LeaRning Stats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Chebychev’s Theorem gives bounds on how spread out a probability distribution can be from the mean, in terms of the standard deviation. More precisely, if \(X\) is a random variable with mean \(\mu\) and standard deviation \(\sigma\), then \[
P(|X – \mu| \ge k \sigma) \le \frac{1}{k^2}.
\]
When I was an undergraduate at Texas, I noticed that this has the following implication. Suppose \(x_1,\ldots,x_n\) is any finite set of numbers. Create a random variable \(X\) so that \(P(X = x_i) = 1/n\) for all \(1\le i \le n\). Then, the mean of \(X\) is \(\mu = \overline{x}\), and the standard deviation of \(X\) is \(s_X = s \sqrt{\frac{n-1}{n}}\), where \(s\) is the sample standard deviation. By Chebychev’s Theorem with \(k = \sqrt{n}\), \[P(|X – \overline{x}| \ge \sqrt{n} s_X) \le \frac{1}{n}\] Now, since the probability of each \(x_i\) is \(\frac 1n\), this tells us that all of the points \(x_i\) are within \(\sqrt{n} s_X\) of the mean of the \(x_i\).

I totally didn’t believe that was true when I was 20 years old, so I spent (way too much) time computing examples. What I noticed was that not only did it seem to be true, but I couldn’t even find examples where the full \(\sqrt{n} s_X\) was necessary.

Fast forward 30 years later, and I decided to revisit this problem. Let’s consider the case when \(n = 4\) and do some simulations. I will take a bunch of random samples of size 4, compute \(\mu\) and \(s_X\) and the maximum distance from \(\mu\) to any of the \(x_i\). Then, we’ll compare it to \(s_X\) to see how far the farthest point away from the mean is.

vec <- 1:4
probs <- rep(1/4, 4)
mu <- mean(vec)
sd <- sqrt(sum(probs * (vec - mu)^2))
best <- max(abs(vec - mu))/sd
best_vec <- vec
for(i in 1:100000) {
  vec <- rnorm(4)
  mu <- mean(vec)
  sd <- sd(vec) * sqrt(3/4)
  if(max(abs(vec - mu))/sd > best) {
    best <- max(abs(vec - mu))/sd
    best_vec <- vec
  }
}
best_vec
## [1] -0.1465015 -0.1494259  1.5503518 -0.1461181
best
## [1] 1.732048

We see a couple of things. First, we see that the extreme case is when all of the values are equal except for one which is different, and second, that all of the data is within 1.732 standard deviations of the mean, when Chebychev would’ve predicted that all of the data is within 2 standard deviations of the mean. What gives? Laguerre-Samuelson, that’s what!

Theorem Let \(x_1,\ldots,x_n\) be any \(n\) real numbers with mean \(\mu\) and standard deviation \(s\) (with \(n\), not \(n-1\), in the denominator). Then, all \(n\) numbers lie within the interval \([\mu – \sqrt{n-1}s, \mu + \sqrt{n-1}s]\).

As hinted at above, this theorem can be considered a sharp version of Chebychev in the case that the rv \(X\) consists of finitely many equally likely values.
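A quick numeric check of the extreme configuration the simulation found (three equal values and one outlier), sketched in Python using nothing beyond the theorem's statement:

```python
import math

x = [0.0, 0.0, 0.0, 1.0]          # n = 4: three equal values, one outlier
n = len(x)
mu = sum(x) / n
s = math.sqrt(sum((v - mu) ** 2 for v in x) / n)   # sd with n in the denominator
ratio = max(abs(v - mu) for v in x) / s

print(ratio, math.sqrt(n - 1))    # both equal sqrt(3) ~ 1.7320508
```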


Continue Reading…


Read More

Fitting ‘complex’ mixed models with ‘nlme’: Example #2

[This article was first published on R on The broken bridge between biologists and statisticians, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

A repeated split-plot experiment with heteroscedastic errors

Let’s imagine a field experiment, where different genotypes of khorasan wheat are to be compared under different nitrogen (N) fertilisation systems. Genotypes require bigger plots, with respect to fertilisation treatments and, therefore, the most convenient choice would be to lay-out the experiment as a split-plot, in a randomised complete block design. Genotypes would be randomly allocated to main plots, while fertilisation systems would be randomly allocated to sub-plots. As usual in agricultural research, the experiment should be repeated in different years, in order to explore the environmental variability of results.

What could we expect from such an experiment?

Please, look at the dataset ‘kamut.csv’, which is available on github. It provides the results for a split-plot experiment with 15 genotypes and 2 N fertilisation treatments, laid-out in three blocks and repeated in four years (360 observations, in all).

The dataset has five columns, the ‘Year’, the ‘Genotype’, the fertilisation level (‘N’), the ‘Block’ and the response variable, i.e. ‘Yield’. The fifteen genotypes are coded by using the letters from A to O, while the levels of the other independent variables are coded by using numbers. The following snippets loads the file and recodes the numerical independent variables into factors.

dataset <- read.csv("kamut.csv", header = T)
dataset$Block <- factor(dataset$Block)
dataset$Year <- factor(dataset$Year)
dataset$N <- factor(dataset$N)
head(dataset)
##   Year Genotype N Block Yield
## 1 2004        A 1     1 2.235
## 2 2004        A 1     2 2.605
## 3 2004        A 1     3 2.323
## 4 2004        A 2     1 3.766
## 5 2004        A 2     2 4.094
## 6 2004        A 2     3 3.902

Additionally, it may be useful to code some ‘helper’ factors, to represent the blocks (within years) and the main-plots. The first factors (‘YearBlock’) has 12 levels (4 years and 3 blocks per year) and the second factor (‘MainPlot’) has 180 levels (4 years, 3 blocks per year and 15 genotypes per block).

dataset$YearBlock <- with(dataset, factor(Year:Block))
dataset$MainPlot <- with(dataset, factor(Year:Block:Genotype))

For the analyses, we will make use of the ‘plyr’ (Wickham, 2011), ‘car’ (Fox and Weisberg, 2011) and ‘nlme’ (Pinheiro et al., 2018) packages, which we load now.

library(plyr)
library(car)
library(nlme)
It is always useful to start by separately considering the results for each year. This gives us a feel for what happened in all experiments. What model do we have to fit to single-year split-plot data? In order to avoid mathematical notation, I will follow the notation proposed by Piepho (2003), by using the names of variables, as reported in the dataset. The treatment model for this split-plot design is:

Yield ~ Genotype * N

All treatment effects are fixed. The block model, referencing all grouping structures, is:

Yield ~ Block + Block:MainPlot + Block:MainPlot:Subplot

The first element references the blocks, while the second element references the main-plots, to which the genotypes are randomly allocated (randomisation unit). The third element references the sub-plots, to which N treatments are randomly allocated (another randomisation unit); this latter element corresponds to the residual error and, therefore, it is fitted by default and needs not be explicitly included in the model. Main-plot and sub-plot effects need to be random, as they reference randomisation units (Piepho, 2003). The nature of the block effect is still under debate (Dixon, 2016), but I’ll take it as random (do not worry: I will also show how we can take it as fixed).

Coding a split-plot model in ‘lme’ is rather simple:

lme(Yield ~ Genotype * N, random = ~1|Block/MainPlot)

where the notation ‘Block/MainPlot’ is totally equivalent to ‘Block + Block:MainPlot’. Instead of manually fitting this model four times (one per year), we can ask R to do so by using the ‘ddply()’ function in the ‘plyr’ package. In the code below, I used this technique to retrieve the residual variance for each experiment.

lmmFits <- ddply(dataset, c("Year"),
      function(df) summary( lme(Yield ~ Genotype * N,
                 random = ~1|Block/MainPlot,
                 data = df))$sigma^2 )
lmmFits
##   Year          V1
## 1 2004 0.052761644
## 2 2005 0.001423833
## 3 2006 0.776028791
## 4 2007 0.817594477

We see great differences! The residual variance in 2005 is more than 500 times smaller than that observed in 2007. Clearly, if we pool the data and run an ANOVA, we violate the homoscedasticity assumption. In general, this problem has an obvious solution: we can model the variance-covariance matrix of observations, allowing a different variance per year. In R, this is only possible by using the ‘lme()’ function (unless we want to use the ‘asreml-R’ package, which is not freeware, unfortunately). The question is: how do we code such a model?

First of all, let’s derive a correct mixed model. The treatment model is:

Yield ~ Genotype * N

We have mentioned that the genotype and N effects are likely to be taken as fixed. The block model is:

 ~ Year + Year/Block + Year:Block:MainPlot + Year:Block:MainPlot:Subplot

The first element references the years and the second the blocks within years; the third element references the main-plots, while the fourth references the sub-plots and, as before, it is not needed. The year effect is likely to interact with both the treatment effects, so we need to add the following effects:

 ~ Year + Year:Genotype + Year:N + Year:Genotype:N

which is equivalent to writing:

 ~ Year*Genotype*N

The year effect can be taken either as random or as fixed. In this post, we will show both approaches.

Year effect is fixed

If we take the year effect as fixed and the block effect as random, we see that the random effects are nested (blocks within years and main-plots within blocks and within years). The function ‘lme()’ is specifically tailored to deal with nested random effects and, therefore, fitting the above model is rather easy. In the first snippet we fit a homoscedastic model:

modMix1 <- lme(Yield ~ Year * Genotype * N,
                 random = ~1|YearBlock/MainPlot,
                 data = dataset)

We could also fit this model with the ‘lme4’ package and the ‘lmer()’ function; however, we are not happy with this, because we have seen clear signs of heteroscedastic within-year errors. Thus, let’s account for such heteroscedasticity by using the ‘weights’ argument and the ‘varIdent()’ variance structure:

modMix2 <- lme(Yield ~ Year * Genotype * N,
                 random = ~1|YearBlock/MainPlot,
                 data = dataset,
               weights = varIdent(form = ~1|Year))
AIC(modMix1, modMix2)
##          df      AIC
## modMix1 123 856.6704
## modMix2 126 575.1967

Based on the Akaike Information Criterion, we see that the second model is better than the first one, which supports the idea of heteroscedastic residuals. From this moment on, the analyses proceeds as usual, e.g. by testing for fixed effects and comparing means, as necessary. Just a few words about testing for fixed effects: Wald F tests can be obtained by using the ‘anova()’ function, although I usually avoid this with ‘lme’ objects, as there is no reliable approximation to degrees of freedom. With ‘lme’ objects, I suggest using the ‘Anova()’ function in the ‘car’ package, which shows the results of Wald chi square tests.

Anova(modMix2)
## Analysis of Deviance Table (Type II tests)
## Response: Yield
##                    Chisq Df Pr(>Chisq)    
## Year              51.072  3  4.722e-11 ***
## Genotype         543.499 14  < 2.2e-16 ***
## N               2289.523  1  < 2.2e-16 ***
## Year:Genotype    123.847 42  5.281e-10 ***
## Year:N            21.695  3  7.549e-05 ***
## Genotype:N      1356.179 14  < 2.2e-16 ***
## Year:Genotype:N  224.477 42  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

One further aspect: do you prefer fixed blocks? Then you can fit the following model.

modMix4 <- lme(Yield ~ Year * Genotype * N + Year:Block,
                 random = ~1|MainPlot,
                 data = dataset,
               weights = varIdent(form = ~1|Year))

Year effect is random

If we’d rather take the year effect as random, all the interactions involving it are random as well (Year:Genotype, Year:N and Year:Genotype:N). Similarly, the block (within years) effect needs to be random. Therefore, we have several crossed random effects, which are not straightforward to code with ‘lme()’. First I will show the code, then I will comment on it.

modMix5 <- lme(Yield ~ Genotype * N,
                  random = list(Year = pdIdent(~1),
                                Year = pdIdent(~Block - 1),
                                Year = pdIdent(~MainPlot - 1),
                                Year = pdIdent(~Genotype - 1),
                                Year = pdIdent(~N - 1),
                                Genotype = pdIdent(~N - 1)),
                  data = dataset,
               weights = varIdent(form = ~1|Year))

We see that random effects are coded using a named list; each component of this list is a pdMat object whose name is a grouping factor. For example, the component ‘Year = pdIdent(~ 1)’ represents a random year effect, while ‘Year = pdIdent(~ Block – 1)’ represents a random year effect for each level of Block, i.e. a random ‘year x block’ interaction. This latter variance component is the same for all blocks (‘pdIdent’), i.e. there is homoscedasticity at this level.

It is important to remember that the grouping factors in the list are treated as nested; however, there is only one grouping factor (‘Year’), so the nesting is irrelevant. The only exception is the genotype, which is regarded as nested within the year. As a consequence, the component ‘Genotype = pdIdent(~N – 1)’ specifies a random year:genotype effect for each level of the N treatment, i.e. a random year:genotype:N interaction.

I agree, this is not straightforward to understand! If necessary, take a look at the excellent book by Gałecki and Burzykowski (2013). When fitting the above model, be patient; convergence may take a few seconds. I’d only like to reinforce the idea that, in case you need to test for fixed effects, you should not rely on the ‘anova()’ function, but rather on the Wald chi-square tests provided by the ‘Anova()’ function in the ‘car’ package.

Anova(modMix5, type = 2)
## Analysis of Deviance Table (Type II tests)
## Response: Yield
##              Chisq Df Pr(>Chisq)    
## Genotype   68.6430 14  3.395e-09 ***
## N           2.4682  1     0.1162    
## Genotype:N 14.1153 14     0.4412    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
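As a cross-check, the same crossed structure is much easier to spell out with ‘lmer()’ from ‘lme4’, at the price of losing the ‘varIdent()’ heteroscedasticity. The sketch below uses hypothetical toy data (not the post’s dataset) just to make the call runnable; each ‘pdIdent()’ component of ‘modMix5’ maps to one ‘(1 | …)’ term:

```r
library(lme4)

# Toy stand-in for the post's data (hypothetical yields):
# 3 years x 3 blocks x 2 N levels x 3 genotypes.
set.seed(1234)
d <- expand.grid(Year = factor(2001:2003), Block = factor(1:3),
                 N = factor(c("low", "high")), Genotype = factor(LETTERS[1:3]))
d$MainPlot <- with(d, interaction(Year, Block, N))
d$Yield <- rnorm(nrow(d), mean = 6, sd = 0.5)

# Crossed random effects, one (1 | ...) term per pdIdent() component;
# lmer() has no varIdent() analogue, so within-year variances are pooled.
m <- lmer(Yield ~ Genotype * N +
            (1 | Year) + (1 | Year:Block) + (1 | MainPlot) +
            (1 | Year:Genotype) + (1 | Year:N) + (1 | Year:Genotype:N),
          data = d)
```

With pure-noise toy data the fit will typically be singular, which is harmless here; the point is only the mapping between the two notations.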

Another note: coding random effects as a named list is always possible. For example ‘modMix2’ can also be coded as:

modMix2b <- lme(Yield ~ Year * Genotype * N,
                 random = list(YearBlock = ~ 1, MainPlot = ~ 1),
                 data = dataset,
               weights = varIdent(form = ~1|Year))

Or, also as:

modMix2c <- lme(Yield ~ Year * Genotype * N,
                 random = list(YearBlock = pdIdent(~ 1), MainPlot = pdIdent(~ 1)),
                 data = dataset,
               weights = varIdent(form = ~1|Year))

Hope this is useful! Have fun with it.

Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)


  1. Fox, J., Weisberg, S., 2011. An R Companion to Applied Regression, Second Edition. Sage, Thousand Oaks, CA.
  2. Gałecki, A., Burzykowski, T., 2013. Linear Mixed-Effects Models Using R: A Step-by-Step Approach. Springer, Berlin.
  3. Piepho, H.-P., Büchse, A., Emrich, K., 2003. A Hitchhiker’s Guide to Mixed Models for Randomized Experiments. Journal of Agronomy and Crop Science 189, 310–322.
  4. Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., R Core Team, 2018. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-137.
  5. Wickham, H., 2011. The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software 40(1), 1–29.

To leave a comment for the author, please follow the link and comment on their blog: R on The broken bridge between biologists and statisticians.


In Britain, command of a foreign language is still à la mode

But a decline in languages in schools and limits on migration stifle supply


September 12, 2019

Social Network Visualization with R

[This article was first published on R programming – Journey of Analytics, and kindly contributed to R-bloggers].

In this month’s post we are going to look at data analysis and visualization of social networks using R programming.

Social Networks – Data Visualization

Friendster Networks Mapping

Friendster was a yesteryear social media network, something akin to Facebook. I’ve never used it, but it is one of those easily available datasets where you have a list of users and all their connections. So it is easy to create a viz and look at whose networks are strong and whose are weak, or even who bridges multiple networks.

The dataset and code files are added on the Projects Page here, under “social network viz”.

For this analysis, we will be using the following library packages:

  • visNetwork
  • geomnet
  • igraph


  1. Load the datafiles. The list of users is given in the file named “nodes” as each user is a node in the graph. The connection list is given in the file named “edges” as a 1-to-1 mapping. So if user Miranda has 10 friends, there would be 10 records for Miranda in the “edges” file, one for each friend. The friendster datafile has been anonymized, so there are numbers (id) rather than names.
  2. Convert the dataframes into a very specific format. We do some prepwork so that we can directly use the graph visualization functions.
  3. Create a graph object. This will also help to create clusters. Since the dataset is anonymized it might seem irrelevant, but imagine this in your own social network. You might have one cluster of friends who are from your school, another bunch from your office, one set who are cousins and family members and some random folks. Creating a graph object allows us to look at where those clusters lie automatically.
  4. Visualize using functions specific to graph objects. The first function is visNetwork(), which generates an interactive, color-coded cluster graph. When you click on any of the nodes (colored circles), it will highlight all the connections radiating from that node. (In the image below, I have highlighted the node for user 17.)
  5. You can also use the same function with a bunch of different parameters, as shown below:
visNetwork(nodeset, links2, width = "100%") %>%
    visIgraphLayout() %>%
    visNodes(
        shape = "dot",
        color = list(
            background = "#0085AF",
            border = "#013848",
            highlight = "#FF8000"
        ),
        shadow = list(enabled = TRUE, size = 10)
    ) %>%
    visEdges(
        shadow = FALSE,
        color = list(color = "#0085AF", highlight = "#C62F4B")
    ) %>%
    visOptions(highlightNearest = list(enabled = T, degree = 1, hover = T)) %>% 
    visLayout(randomSeed = 11)

In the image below you can see the 3 colored clusters and the central (light blue) node. The blue nodes are the ones that do not have a lot of direct connections. The yellow and red clusters are tighter, indicating their members have internal connections with each other (similar to a bunch of classmates who all know each other).

network clusters
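Clustering like the one pictured can be reproduced in miniature with igraph’s community-detection functions. This toy sketch (not the Friendster data) builds two tight friend groups bridged by a single connection, which walktrap should separate into two communities:

```r
library(igraph)

# Toy graph: two triangles of friends, bridged by one edge between c and d.
edges <- data.frame(from = c("a", "a", "b", "d", "d", "e", "c"),
                    to   = c("b", "c", "c", "e", "f", "f", "d"))
g <- graph_from_data_frame(edges, directed = FALSE)

# Walktrap community detection, the kind of grouping used to color clusters.
cl <- cluster_walktrap(g)
split(V(g)$name, membership(cl))
```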

That’s it. Again the code is available on the Projects Page.

Code Extensions

Feel free to play around with the code. One extension of this idea would be to download Facebook or LinkedIn data (premium account needed) and create similar visualizations.

Or if you have a list of airports and routes, you could create something like this as a flight network map, to know the minimum number of hops between 2 destinations and alternative routes.

You could also count which nodes have the most friends and increase the size of their circles. This would make it easier to see which nodes are the best connected.
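Counting friends per node needs nothing more than a frequency table over the edge list. A base-R toy sketch (hypothetical data, not the Friendster files):

```r
# Toy edge list: node 1 has the most friends and would get the largest circle.
edges <- data.frame(from = c(1, 1, 1, 1, 2, 3),
                    to   = c(2, 3, 4, 5, 3, 4))

# Each undirected edge counts toward both endpoints' friend totals.
deg <- table(c(edges$from, edges$to))
sort(deg, decreasing = TRUE)

# A node-size column could then be derived, e.g. size = 10 + 3 * count.
```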

Of course, do not be over-mesmerized by the data. In real life, the strength of the relationship also matters. This is hard to quantify or collect, even though it’s easy to depict once you have the data in hand. For example, I have 1000 connections whom I’ve met at conferences or random events. If I needed a job, most may not really be useful. But my friend Sarah has only 300 connections, all super-loyal friends, who literally found her a job in 2 days when she had to move back to her hometown to take care of a sick parent.

With that thought, do take a look at the code and have fun coding! 🙂

The post Social Network Visualization with R appeared first on Journey of Analytics.

To leave a comment for the author, please follow the link and comment on their blog: R programming – Journey of Analytics.


Distilled News

Building a Backend System for Artificial Intelligence

Let’s explore the challenges involved in building a backend system to store and retrieve high-dimensional data vectors, typical to modern systems that use ‘artificial intelligence’ – image recognition, text comprehension, document search, music recommendations, …

Automatic GPUs

A reproducible R / Python approach to getting up and running quickly on GCloud with GPUs in Tensorflow.

Classification algorithm for non-time series data

One of the critical problems of ‘identification’, be it NLP (speech/text) or solving an image puzzle from pieces like a jigsaw, is to understand the words, or pieces of data, and their context. The words or pieces individually don’t give any meaning; tying them together gives an idea about the context. Now the data itself has some patterns, which are broadly classified as sequential or time-series data, and non-time-series data, which is largely non-sequential or arbitrary. Sentiment analysis of text reports, documents and journals, novels and classics follows a time-series pattern, in the sense that the words themselves follow a precedence governed by the grammar and the language dictionary. So do stock-price prediction problems, which build on the predictions of previous time periods and socio-economic conditions.

Calendar Heatmaps in ggplot

Calendar heatmaps are a neglected, but valuable, way of representing time series data. Their chief advantage is in allowing the viewer to visually process trends in categorical or continuous data over a period of time, while relating these values to their month, week, and weekday context – something that simple line plots do not efficiently allow for. If you are displaying data on staffing levels, stock returns (as we will do here), on-time performance for transit systems, or any other one dimensional data, a calendar heatmap can do wonders for helping your stakeholders note patterns in the interaction between those variables and their calendar context. In this post, I will use stock data in the form of daily closing prices for the SPY – SPDR S&P 500 ETF, the most popular exchange traded fund in the world. ETF’s are growing in popularity, so much so that there’s even a podcast devoted entirely to them. For the purposes of this blog post, it’s not necessary to have any familiarity with ETF’s or stocks in general. Some knowledge of tidyverse packages and basic R will be helpful, though.

Getting Machine Learning Models Ready For Production

As a Scientist, it’s incredibly satisfying to be given the freedom to experiment by applying new research and rapidly prototyping. This satisfaction can be sustained quite well in a lab environment but can diminish quickly in a corporate environment. This is because of the underlying commercial value motive which science is driven by in a business setting – if it doesn’t add business value to employees or customers, there’s no place for it! Business value, however, goes beyond just being a nifty experiment which shows potential value to employees or customers. In the context of Machine Learning models, the only [business] valuable models, are models in Production! In this blog post, I will take you through the journey which my team and I went through in taking Machine Learning models to Production and some important lessons learnt along the way.

Adversarial Examples – Rethinking the Definition

Adversarial examples are a large obstacle for a variety of machine learning systems to overcome. Their existence shows the tendency of models to rely on unreliable features to maximize performance, which if perturbed, can cause misclassifications with potentially catastrophic consequences. The informal definition of an adversarial example is an input that has been modified in a way that is imperceptible to humans, but is misclassified by a machine learning system whereas the original input was correctly classified.

Data Science is Boring (Part 1)

My boring days of deploying Machine Learning and how I cope.

Parsing Text for Emotion Terms: Analysis & Visualization Using R: Updated Analysis

The motivation for an updated analysis: The first publication of Parsing text for emotion terms: analysis & visualization Using R published in May 2017 used the function get_sentiments(‘nrc’) that was made available in the tidytext package. Very recently, the nrc lexicon was dropped from the tidytext package and hence the R codes in the original publication failed to run. The NRC emotion terms are also available in the lexicon package.

R Neural Network

In the previous four posts I have used multiple linear regression, decision trees, random forest, gradient boosting, and support vector machine to predict MPG for 2019 vehicles. It was determined that svm produced the best model. In this post I am going to use the neuralnet package to fit a neural network to the cars_19 dataset.

Kubernetes: A simple overview

This overview covers the basics of Kubernetes: what it is and what you need to keep in mind before applying it within your organization. The information in this piece is curated from material available on the O’Reilly online learning platform and from interviews with Kubernetes experts.

Quickly understanding process mining by analyzing event logs with Celonis Snap

‘Data is the new oil.’, ‘Our company needs to become more efficient.’, ‘Can we optimize this process?’, ‘Our processes are too complicated.’ – sentences you have heard very often and maybe cannot hear anymore. It is understandable, but there are some actual real-world benefits that stem from the technologies and discussions behind the super trend of (Big) Data. One of the emerging technologies in this field is in more ways than one directly linked to the sentences above. It is process mining. Maybe you have heard of it. Maybe you have not. Harvard Business Review thinks ‘[…] you should be exploring process mining’.

Fine-grained Sentiment Analysis (Part 3): Fine-tuning Transformers

Hands-on transfer learning using a pretrained transformer in PyTorch. This is Part 3 of a series on fine-grained sentiment analysis in Python. Parts 1 and 2 covered the analysis and explanation of six different classification methods on the Stanford Sentiment Treebank fine-grained (SST-5) dataset. In this post, we’ll look at how to improve on past results by building a transformer-based model and applying transfer learning, a powerful method that has been dominating NLP task leaderboards lately.

Industrializing AI & Machine Learning Applications with Kubeflow

Enable data scientists to make scaling and production-ready ML products.

Building and Labeling Image Datasets for Data Science Projects

Using standardized datasets is great for benchmarking new models/pipelines or for competitions. But for me at least, a lot of the fun of data science comes when you get to apply things to a project of your own choosing. One of the key parts of this process is building a dataset, and there are a lot of ways to build image datasets. For certain things I have legitimately just taken screenshots, like when I was sick and built a facial recognition dataset using season 4 of the Flash and annotated it with labelimg. Another route I have taken is downloading a bunch of images by hand, then displaying the images and labeling them in an Excel spreadsheet… For certain projects you might just have to take a bunch of pictures with your phone, as was the case when I made my dice counter. These days I have figured out a few more tricks which make this process a bit easier, and I am working on improving things along the way.

Introducing IceCAPS: Microsoft’s Framework for Advanced Conversation Modeling

The new open source framework that brings multi-task learning to conversational agents. Neural conversation systems and disciplines such as natural language processing (NLP) have seen significant advancements over the last few years. However, most of the current NLP stacks are designed for simple dialogs based on one or two sentences. Structuring more sophisticated conversations that factor in aspects such as personalities or context remains an open challenge. Recently, Microsoft Research unveiled IceCAPS, an open source framework for advanced conversation modeling.


What’s going on on PyPI

Scanning all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here those that might be worth a look, worth following, or that might inspire you in some way.

Side Channel Attack Assisted with Machine Learning

Use Scikit Learn models in Flutter. Easily transpile scikit-learn models to native Dart code aimed at Flutter. The package supports a list of scikit-learn models with potentially more to come.

Slalom GGP library for DataOps automation

Watson TTS Implementation. This Module is designed to convert Text to Speech format. It will generate Wav file for any Text-String passed to the module.


ML classifier package

Enterprise Machine-Learning and Predictive Analytics. Vivid Code is a pioneering software framework for next generation data analysis applications, that interconnects collaborative data science with automated machine learning. Based on the **Cloud-Assisted Meta programming** (CAMP) paradigm, the framework allows the usage of Currently Best Fitting (CBF) algorithms. Before code interpretation / compilation the concrete algorithms, that implement the CBF specifications, are automatically chosen from local and public catalog servers, that host and deploy the concrete algorithms. Thereby the specification is constituted by a unique algorithm category, a data domain and a metric, which substantiates the meaning of *Best Fitting* within the respective algorithm- and data context. An example is the average prediction accuracy within a fixed set of gold standard samples of the data domain (e.g. latin handwriting samples, spoken word samples, TCGA gene expression data, etc.).

Alyeska /al-ee-EHS-kah/ n. A Data Pipeline Toolkit

Python SDK for a Chinese semantic-understanding service

Python package to explore the color of language. compsyn is a package which provides a novel methodology to explore relationships between words and abstract concepts through color. The work arose through a collaboration between the contributors at the Santa Fe Institute’s Complex Systems Summer School 2019.

deep learning framework from zero

Kubeflow Fairing Python SDK. Python SDK for Kubeflow Fairing components.

Markdown to Jupyter Notebook converter.

Python client for ML Pipelines. Python client for the BitGN Machine Learning Pipelines project.

Memory-efficient probabilistic counter namely Morris Counter


What’s new on arXiv – Complete List

Generating random Gaussian graphical models
Learning Physics from Data: a Thermodynamic Interpretation
Local Embeddings for Relational Data Integration
Continuous optimization
Allen’s Interval Algebra Makes the Difference
Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs
Encode, Tag, Realize: High-Precision Text Editing
Incrementally Updated Spectral Embeddings
Online Analytical Processsing on Graph Data
Frameworks for Querying Databases Using Natural Language: A Literature Review
Neural Attentive Bag-of-Entities Model for Text Classification
Are Bitcoins price predictable? Evidence from machine learning techniques using technical indicators
Beyond Human-Level Accuracy: Computational Challenges in Deep Learning
Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs
Dual Student: Breaking the Limits of the Teacher in Semi-supervised Learning
High-Fidelity Extraction of Neural Network Models
Generalization in Transfer Learning
Deep Equilibrium Models
Global Optima is not Limit Computable
Mixture Probabilistic Principal Geodesic Analysis
Mining Insights from Weakly-Structured Event Data
Adversarial Robustness of Similarity-Based Link Prediction
LCA: Loss Change Allocation for Neural Network Training
CrossWeigh: Training Named Entity Tagger from Imperfect Annotations
Interpretable Word Embeddings via Informative Priors
A Note on An Abstract Model for Branching and its Application to Mixed Integer Programming
rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch
A Smart Sliding Chinese Pinyin Input Method Editor on Touchscreen
Face-to-Parameter Translation for Game Character Auto-Creation
Modeling Named Entity Embedding Distribution into Hypersphere
Multimodal Deep Learning for Mental Disorders Prediction from Audio Speech Samples
CGC-Net: Cell Graph Convolutional Network for Grading of Colorectal Cancer Histology Images
Independence number and connectivity for fractional (a,b,k)-critical covered graphs
3DSiameseNet to Analyze Brain MRI
Statistical inference for block sparsity of complex signals
Multi-agent Learning for Neural Machine Translation
The Properties of Average Gradient in Local Region
ForkNet: Multi-branch Volumetric Semantic Completion from a Single Depth Image
Non-Parametric Class Completeness Estimators for Collaborative Knowledge Graphs — The Case of Wikidata
Many zeros of many characters of GL(n,q)
On the Notions of Equilibria for Time-Inconsistent Stopping Problems in Continuous Time
A primal dual variational formulation suitable for a large class of non-convex problems in optimization
Some lattice models with hyperbolic chaotic attractors
Fast and Efficient Model for Real-Time Tiger Detection In The Wild
Power Minimization for Wireless Backhaul Based Ultra-Dense Cache-enabled C-RAN
A generalization of rotation of binary sequences and its applications to toggle dynamical systems
Analysis of high order dimension independent RBF-FD solution of Poisson’s equation
MRI Reconstruction Using Deep Bayesian Inference
A Tool for Super-Resolving Multimodal Clinical MRI
CLT for Circular beta-Ensembles at High Temperature
Non-uniform recovery guarantees for binary measurements and infinite-dimensional compressed sensing
Bidirectional Long Short-Term Memory (BLSTM) neural networks for reconstruction of top-quark pair decay kinematics
On the inequalities in Hermite’s theorem for a real polynomial to have real zeros
A Generic Sharding Scheme for Blockchain Protocols
Numerical valuation of Bermudan basket options via partial differential equations
Finding Salient Context based on Semantic Matching for Relevance Ranking
A weak solution theory for stochastic Volterra equations of convolution type
Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed
Controlled Loosening-up (CLuP) — achieving exact MIMO ML in polynomial time
Combining Multi-Sequence and Synthetic Images for Improved Segmentation of Late Gadolinium Enhancement Cardiac MRI
The convex dimension of hypergraphs and the hypersimplicial Van Kampen-Flores Theorem
Can we trust deep learning models diagnosis? The impact of domain shift in chest radiograph classification
A Landauer Formula for Bioelectronic Applications
Into the Battlefield: Quantifying and Modeling Intra-community Conflicts in Online Discussion
Starting CLuP with polytope relaxation
Personalizing Smartwatch Based Activity Recognition Using Transfer Learning
Cross View Fusion for 3D Human Pose Estimation
Few-Shot Generalization for Single-Image 3D Reconstruction via Priors
Efficient Real-Time Camera Based Estimation of Heart Rate and Its Variability
A Low-Cost, Flexible and Portable Volumetric Capturing System
Discrete Mean Field Games: Existence of Equilibria and Convergence
Moment convergence of the generalized maximum composite likelihood estimators for determinantal point processes
Better Rewards Yield Better Summaries: Learning to Summarise Without References
Aggregating Privacy-Conscious Distributed Energy Resources for Grid Service Provision
Differentially Private Objective Perturbation: Beyond Smoothness and Convexity
Translating Visual Art into Music
A Note on the Probability of Rectangles for Correlated Binary Strings
Loop Homology of Bi-secondary Structures II
On the convergence of Krylov methods with low-rank truncations
Variations of Saffman’s robot: two mechanisms of locomotion
A CNN-based approach to classify cricket bowlers based on their bowling actions
Face posets of tropical polyhedra and monomial ideals
On the Liouville property for non-local Lévy generators
Stochastic quasi-Newton with line-search regularization
Threshold Greedy Based Task Allocation for Multiple Robot Operations
Large Scale Parallelization Using File-Based Communications
An $hp$ finite element method for a singularly perturbed reaction-convection-diffusion boundary value problem with two small parameters
Introducing RONEC — the Romanian Named Entity Corpus
Avoiding Resentment Via Monotonic Fairness
A survey of the Shi arrangement
Online Pedestrian Group Walking Event Detection Using Spectral Analysis of Motion Similarity Graph
List-Edge-Coloring Triangulations with Maximum Degree 5
On the Downstream Performance of Compressed Word Embeddings
Multiresolution analysis (discrete wavelet transform) through Daubechies family for emotion recognition in speech
On the two-step estimation of the cross–power spectrum for dynamical inverse problems
Evaluating proxy influence in assimilated paleoclimate reconstructions — Testing the exchangeability of two ensembles of spatial processes
Aspect Detection using Word and Char Embeddings with (Bi)LSTM and CRF
An Event-Driven Approach to Serverless Seismic Imaging in the Cloud
Improving Disentangled Representation Learning with the Beta Bernoulli Process
Cross-Platform Verification of Intermediate Scale Quantum Devices
Reevaluating the performance of the Double Exponential Smoothing filter and its Control Parameters
Gender-based homophily in collaborations across a heterogeneous scholarly landscape
Robust Invisible Video Watermarking with Attention
Irrelevance of linear controllability to nonlinear dynamical networks
Homogeneous Models of Nonlinear Circuits
Inverse problems for symmetric doubly stochastic matrices whose Suleĭmanova spectra are to be bounded below by 1/2
Pareto-Optima for a Generalized Ramsey Model
PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking
State Drug Policy Effectiveness: Comparative Policy Analysis of Drug Overdose Mortality
Fast finite-difference convolution for 3D problems in layered media
The Oxford Radar RobotCar Dataset: A Radar Extension to the Oxford RobotCar Dataset
3D Morphable Face Models — Past, Present and Future
Detecting Compromised Implicit Association Test Results Using Supervised Learning
Heronian friezes
Learning without feedback: Direct random target projection as a feedback-alignment algorithm with layerwise feedforward training
Optimal Causal Rate-Constrained Sampling of the Wiener Process
Universal Force Correlations in an RNA-DNA Unzipping Experiment
DeepObfusCode: Source Code Obfuscation Through Sequence-to-Sequence Networks
CMU GetGoing: An Understandable and Memorable Dialog System for Seniors
The Woman Worked as a Babysitter: On Biases in Language Generation
Bias and Consistency in Three-way Gravity Models
High order discretization methods for spatial dependent SIR models
Lower Deviations in $β$-ensembles and Law of Iterated Logarithm in Last Passage Percolation
Lund jet images from generative and cycle-consistent adversarial networks
Trouble on the Horizon: Forecasting the Derailment of Online Conversations as they Develop
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Context-Aware Monolingual Repair for Neural Machine Translation
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
How to Build User Simulators to Train RL-based Dialog Systems
Towards Models for Availability and Security Evaluation of Cloud Computing with Moving Target Defense
A Novel Loss Function Incorporating Imaging Acquisition Physics for PET Attenuation Map Generation using Deep Learning
Brain2Char: A Deep Architecture for Decoding Text from Brain Recordings
Impact of Social Influence on Adoption Behavior: An Online Controlled Experimental Evaluation
Oblivious Sketching of High-Degree Polynomial Kernels
Reliable Communications Over Dependent Fading Wireless Channels
Thing/Machine-s (Thimacs) Applied to Structural Description in Software Engineering
Multi-level Attention network using text, audio and video for Depression Prediction
A categorification of the Malvenuto–Reutenauer algebra via a tower of groups
Efficient Identification of Linear Evolutions in Nonlinear Vector Fields: Koopman Invariant Subspaces
Optimization with Equality and Inequality Constraints Using Parameter Continuation
Discriminative Topic Modeling with Logistic LDA
Small-worldness favours network inference
Self-consistent approach to the description of relaxation processes in classical multiparticle systems
Zero-sum Stochastic Games with Asymmetric Information
Topologically-Guided Color Image Enhancement
A note on Pseudorandom Ramsey graphs
Predicting Specificity in Classroom Discussion
Prospect Theory Based Crowdsourcing for Classification in the Presence of Spammers
Rates of Convergence for Large-scale Nearest Neighbor Classification
L-Sweeps: A scalable, parallel preconditioner for the high-frequency Helmholtz equation
Generalized chi-squared detector for LTI systems with non-Gaussian noise
Asynchronous Time-Parallel Method based on Laplace Transform
Effective Strategies for Using Hashtags in Online Communication
How much research shared on Facebook is hidden from public view? A comparison of public and private online activity around PLOS ONE papers
Filtering Approaches for Dealing with Noise in Anomaly Detection
Fast Gradient Methods with Alignment for Symmetric Linear Systems without Using Cauchy Step
Symmetric Triangle Quadrature Rules for Arbitrary Functions
Parameter Estimation in the Hermitian and Skew-Hermitian Splitting Method Using Gradient Iterations
Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing
Minimizing the Societal Cost of Credit Card Fraud with Limited and Imbalanced Data
Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
Cross-Cutting Political Awareness through Diverse News Recommendations
Neural Linguistic Steganography
Iterative Clustering with Game-Theoretic Matching for Robust Multi-consistency Correspondence
Demystifying Brain Tumour Segmentation Networks: Interpretability and Uncertainty Analysis
Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform
Group Inference in High Dimensions with Applications to Hierarchical Testing
Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback


What's new on arXiv

Generating random Gaussian graphical models

Structure learning methods for covariance and concentration graphs are often validated on synthetic models, usually obtained by randomly generating: (i) an undirected graph, and (ii) a compatible symmetric positive definite (SPD) matrix. In order to ensure positive definiteness in (ii), a dominant diagonal is usually imposed. In this work we investigate different methods to generate random symmetric positive definite matrices with undirected graphical constraints. We show that if the graph is chordal it is possible to sample uniformly from the set of correlation matrices compatible with the graph, while for general undirected graphs we rely on a partial orthogonalization method.
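As a toy illustration of the dominant-diagonal construction the abstract refers to (not the authors' sampler), one can generate a random graph and force positive definiteness via Gershgorin's theorem:

```python
import numpy as np

def random_graph_spd(p, edge_prob=0.3, seed=0):
    """Random SPD matrix with a random sparsity pattern.

    Sample an Erdos-Renyi graph, put random weights on its edges, then
    impose a dominant diagonal so every Gershgorin disc stays positive."""
    rng = np.random.default_rng(seed)
    adj = np.triu(rng.random((p, p)) < edge_prob, k=1)    # random undirected graph
    K = np.zeros((p, p))
    K[adj] = rng.uniform(-1.0, 1.0, size=int(adj.sum()))  # random edge weights
    K = K + K.T                                           # symmetrize
    np.fill_diagonal(K, np.abs(K).sum(axis=1) + 0.1)      # dominant diagonal
    return K

K = random_graph_spd(6)
assert np.all(np.linalg.eigvalsh(K) > 0)  # positive definite by construction
```

Note that this construction biases the resulting matrices toward weak partial correlations, which is exactly the limitation the paper's uniform chordal sampler addresses.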

Learning Physics from Data: a Thermodynamic Interpretation

Experimental databases are typically very large and high dimensional. Learning from them requires recognizing important features (a pattern), often present at scales different from that of the recorded data. Following the experience collected in statistical mechanics and thermodynamics, the process of recognizing the pattern (the learning process) can be seen as a dissipative time evolution driven by entropy; this is how thermodynamics enters machine learning. Learning to handle free-surface liquids serves as an illustration.

Local Embeddings for Relational Data Integration

Integrating information from heterogeneous data sources is one of the fundamental problems facing any enterprise. Recently, it has been shown that deep learning based techniques such as embeddings are a promising approach for data integration problems. Prior efforts directly use pre-trained embeddings or simplistically adapt techniques from natural language processing to obtain relational embeddings. In this work, we propose algorithms for obtaining local embeddings that are effective for data integration tasks on relational data. We make three major contributions. First, we describe a compact graph-based representation that allows the specification of a rich set of relationships inherent in the relational world. Second, we propose how to derive sentences from such a graph that effectively describe the similarity between elements (tokens, attributes, rows) across the two datasets; the embeddings are learned from such sentences. Finally, we propose a diverse collection of criteria to evaluate relational embeddings and perform an extensive set of experiments validating them. Our experiments show that our system, EmbDI, produces meaningful results for data integration tasks, and our embeddings improve result quality for existing state-of-the-art methods.

Continuous optimization

Sufficient conditions for the existence of efficient algorithms are established by introducing the concept of contractility for continuous optimization. All possible continuous problems are then divided into three categories: contractile in logarithmic time, contractile in polynomial time, or noncontractile. For the first two, we propose an efficient contracting algorithm to find the set of all global minimizers with a theoretical guarantee of linear convergence; for the last one, we discuss the potential difficulties of using the proposed algorithm.

Allen’s Interval Algebra Makes the Difference

Allen’s Interval Algebra constitutes a framework for reasoning about temporal information in a qualitative manner. In particular, it uses intervals, i.e., pairs of endpoints, on the timeline to represent entities corresponding to actions, events, or tasks, and binary relations such as precedes and overlaps to encode the possible configurations between those entities. Allen’s calculus has found its way into many academic and industrial applications that involve, most commonly, planning and scheduling, temporal databases, and healthcare. In this paper, we present a novel encoding of Interval Algebra using answer-set programming (ASP) extended by difference constraints, i.e., the fragment abbreviated as ASP(DL), and demonstrate its performance via a preliminary experimental evaluation. Although our ASP encoding is presented in the case of Allen’s calculus for the sake of clarity, we suggest that analogous encodings can be devised for other point-based calculi, too.
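To make the relations concrete: each of Allen's thirteen basic relations reduces to comparisons between interval endpoints. A few of them, on made-up interval tuples (a toy illustration only, not the paper's ASP(DL) encoding):

```python
# Intervals are (start, end) pairs with start < end.
def precedes(a, b):   # a ends strictly before b begins
    return a[1] < b[0]

def meets(a, b):      # a ends exactly when b begins
    return a[1] == b[0]

def overlaps(a, b):   # a starts first, the two intervals overlap, b ends last
    return a[0] < b[0] < a[1] < b[1]

breakfast, commute, meeting = (7, 8), (8, 9), (8.5, 10)
assert meets(breakfast, commute)
assert overlaps(commute, meeting)    # 8 < 8.5 < 9 < 10
assert precedes(breakfast, meeting)
```

Qualitative reasoning then works on these relation symbols (e.g., composing "precedes" with "meets") rather than on the numeric endpoints themselves, which is what the ASP encoding captures declaratively.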

Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs

Machine learning and data mining techniques have been used extensively to detect credit card fraud. However, most studies consider credit card transactions as isolated events and not as part of a sequence of transactions. In this framework, we model a sequence of credit card transactions from three different perspectives, namely: (i) the sequence contains or does not contain a fraud; (ii) the sequence is obtained by fixing the card-holder or the payment terminal; (iii) it is a sequence of spent amounts or of elapsed times between the current and previous transactions. Combinations of the three binary perspectives give eight sets of sequences from the (training) set of transactions. Each of these sequences is modelled with a Hidden Markov Model (HMM). Each HMM associates a likelihood to a transaction given its sequence of previous transactions. These likelihoods are used as additional features in a Random Forest classifier for fraud detection. Our multiple-perspective HMM-based approach offers automated feature engineering to model temporal correlations, improving the effectiveness of the classification task and increasing the detection of fraudulent transactions when combined with the state-of-the-art expert-based feature engineering strategy for credit card fraud detection. Extending previous work, we show that this approach goes beyond e-commerce transactions and provides robust feature engineering across different datasets, hyperparameters, and classifiers. Moreover, we compare strategies to deal with structural missing values.
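The likelihood features at the heart of the approach can be illustrated with the standard forward algorithm on a toy discrete HMM. The state and emission values below are invented, and the paper trains one HMM per perspective (eight in total) rather than this single model:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum()); alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        ll += np.log(alpha.sum()); alpha = alpha / alpha.sum()
    return ll

# Toy 2-state HMM over 3 spent-amount buckets (low / mid / high).
pi = np.array([0.6, 0.4])                          # initial state distribution
A  = np.array([[0.9, 0.1], [0.2, 0.8]])            # state transitions
B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])  # emission probabilities
feature = forward_loglik([0, 0, 2, 2], pi, A, B)   # log p(sequence | HMM)
# This scalar would be appended to the transaction's feature vector
# before it is fed to the Random Forest classifier.
```

A sequence that is unlikely under the "genuine" HMM (a low `feature` value) is exactly the kind of anomaly signal the classifier can exploit.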

Encode, Tag, Realize: High-Precision Text Editing

We propose LaserTagger – a sequence tagging approach that casts text generation as a text editing task. Target texts are reconstructed from the inputs using three main edit operations: keeping a token, deleting it, and adding a phrase before the token. To predict the edit operations, we propose a novel model, which combines a BERT encoder with an autoregressive Transformer decoder. This approach is evaluated on English text on four tasks: sentence fusion, sentence splitting, abstractive summarization, and grammar correction. LaserTagger achieves new state-of-the-art results on three of these tasks, performs comparably to a set of strong seq2seq baselines with a large number of training examples, and outperforms them when the number of examples is limited. Furthermore, we show that at inference time tagging can be more than two orders of magnitude faster than comparable seq2seq models, making it more attractive for running in a live environment.
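The three edit operations can be sketched as a tiny tag interpreter. The tag pairs and the fused sentence below are illustrative, not LaserTagger's actual phrase vocabulary:

```python
def apply_tags(tokens, tags):
    """Reconstruct the target text from per-token (operation, phrase) pairs:
    KEEP copies the token, DELETE drops it, and a non-empty phrase is
    inserted before the token either way."""
    out = []
    for token, (op, phrase) in zip(tokens, tags):
        if phrase:
            out.append(phrase)        # added phrase goes before the token
        if op == "KEEP":
            out.append(token)
    return " ".join(out)

# Sentence fusion example: delete the period and pronoun, add "and".
tokens = "Turing was born in 1912 . He died in 1954 .".split()
tags = [("KEEP", "")] * 5 + [("DELETE", ""), ("DELETE", ""),
        ("KEEP", "and"), ("KEEP", ""), ("KEEP", ""), ("KEEP", "")]
print(apply_tags(tokens, tags))  # Turing was born in 1912 and died in 1954 .
```

Because the model only predicts one tag per input token (from a small phrase vocabulary) instead of generating the output token by token, inference can be parallelized across positions, which is where the claimed speedup over seq2seq models comes from.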

Incrementally Updated Spectral Embeddings

Several fundamental tasks in data science rely on computing an extremal eigenspace of size r ≪ n, where n is the underlying problem dimension. For example, spectral clustering and PCA both require the computation of the leading r-dimensional subspace. Often, this process is repeated over time due to the possible temporal nature of the data; e.g., graphs representing relations in a social network may change over time, and feature vectors may be added, removed or updated in a dataset. Therefore, it is important to efficiently carry out the computations involved to keep up with frequent changes in the underlying data and also to dynamically determine a reasonable size for the subspace of interest. We present a complete computational pipeline for efficiently updating spectral embeddings in a variety of contexts. Our basic approach is to ‘seed’ iterative methods for eigenproblems with the most recent subspace estimate to significantly reduce the computations involved, in contrast with a naive approach which recomputes the subspace of interest from scratch at every step. In this setting, we provide various bounds on the number of iterations common eigensolvers need to perform in order to update the extremal eigenspace to a sufficient tolerance. We also incorporate a criterion for determining the size of the subspace based on successive eigenvalue ratios. We demonstrate the merits of our approach on the tasks of spectral clustering of temporally evolving graphs and PCA of an incrementally updated data matrix.
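The warm-start idea can be sketched with plain orthogonal iteration on a synthetic symmetric matrix. The spectrum and perturbation below are assumptions chosen for illustration, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 100, 3
# Symmetric matrix with a well-separated leading 3-dimensional eigenspace.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
evals = np.concatenate([[10.0, 9.0, 8.0], np.linspace(1.0, 0.1, n - 3)])
A = (U * evals) @ U.T

def orth_iter(M, Q, iters):
    """Orthogonal (subspace) iteration started from the basis Q."""
    for _ in range(iters):
        Q, _ = np.linalg.qr(M @ Q)
    return Q

def residual(M, Q):
    return np.linalg.norm(M @ Q - Q @ (Q.T @ M @ Q))

Q_old = orth_iter(A, np.linalg.qr(rng.standard_normal((n, r)))[0], 100)

E = rng.standard_normal((n, n))
A_new = A + 1e-3 * (E + E.T) / 2      # the underlying data changes slightly
Q_cold = orth_iter(A_new, np.linalg.qr(rng.standard_normal((n, r)))[0], 3)
Q_warm = orth_iter(A_new, Q_old, 3)   # 'seed' with the previous estimate
# After the same three iterations, the warm start is far more accurate.
assert residual(A_new, Q_warm) < residual(A_new, Q_cold)
```

The same seeding strategy applies unchanged to production eigensolvers (e.g., LOBPCG-style methods) that accept an initial subspace estimate.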

Online Analytical Processing on Graph Data

Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube such that each cell contains one or more measures that can be aggregated along dimensions. In a Big Data scenario, traditional data warehousing and OLAP operations are clearly not sufficient to address current data analysis requirements, for example, social network analysis. Furthermore, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation. Nevertheless, there is not much work on the problem of taking OLAP analysis to the graph data model. This paper proposes a formal multidimensional model for graph analysis, that considers the basic graph data, and also background information in the form of dimension hierarchies. The graphs in this model are node- and edge-labelled directed multi-hypergraphs, called graphoids, which can be defined at several different levels of granularity using the dimensions associated with them. Operations analogous to the ones used in typical OLAP over cubes are defined over graphoids. The paper presents a formal definition of the graphoid model for OLAP, proves that the typical OLAP operations on cubes can be expressed over the graphoid model, and shows that the classic data cube model is a particular case of the graphoid data model. Finally, a case study supports the claim that, for many kinds of OLAP-like analysis on graphs, the graphoid model works better than the typical relational OLAP alternative, and for the classic OLAP queries, it remains competitive.

Frameworks for Querying Databases Using Natural Language: A Literature Review

A Natural Language Interface (NLI) allows users to pose queries that retrieve information from a database without using any artificial language such as the Structured Query Language (SQL). Several applications in various domains, including healthcare, customer support, and search engines, require elaborating structured data that carries information in text form. Moreover, issues such as configuration complexity, computationally intensive algorithms, and the popularity of relational databases had made translating natural language to database queries a secondary area of investigation. The emerging trend of querying systems and speech-enabled interfaces has revived this research area; the last survey published on this topic appeared six years ago, in 2013. To the best of our knowledge, no recent study discusses the current state-of-the-art translation frameworks for natural language into structured and non-structured query languages. In this paper, we review 47 frameworks published between 2008 and 2018, 35 of which are closely relevant to our work. SQL-based frameworks are categorized into statistical, symbolic, and connectionist approaches, whereas NoSQL-based frameworks are categorized into semantic matching and pattern matching. These frameworks are then reviewed based on their supported languages, heuristic-rule schemes, interoperability support, dataset scope, and overall performance score. The findings state that 70% of the work on natural language to database querying targets SQL, while NoSQL languages such as SPARQL, Cypher, and Gremlin account for 15%, 10%, and 5% respectively. It has also been observed that most of the frameworks support the English language only.

Neural Attentive Bag-of-Entities Model for Text Classification

This study proposes a Neural Attentive Bag-of-Entities model, a neural network model that performs text classification using entities in a knowledge base. Entities provide unambiguous and relevant semantic signals that are beneficial for capturing semantics in texts. We combine simple, high-recall dictionary-based entity detection with a novel neural attention mechanism that enables the model to focus on a small number of unambiguous and relevant entities. We tested the effectiveness of our model using two standard text classification datasets (the 20 Newsgroups and R8 datasets) and a popular factoid question answering dataset based on a trivia quiz game. Our model achieved state-of-the-art results on all datasets. The source code of the proposed model will be available online at https://…/wikipedia2vec.

Is Bitcoin's price predictable? Evidence from machine learning techniques using technical indicators

The uncertainties in the future price of Bitcoin make it difficult to predict accurately. Accurate price prediction is therefore important for the decision-making of investors and market players in the cryptocurrency market. Using historical data from 01/01/2012 to 16/08/2019, machine learning techniques (generalized linear model via penalized maximum likelihood, random forest, support vector regression with linear kernel, and stacking ensemble) were used to forecast the price of Bitcoin. The prediction models employed key and high-dimensional technical indicators as the predictors. The performance of these techniques was evaluated using mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R-squared). The performance metrics revealed that the stacking ensemble model with two base learners (random forest and generalized linear model via penalized maximum likelihood) and support vector regression with linear kernel as meta-learner was the optimal model for forecasting Bitcoin price. The MAPE, RMSE, MAE, and R-squared values for the stacking ensemble model were 0.0191%, 15.5331 USD, 124.5508 USD, and 0.9967 respectively. These values show a high degree of reliability in predicting the price of Bitcoin using the stacking ensemble model. Accurately predicting the future price of Bitcoin will yield significant returns for investors and market players in the cryptocurrency market.

Beyond Human-Level Accuracy: Computational Challenges in Deep Learning

Deep learning (DL) research yields accuracy and product improvements from both model architecture changes and scale: larger data sets and models, and more computation. For hardware design, it is difficult to predict DL model changes. However, recent prior work shows that as dataset sizes grow, DL model accuracy and model size grow predictably. This paper leverages the prior work to project the dataset and model size growth required to advance DL accuracy beyond human-level, to frontier targets defined by machine learning experts. Datasets will need to grow 33–971×, while models will need to grow 6.6–456× to achieve target accuracies. We further characterize and project the computational requirements to train these applications at scale. Our characterization reveals an important segmentation of DL training challenges for recurrent neural networks (RNNs) that contrasts with prior studies of deep convolutional networks. RNNs will have comparatively moderate operational intensities and very large memory footprint requirements. In contrast to emerging accelerator designs, large-scale RNN training characteristics suggest designs with significantly larger memory capacity and on-chip caches.

Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs

Accelerating research in the emerging field of deep graph learning requires new tools. Such systems should support graph as the core abstraction and take care to maintain both forward (i.e. supporting new research ideas) and backward (i.e. integration with existing components) compatibility. In this paper, we present Deep Graph Library (DGL). DGL enables arbitrary message handling and mutation operators, flexible propagation rules, and is framework agnostic so as to leverage high-performance tensor, autograd operations, and other feature extraction modules already available in existing frameworks. DGL carefully handles the sparse and irregular graph structure, deals with graphs big and small which may change dynamically, fuses operations, and performs auto-batching, all to take advantages of modern hardware. DGL has been tested on a variety of models, including but not limited to the popular Graph Neural Networks (GNN) and its variants, with promising speed, memory footprint and scalability.

Dual Student: Breaking the Limits of the Teacher in Semi-supervised Learning

Recently, consistency-based methods have achieved state-of-the-art results in semi-supervised learning (SSL). These methods always involve two roles, an explicit or implicit teacher model and a student model, and penalize predictions under different perturbations by a consistency constraint. However, the weights of these two roles are tightly coupled since the teacher is essentially an exponential moving average (EMA) of the student. In this work, we show that the coupled EMA teacher causes a performance bottleneck. To address this problem, we introduce Dual Student, which replaces the teacher with another student. We also define a novel concept, stable sample, following which a stabilization constraint is designed for our structure to be trainable. Further, we discuss two variants of our method, which produce even higher performance. Extensive experiments show that our method improves the classification performance significantly on several main SSL benchmarks. Specifically, it reduces the error rate of the 13-layer CNN from 16.84% to 12.39% on CIFAR-10 with 1k labels and from 34.10% to 31.56% on CIFAR-100 with 10k labels. In addition, our method also achieves a clear improvement in domain adaptation.

High-Fidelity Extraction of Neural Network Models

Model extraction allows an adversary to steal a copy of a remotely deployed machine learning model given access to its predictions. Adversaries are motivated to mount such attacks for a variety of reasons, ranging from reducing their computational costs, to eliminating the need to collect expensive training data, to obtaining a copy of a model in order to find adversarial examples, perform membership inference, or model inversion attacks. In this paper, we taxonomize the space of model extraction attacks around two objectives: accuracy, i.e., performing well on the underlying learning task, and fidelity, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we develop a learning-based attack which exploits the victim to supervise the training of an extracted model. Through analytical and empirical arguments, we then explain the inherent limitations that prevent any learning-based strategy from extracting a truly high-fidelity model, i.e., extracting a functionally-equivalent model whose predictions are identical to those of the victim model on all possible inputs. Addressing these limitations, we expand on prior work to develop the first practical functionally-equivalent extraction attack for direct extraction (i.e., without training) of a model's weights. We perform experiments both on academic datasets and a state-of-the-art image classifier trained with 1 billion proprietary images. In addition to broadening the scope of model extraction research, our work demonstrates the practicality of model extraction attacks against production-grade systems.

Generalization in Transfer Learning

Agents trained with deep reinforcement learning algorithms are capable of performing highly complex tasks including locomotion in continuous environments. In order to attain a human-level performance, the next step of research should be to investigate the ability to transfer the learning acquired in one task to a different set of tasks. Concerns on generalization and overfitting in deep reinforcement learning are not usually addressed in current transfer learning research. This issue results in underperforming benchmarks and inaccurate algorithm comparisons due to rudimentary assessments. In this study, we primarily propose regularization techniques in deep reinforcement learning for continuous control through the application of sample elimination and early stopping. First, the importance of the inclusion of training iteration to the hyperparameters in deep transfer learning problems will be emphasized. Because source task performance is not indicative of the generalization capacity of the algorithm, we start by proposing various transfer learning evaluation methods that acknowledge the training iteration as a hyperparameter. In line with this, we introduce an additional step of resorting to earlier snapshots of policy parameters depending on the target task due to overfitting to the source task. Then, in order to generate robust policies, we discard the samples that lead to overfitting via strict clipping. Furthermore, we increase the generalization capacity in widely used transfer learning benchmarks by using entropy bonus, different critic methods and curriculum learning in an adversarial setup. Finally, we evaluate the robustness of these techniques and algorithms on simulated robots in target environments where the morphology of the robot, gravity and tangential friction of the environment are altered from the source environment.

Deep Equilibrium Models

We present a new approach to modeling sequential data: the deep equilibrium model (DEQ). Motivated by an observation that the hidden layers of many existing deep sequence models converge towards some fixed point, we propose the DEQ approach that directly finds these equilibrium points via root-finding. Such a method is equivalent to running an infinite depth (weight-tied) feedforward network, but has the notable advantage that we can analytically backpropagate through the equilibrium point using implicit differentiation. Using this approach, training and prediction in these networks require only constant memory, regardless of the effective ‘depth’ of the network. We demonstrate how DEQs can be applied to two state-of-the-art deep sequence models: self-attention transformers and trellis networks. On large-scale language modeling tasks, such as the WikiText-103 benchmark, we show that DEQs 1) often improve performance over these state-of-the-art models (for similar parameter counts); 2) have similar computational requirements as existing models; and 3) vastly reduce memory consumption (often the bottleneck for training large sequence models), demonstrating up to an 88% memory reduction in our experiments. The code is available at https://github.com/locuslab/deq.
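The equilibrium view can be sketched with naive fixed-point iteration on a toy weight-tied layer. DEQ itself uses quasi-Newton root-finding and implicit differentiation; the weights below are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d)) * (0.9 / d)  # small weights make f a contraction
U = rng.standard_normal((d, d)) * 0.5
x = rng.standard_normal(d)                   # the input, injected at every layer

def f(z):
    """One weight-tied 'layer'; the DEQ output is its fixed point z* = f(z*)."""
    return np.tanh(W @ z + U @ x)

z = np.zeros(d)
for _ in range(200):                         # naive fixed-point iteration
    z_next = f(z)
    if np.linalg.norm(z_next - z) < 1e-10:
        break
    z = z_next
# z approximates the output of an infinitely deep weight-tied network,
# while only one layer's weights were ever stored (constant memory).
assert np.linalg.norm(f(z) - z) < 1e-8
```

The memory savings in the abstract come from the backward pass: implicit differentiation through z* needs no stored activations from the "depth" dimension.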

Global Optima is not Limit Computable

We study the limit computability of finding a global optimum of a continuous function. We give a short proof that the problem of checking whether a point is a global minimum is not limit computable, which implies the same for the problem of finding a global minimum. We then give an algorithm that converges to the global minimum when a lower bound on the size of its basin of attraction is known. We prove the convergence of this algorithm and provide some numerical experiments.
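The role of the basin lower bound can be sketched in one dimension: a grid finer than the basin width must place a point inside the global basin, and local descent from the best grid point then reaches the global minimum. The objective and algorithm below are toy stand-ins, not the paper's:

```python
import numpy as np

def grid_minimize(f, lo, hi, basin_width):
    """Global 1-D minimization assuming the global minimizer's basin of
    attraction is at least basin_width wide."""
    xs = np.arange(lo, hi, basin_width / 2.0)  # spacing finer than the basin
    x = min(xs, key=f)                         # best grid point is in the basin
    step = basin_width / 4.0
    while step > 1e-9:                         # crude local descent
        if f(x - step) < f(x):
            x -= step
        elif f(x + step) < f(x):
            x += step
        else:
            step /= 2.0
    return x

# Narrow global basin near x = 2, wide but shallower local basin near x = -1.
f = lambda x: -2.0 * np.exp(-8 * (x - 2) ** 2) - np.exp(-0.5 * (x + 1) ** 2)
x_star = grid_minimize(f, -5.0, 5.0, basin_width=0.5)
assert abs(x_star - 2.0) < 5e-3   # the narrow global basin is found
```

Without the basin bound, no grid spacing is safe, which is exactly the obstruction behind the non-limit-computability result.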

Mixture Probabilistic Principal Geodesic Analysis

Dimensionality reduction on Riemannian manifolds is challenging due to the complex nonlinear data structures. While probabilistic principal geodesic analysis (PPGA) has been proposed to generalize conventional principal component analysis (PCA) onto manifolds, its effectiveness is limited to data with a single modality. In this paper, we present a novel Gaussian latent variable model that provides a unique way to integrate multiple PGA models into a maximum-likelihood framework. This leads to a well-defined mixture model of probabilistic principal geodesic analysis (MPPGA) on sub-populations, where parameters of the principal subspaces are automatically estimated by employing an Expectation Maximization algorithm. We further develop a mixture Bayesian PGA (MBPGA) model that automatically reduces data dimensionality by suppressing irrelevant principal geodesics. We demonstrate the advantages of our model in the contexts of clustering and statistical shape analysis, using synthetic sphere data, and real corpus callosum and mandible data from human brain magnetic resonance (MR) and CT images.

Mining Insights from Weakly-Structured Event Data

This thesis focuses on process mining on event data where a normative specification is absent and, as a result, the event data is less structured. The thesis puts special emphasis on one application domain that fits this description: the analysis of smart home data where sequences of daily activities are recorded. In this thesis we propose a set of techniques to analyze such data, which can be grouped into two categories. The first category of methods focuses on preprocessing event logs in order to enable process discovery techniques to extract insights from unstructured event data. In this category we have developed the following techniques:
- An unsupervised approach to refine event labels based on the time at which the event took place, allowing, for example, recorded eating events to be distinguished into breakfast, lunch, and dinner.
- An approach to detect and filter from event logs so-called chaotic activities, i.e., activities that cause process discovery methods to overgeneralize.
- A supervised approach to abstract low-level events into more high-level events, where we show that there exist situations where process discovery approaches overgeneralize on the low-level event data but are able to find precise models on the high-level event data.
The second category focuses on mining local process models, i.e., collections of process model patterns that each describe some frequent pattern, in contrast to the single global process model that is obtained with existing process discovery techniques. Several techniques are introduced in the area of local process model mining, including a basic method, fast but approximate heuristic methods, and constraint-based techniques.

Adversarial Robustness of Similarity-Based Link Prediction

Link prediction is one of the fundamental problems in social network analysis. A common set of techniques for link prediction relies on similarity metrics, which use the topology of the observed subnetwork to quantify the likelihood of unobserved links. Recently, similarity metrics for link prediction have been shown to be vulnerable to attacks whereby observations about the network are adversarially modified to hide target links. We propose a novel approach for increasing robustness of similarity-based link prediction by endowing the analyst with a restricted set of reliable queries which accurately measure the existence of queried links. The analyst aims to robustly predict a collection of possible links by optimally allocating the reliable queries. We formalize the analyst's problem as a Bayesian Stackelberg game in which the analyst first chooses the reliable queries, followed by an adversary who deletes a subset of links among the remaining (unreliable) queries by the analyst. The analyst in our model is uncertain about the particular target link the adversary attempts to hide, whereas the adversary has full information about the analyst and the network. Focusing on similarity metrics using only local information, we show that the problem is NP-Hard for both players, and devise two principled and efficient approaches for solving it approximately. Extensive experiments with real and synthetic networks demonstrate the effectiveness of our approach.

LCA: Loss Change Allocation for Neural Network Training

Neural networks enjoy widespread use, but many aspects of their training, representation, and operation are poorly understood. In particular, our view into the training process is limited, with a single scalar loss being the most common viewport into this high-dimensional, dynamic process. We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters. This measurement is accomplished by decomposing the components of an approximate path integral along the training trajectory using a Runge-Kutta integrator. This rich view shows which parameters are responsible for decreasing or increasing the loss during training, or which parameters ‘help’ or ‘hurt’ the network’s learning, respectively. LCA may be summed over training iterations and/or over neurons, channels, or layers for increasingly coarse views. This new measurement device produces several insights into training. (1) We find that barely over 50% of parameters help during any given iteration. (2) Some entire layers hurt overall, moving on average against the training gradient, a phenomenon we hypothesize may be due to phase lag in an oscillatory training process. (3) Finally, increments in learning proceed in a synchronized manner across layers, often peaking on identical iterations.
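The first-order version of the allocation can be reproduced in a few lines: over one update, the loss change is approximately grad · Δθ, and each parameter's term in that dot product is its signed allocation. (The paper sharpens this approximation with a Runge-Kutta path integral; the quadratic loss below is a toy assumption.)

```python
import numpy as np

def lca_step(grad, delta_theta):
    """Per-parameter allocation of the loss change over one update:
    the entries sum to the first-order estimate grad . delta_theta."""
    return grad * delta_theta

# Toy loss L(theta) = 0.5 * ||theta||^2, one SGD step with lr = 0.1.
theta = np.array([1.0, -2.0, 0.5])
grad = theta.copy()                    # dL/dtheta for the quadratic loss
delta = -0.1 * grad                    # SGD update
alloc = lca_step(grad, delta)
# Negative entries 'help' (decrease the loss); under plain SGD every
# parameter helps, since each allocation is -lr * grad_i**2 <= 0.
assert np.all(alloc <= 0)
assert np.isclose(alloc.sum(), grad @ delta)
```

The paper's surprising findings (barely over 50% of parameters helping) arise because real updates use stale or noisy gradients, so individual terms of the dot product can be positive even while their sum is negative.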

CrossWeigh: Training Named Entity Tagger from Imperfect Annotations

Everyone makes mistakes. So do human annotators when curating labels for named entity recognition (NER). Such label mistakes might hurt model training and interfere with model comparison. In this study, we dive deep into one of the widely-adopted NER benchmark datasets, CoNLL03 NER. We identify label mistakes in about 5.38% of test sentences, a significant ratio considering that the state-of-the-art test F1 score is already around 93%. We therefore manually correct these label mistakes and form a cleaner test set. Our re-evaluation of popular models on this corrected test set leads to more accurate assessments than those on the original test set. More importantly, we propose a simple yet effective framework, CrossWeigh, to handle label mistakes during NER model training. Specifically, it partitions the training data into several folds and trains independent NER models to identify potential mistakes in each fold. It then adjusts the weights of the training data accordingly to train the final NER model. Extensive experiments demonstrate significant improvements from plugging various NER models into our proposed framework on three datasets. All implementations and the corrected test set are available at our Github repo: https://…/CrossWeigh.

Interpretable Word Embeddings via Informative Priors

Word embeddings have demonstrated strong performance on NLP tasks. However, lack of interpretability and the unsupervised nature of word embeddings have limited their use within computational social science and digital humanities. We propose the use of informative priors to create interpretable and domain-informed dimensions for probabilistic word embeddings. Experimental results show that sensible priors can capture latent semantic concepts better than or on-par with the current state of the art, while retaining the simplicity and generalizability of using priors.

A Note on An Abstract Model for Branching and its Application to Mixed Integer Programming

A key ingredient in branch and bound (B&B) solvers for mixed-integer programming (MIP) is the selection of branching variables since poor or arbitrary selection can affect the size of the resulting search trees by orders of magnitude. A recent article by Le Bodic and Nemhauser [Mathematical Programming, (2017)] investigated variable selection rules by developing a theoretical model of B&B trees from which they developed some new, effective scoring functions for MIP solvers. In their work, Le Bodic and Nemhauser left several open theoretical problems, solutions to which could guide the future design of variable selection rules. In this article, we first solve many of these open theoretical problems. We then implement an improved version of the model-based branching rules in SCIP 6.0, a modern open-source MIP solver, in which we observe an 11% geometric average time and node reduction on instances of the MIPLIB 2017 Benchmark Set that require large B&B trees.

rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch

Since the recent advent of deep reinforcement learning for game play and simulated robotic control, a multitude of new algorithms have flourished. Most are model-free algorithms which can be categorized into three families: deep Q-learning, policy gradients, and Q-value policy gradients. These have developed along separate lines of research, such that few, if any, code bases incorporate all three kinds. Yet these algorithms share a great depth of common deep reinforcement learning machinery. We are pleased to share rlpyt, which implements all three algorithm families on top of a shared, optimized infrastructure, in a single repository. It contains modular implementations of many common deep RL algorithms in Python using PyTorch, a leading deep learning library. rlpyt is designed as a high-throughput code base for small- to medium-scale research in deep RL. This white paper summarizes its features, algorithms implemented, and relation to prior work, and concludes with detailed implementation and usage notes. rlpyt is available at https://…/rlpyt.

Continue Reading…


Read More

Distilled News

Combinatorics: permutations, combinations and dispositions

Combinatorics is that field of mathematics primarily concerned with counting elements from one or more sets. It can help us count the number of orders in which something can happen. In this article, I’m going to dwell on three different types of techniques:
• permutations
• dispositions
• combinations
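The three counts can be checked directly with Python's standard library (a sketch with n = 4 items and k = 2 chosen for illustration; "dispositions" correspond to what `itertools.permutations` produces when given a length argument):

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

items = ['a', 'b', 'c', 'd']          # n = 4

# Permutations: orderings of all n items -> n! = 24
assert len(list(permutations(items))) == factorial(4) == 24

# Dispositions: ordered selections of k items -> n!/(n-k)! = 12 for k = 2
assert len(list(permutations(items, 2))) == perm(4, 2) == 12

# Combinations: unordered selections of k items -> n!/(k!(n-k)!) = 6
assert len(list(combinations(items, 2))) == comb(4, 2) == 6
```

`math.perm` and `math.comb` require Python 3.8+.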

Building simple data pipelines in Azure using Cosmos DB, Databricks and Blob Storage

Thanks to tools like Azure Databricks, we can build simple data pipelines in the cloud and use Spark to get some comprehensive insights into our data with relative ease. Combining this with the Apache Spark connector for Cosmos DB, we can leverage the power of Azure Cosmos DB to gain and store some incredible insights into our data. It’s been a while since I’ve written a post on Databricks, and since I’ve been working with Cosmos DB quite a bit over the past few months, I thought I’d write a simple tutorial on how you can use Azure Blob Storage, Azure Databricks and Cosmos DB to build a straightforward data pipeline that does some simple transformations on our source data. I’m also going to throw a bit of Azure Key Vault into the mix to show you how simple it can be to protect vital secrets in Databricks, such as Storage account keys and Cosmos DB endpoints! This blog post is mainly aimed at beginners. Ideally you would have some idea of what each component is, and some understanding of Python.

Six Important Steps to Build a Machine Learning System

Creating a great machine learning system is an art. There are a lot of things to consider while building a great machine learning system. But often it happens that we as data scientists only worry about certain parts of the project. Most of the time that happens to be modeling, but in reality, the success or failure of a Machine Learning project depends on a lot of other factors.
1. Problem Definition
2. Data
4. Features
5. Modeling
6. Experimentation

Bayesian Priors and Regularization Penalties

Bayesian Linear Models are often presented as introductory material for those seeking to learn probabilistic programming, coming in with an existing understanding of frequentist statistical learning models. I believe this is effective because it allows one to scaffold new knowledge on top of existing knowledge, and even fit something that was already understood – perhaps as just one tool among many – into a wider and more theoretically satisfying framework. The relationship between the prior distribution of parameters chosen in a Bayesian linear model and the penalty term in regularized least-squares regression is already well known. Despite this, I feel that I was able to come to a more visceral and intuitive understanding of this equivalence by empirically examining the effect of tweaking hyperparameters of each model. I hope my small experiment can do the same for you, and be supplemental to the existing proofs that are available.
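A minimal numerical sketch of that equivalence, assuming a one-parameter linear model with known noise variance sigma² and a zero-mean Gaussian prior with variance tau²: the MAP estimate coincides with ridge regression using penalty lambda = sigma²/tau². The data below are invented:

```python
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]

sigma2 = 0.25            # assumed noise variance
tau2 = 2.0               # prior variance on the weight
lam = sigma2 / tau2      # the implied ridge penalty

sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Ridge: minimize sum((y - w*x)^2) + lam * w^2
w_ridge = sxy / (sxx + lam)

# MAP: maximize -sum((y - w*x)^2)/(2*sigma2) - w^2/(2*tau2)
w_map = (sxy / sigma2) / (sxx / sigma2 + 1.0 / tau2)

assert abs(w_ridge - w_map) < 1e-12   # same estimator
```

Shrinking tau2 (a tighter prior) raises lam and pulls the estimate toward zero, which is exactly the tweak-the-hyperparameters experiment the post describes.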

30 Helpful Python Snippets That You Can Learn in 30 Seconds or Less

1. All unique
2. Anagrams
3. Memory
4. Byte size
5. Print a string N times
6. Capitalize first letters
7. Chunk
8. Compact
9. Count by
10. Chained comparison
11. Comma-separated
12. Count vowels
13. Decapitalize
14. Flatten
15. Difference
16. Difference by
17. Chained function call
18. Has duplicates
19. Merge two dictionaries
20. Convert two lists into a dictionary
21. Use enumerate
22. Time spent
24. Most frequent
25. Palindrome
26. Calculator without if-else
27. Shuffle
28. Spread
29. Swap values
30. Get default value for missing keys
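A few of the listed snippets, reconstructed as short sketches (these are common folklore implementations, not necessarily the article's exact code):

```python
# 1. All unique: True if a list has no duplicates
def all_unique(lst):
    return len(lst) == len(set(lst))

# 14. Flatten one level of nesting
def flatten(lst):
    return [item for sub in lst for item in sub]

# 19. Merge two dictionaries (later keys win)
def merge_dicts(a, b):
    return {**a, **b}

# 29. Swap values
a, b = 1, 2
a, b = b, a

assert all_unique([1, 2, 3]) and not all_unique([1, 1])
assert flatten([[1, 2], [3]]) == [1, 2, 3]
assert merge_dicts({'x': 1}, {'y': 2}) == {'x': 1, 'y': 2}
assert (a, b) == (2, 1)
```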

3 Python Tools Data Scientists Can Use for Production-Quality Code

It is an unfortunate fact that many data scientists do not know how to write production-quality code.
Production-quality code is code that is:
• Readable;
• Free from errors;
• Robust to exceptions;
• Efficient;
• Well documented; and
• Reproducible.
Producing it is not rocket science.
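None of the three tools are named in this excerpt, but the listed qualities are easy to illustrate with a small (hypothetical) function: typed, documented, robust to bad input, and checkable:

```python
def safe_mean(values: list) -> float:
    """Return the arithmetic mean of a non-empty list of numbers.

    Raises:
        ValueError: if `values` is empty.
    """
    if not values:  # robust: fail loudly, not with a ZeroDivisionError
        raise ValueError("safe_mean() requires at least one value")
    return sum(values) / len(values)

assert safe_mean([1.0, 2.0, 3.0]) == 2.0
try:
    safe_mean([])
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```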

Practical Experiment Fundamentals All Data Scientists Should Know

A How-to for Non-Parametric Power Analyses, p-values, Confidence Intervals, and Checking for Bias. This post will enable you to do a power analysis, calculate p-values, get confidence intervals, and check for bias in your design without making any distributional assumptions (i.e., non-parametrically).
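As one concrete instance of the non-parametric approach, a permutation test for a difference in means makes no distributional assumptions: the p-value is just the fraction of label shuffles at least as extreme as the observed difference. The data and group sizes below are invented:

```python
import random

random.seed(0)
control = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4, 3.7]
treated = [4.9, 5.1, 4.6, 5.3, 4.8, 5.0, 4.7, 5.2]

observed = sum(treated) / len(treated) - sum(control) / len(control)

pooled = control + treated
n_perm, extreme = 10000, 0
for _ in range(n_perm):
    random.shuffle(pooled)             # break the group labels
    diff = (sum(pooled[len(control):]) / len(treated)
            - sum(pooled[:len(control)]) / len(control))
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / n_perm             # two-sided permutation p-value
```

The same resampling machinery extends to confidence intervals (bootstrap) and power analyses (simulate under an assumed effect, count rejections).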

Universal Transformers

This post will discuss the Universal Transformer, which combines the original Transformer model with a technique called Adaptive Computation Time. The main innovation of Universal Transformers is to apply the Transformer components a different number of times for each symbol.
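A toy sketch of the Adaptive Computation Time side of this (the halting probabilities below are invented stand-ins for what a learned halting unit would produce): each symbol keeps receiving applications of the shared layer until its accumulated halting probability crosses a threshold, so different symbols get different depths:

```python
def act_steps(halting_probs, threshold=0.99, max_steps=10):
    """Number of shared-layer applications a symbol receives before halting."""
    total, steps = 0.0, 0
    for p in halting_probs:
        steps += 1
        total += p
        if total >= threshold or steps == max_steps:
            break
    return steps

easy_symbol = [0.7, 0.4]               # accumulates past 0.99 in 2 steps
hard_symbol = [0.1, 0.2, 0.3, 0.5]     # needs 4 refinement steps

assert act_steps(easy_symbol) == 2
assert act_steps(hard_symbol) == 4
```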

Feature Selection for Machine Learning (1/2)

Feature selection, also known as variable selection, is a powerful idea with major implications for your machine learning workflow. Why would you ever need it? Well, how would you like to reduce your number of features 10x? Or, if doing NLP, even 1000x? What about, besides a smaller feature space resulting in faster training and inference, also getting an observable improvement in accuracy, or whatever metric you use for your models? If that doesn’t grab your attention, I don’t know what does. Don’t believe me? This literally happened to me a couple of days ago at work. So, this is a two-part blog post where I’m going to explain, and show, how to do automated feature selection in Python, so that you can level up your ML game. Only filter methods will be presented, because they are more generic and less compute-hungry than wrapper methods, while embedded feature selection methods, being, well, embedded in the model, aren’t as flexible as filter methods.
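To make the filter-method idea concrete (a toy sketch, not the post's code; the feature names and data are invented): score each feature independently against the target, here by absolute Pearson correlation, and keep the top k:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

features = {
    'signal':  [1.0, 2.1, 2.9, 4.2, 5.0],    # tracks the target
    'noise':   [0.1, -0.1, 0.1, -0.1, 0.1],  # unrelated to it
    'inverse': [5.1, 4.0, 3.1, 1.9, 1.0],    # anti-correlated, still useful
}
target = [1, 2, 3, 4, 5]

scores = {name: abs(pearson(col, target)) for name, col in features.items()}
top2 = sorted(scores, key=scores.get, reverse=True)[:2]
assert set(top2) == {'signal', 'inverse'}    # the noise column is filtered out
```

Because each feature is scored in isolation, this scales to very wide (e.g. NLP) feature spaces, which is exactly why filter methods are cheaper than wrapper methods.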

Gen Z Know Automation Will Take Their Jobs

Gen Z, the generation that comes after Millennials, are graduating college and entering the workforce. Growing up as digital natives, they know machine intelligence will scale during their time in the labor force. Those headlines about ‘robots’ coming for our jobs? Well, Gen Z have an inkling. They are actually going to live it. The eldest of Gen Z are only about 24 now, in 2019, and the majority of them are still students. Gen Z are right to feel something between uncertainty and actual fear regarding their future of work.

Deep learning based web application: from data collection to deployment

Build an image classifier web application from scratch, with no need for a GPU or credit card. This article describes how to build a deep learning web-based application for image classification, without the need for a GPU or credit card! Even though there are plenty of articles describing this stuff, I couldn’t find a complete guide covering all the steps from data collection to deployment, and some details were hard to find (e.g. how to clone a GitHub repo on Google Drive).

PEARL: Probabilistic Embeddings for Actor-critic RL

A sample-efficient meta reinforcement learning method. Meta reinforcement learning can be particularly challenging because the agent has to not only adapt to new incoming data but also find an efficient way to explore the new environment. Current meta-RL algorithms rely heavily on on-policy experience, which limits their sample efficiency. Worse still, most of them lack mechanisms to reason about task uncertainty when adapting to a new task, limiting their effectiveness in sparse-reward problems. We discuss a meta-RL algorithm that attempts to address these challenges. In a nutshell, the algorithm, namely Probabilistic Embeddings for Actor-critic RL (PEARL), proposed by Rakelly, Zhou et al. at ICML 2019, comprises two parts: it learns a probabilistic latent context that sufficiently describes a task; conditioned on that latent context, an off-policy RL algorithm learns to take actions. In this framework, the probabilistic latent context serves as the belief state of the current task. By conditioning the RL algorithm on the latent context, we expect it to learn to distinguish different tasks. Moreover, this disentangles task inference from action selection, which, as we will see later, makes an off-policy algorithm applicable to meta-learning.

Practical guide to Attention mechanism for NLU tasks

Chatbots, virtual assistants, and augmented analytics systems typically receive user queries such as ‘Find me an action movie by Steven Spielberg’. The system should correctly detect the intent ‘find_movie’ while filling the slots ‘genre’ with value ‘action’ and ‘directed_by’ with value ‘Steven Spielberg’. This is a Natural Language Understanding (NLU) task known as Intent Classification & Slot Filling. State-of-the-art performance is typically obtained using recurrent neural network (RNN) based approaches, as well as by leveraging an encoder-decoder architecture with sequence-to-sequence models. In this article we demonstrate hands-on strategies for improving performance even further by adding an Attention mechanism.
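The core of the mechanism is small enough to sketch in a few lines (toy vectors, not a trained NLU model): score each encoder state against the current decoder state, softmax the scores into weights, and take the weighted average as the context vector:

```python
import math

def attention(query, keys, values):
    """Dot-product attention: softmax weights over keys, average of values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                          # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # one per input token
query = [1.0, 0.0]                                     # current decoder state

weights, context = attention(query, encoder_states, encoder_states)
assert abs(sum(weights) - 1.0) < 1e-9
assert weights[0] > weights[1]   # the aligned token gets more weight
```

For slot filling, a context vector like this is computed per output position, letting the tagger focus on the input token most relevant to each slot.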

From Econometrics to Machine Learning

Why econometrics should be part of your skill set. As a data scientist with a master’s degree in econometrics, I took some time to understand the subtleties that make machine learning a discipline distinct from econometrics. I would like to talk to you about these subtleties, which are not obvious at first sight and which puzzled me throughout my journey.

Conditional Love: The Rise of Renormalization Techniques for Conditioning Neural Networks

Batch Normalization, and the zoo of related normalization strategies that have grown up around it, have played an interesting array of roles in recent deep learning research: as a wunderkind optimization trick, a focal point for discussions about theoretical rigor and, importantly, but somewhat more in the sidelines, as a flexible and broadly successful avenue for injecting conditioning information into models. Conditional renormalization started humbly enough, as a clever trick for training more flexible style transfer models, but over the years this originally-simple trick has grown in complexity and conceptual scope. I kept seeing new variants of this strategy pop up, not just on the edges of the literature, but in its most central and novel advances: from the winner of 2017’s ImageNet competition to 2018’s most impressive generative image model. The more I saw it, the more I wanted to tell the story of this simple idea I’d watched grow and evolve from a one-off trick to a broadly applicable way of integrating new information in a low-complexity way.
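The mechanical core of conditional renormalization is compact enough to sketch (plain Python with invented numbers; in practice the gamma and beta are predicted by a network from the conditioning input rather than looked up in a table):

```python
def conditional_norm(features, gamma, beta, eps=1e-5):
    """Normalize features, then scale/shift with condition-dependent params."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    normed = [(f - mean) / (var + eps) ** 0.5 for f in features]
    return [gamma * f + beta for f in normed]

features = [2.0, 4.0, 6.0, 8.0]

# Two conditions (e.g. two styles) map to different modulation parameters
cond_params = {'style_a': (1.0, 0.0), 'style_b': (2.0, 0.5)}

out_a = conditional_norm(features, *cond_params['style_a'])
out_b = conditional_norm(features, *cond_params['style_b'])
assert out_a != out_b     # same input, different conditioning, different output
```

Replacing the lookup with a small network that maps a style embedding or class label to (gamma, beta) gives the conditional-instance-norm and FiLM-style variants the post traces through the literature.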

Continue Reading…


Read More

R Packages worth a look

Memory-Efficient, Visualize-Enhanced, Parallel-Accelerated GWAS Tool (rMVP)
A memory-efficient, visualize-enhanced, parallel-accelerated Genome-Wide Association Study (GWAS) tool. It can
(1) effectively process large data,
(2) rapidly evaluate population structure,
(3) efficiently estimate variance components using several algorithms,
(4) implement parallel-accelerated association tests of markers using three methods,
(5) compute the whole GWAS process with a globally efficient design, and
(6) enhance visualization of related information.
‘rMVP’ contains three models: GLM (Alkes Price (2006) <DOI:10.1038/ng1847>), MLM (Jianming Yu (2006) <DOI:10.1038/ng1702>),
and FarmCPU (Xiaolei Liu (2016) <doi:10.1371/journal.pgen.1005767>); variance-components estimation methods EMMAX
(Hyunmin Kang (2008) <DOI:10.1534/genetics.107.080101>), FaSTLMM (method: Christoph Lippert (2011) <DOI:10.1038/nmeth.1681>,
R implementation from ‘GAPIT2’: You Tang and Xiaolei Liu (2016) <DOI:10.1371/journal.pone.0107684>, and
‘SUPER’: Qishan Wang and Feng Tian (2014) <DOI:10.1371/journal.pone.0107684>); and HE regression
(Xiang Zhou (2017) <DOI:10.1214/17-AOAS1052>).

Read ‘Excel’ Binary (.xlsb) Workbooks (readxlsb)
Import data from ‘Excel’ binary (.xlsb) workbooks into R.

Algebraic and Statistical Functions for Genetics (miraculix)
This is a collection of fast tools for application in quantitative genetics. For instance, the SNP matrix can be stored in a minimum of memory and the calculation of the genomic relationship matrix is based on a rapid algorithm. It also contains the window scanning approach by Kabluchko and Spodarev (2009), <doi:10.1239/aap/1240319575> to detect anomalous genomic areas <doi:10.1186/s12864-018-5009-y>. Furthermore, the package is used in the Modular Breeding Program Simulator (MoBPS, <https://…/MoBPS>, <http://…/> ). The tools are based on SIMD (Single Instruction Multiple Data, <https://…/SIMD> ) and OMP (Open Multi-Processing, <https://…/OpenMP> ).

Access Landscape Evaporative Response Index Raster Data (leri)
Finds and downloads Landscape Evaporative Response Index (LERI) data, then reads the data into ‘R’ using the ‘raster’ package. The LERI product measures anomalies in actual evapotranspiration, to support drought monitoring and early warning systems. More info on LERI is available at <https://…/>.

Boltzmann Bayes Learner (bbl)
Supervised learning using Boltzmann Bayes model inference, which extends the naive Bayes model to include interactions. Enables classification of data into multiple response groups based on a large number of discrete predictors that can take factor values of heterogeneous levels. Either pseudo-likelihood or mean-field inference can be used, with L2 regularization, cross-validation, and prediction on new data. Woo et al. (2016) <doi:10.1186/s12864-016-2871-3>.

Continue Reading…


Read More

Document worth reading: “A survey on Adversarial Attacks and Defenses in Text”

Deep neural networks (DNNs) have shown an inherent vulnerability to adversarial examples, which are maliciously crafted from real examples by attackers aiming to make target DNNs misbehave. The threat of adversarial examples is widespread in image, voice, speech, and text recognition and classification. Inspired by previous work, research on adversarial attacks and defenses in the text domain has developed rapidly. To the best of our knowledge, this article presents a comprehensive review of adversarial examples in text. We analyze the advantages and shortcomings of recent adversarial example generation methods and elaborate on the efficiency and limitations of countermeasures. Finally, we discuss the challenges in adversarial texts and suggest research directions. A survey on Adversarial Attacks and Defenses in Text

Continue Reading…


Read More

Magister Dixit

“It is not an experiment if you know it is going to work.” Jeff Bezos

Continue Reading…


Read More

Thanks for reading!