My Data Science Blogs

June 27, 2019

An Overview of Outlier Detection Methods from PyOD – Part 1

PyOD is an outlier detection package developed with a comprehensive API to support multiple techniques. This post is Part 1 of an overview of techniques that can be used to analyze anomalies in data.


One simple chart: Who is interested in Spark NLP?

As we close in on its two-year anniversary, Spark NLP is proving itself a viable option for enterprise use.

In July 2016, I broached the idea for an NLP library aimed at Apache Spark users to my friend David Talby. A little over a year later, Talby and his collaborators announced the release of Spark NLP. They described the motivation behind the project in their announcement post and in this accompanying podcast that Talby and I recorded, as well as in this recent post comparing popular open source NLP libraries. [Full disclosure: I’m an advisor to Databricks, the startup founded by the team that originated Apache Spark.]

As we close in on the two-year anniversary of the project, I asked Talby where interest in the project has come from, and he graciously shared geo-demographic data of visitors to the project’s homepage:

Figure 1. Spark NLP geo-demographic data of visitors. Slide by Ben Lorica, data courtesy of David Talby.

Of the thousands of visitors to the site: 44% are from the Americas, 24% from Asia-Pacific, and 22% are based in the EMEA region.

Many of these site visitors are turning into users of the project. In our recent survey AI Adoption in the Enterprise, quite a few respondents signalled that they were giving Spark NLP a try. The project also garnered top prize—based on a tally of votes cast by Strata Data Conference attendees—in the open source category at the Strata Data awards in March.

There are many other excellent open source NLP libraries with significant numbers of users—spaCy, OpenNLP, Stanford CoreNLP, NLTK—but at the time the project started, there seemed to be an opportunity for a library that appealed to users who already had Spark clusters (and needed a scalable solution). While the project started out targeting Apache Spark users, it has evolved to provide simple APIs that get things done in a few lines of code and fully hide Spark under the hood. The library’s Python API now has the most users. Installing Spark NLP is a one-liner using pip or conda for Python, or a single package pull in Java or Scala using maven, sbt, or spark-packages. The library’s documentation has also grown, and there are public online examples for common tasks like sentiment analysis, named entity recognition, and spell checking. Improvements in documentation, ease of use, and its production-ready implementation of key deep learning models, combined with speed, scalability, and accuracy, have made Spark NLP a viable option for enterprises needing an NLP library.

For more on Spark NLP, join Talby and his fellow instructors for a three-hour tutorial, Natural language understanding at scale with Spark NLP, at the Strata Data Conference in New York City, September 23-26, 2019.


Four short links: 27 June 2019

Security Mnemonics, Evidence Might Work, Misinformation Inoculation, and Spoofing Presidential Alerts

  1. STRIDE -- mnemonic for remembering the different types of threats: Spoofing of user identity; Tampering; Repudiation; Information disclosure (privacy breach or data leak); Denial of service (DoS); Elevation of privilege. Use when you're asking yourself, "what could possibly go wrong?" There's probably a parallel "how things can be misused" mnemonic like Nazis, Anti-Vaxx, Spam, Threats, and Your Ex- Follows You.
  2. Backfire Effect is Mostly a Myth (Nieman Lab) -- some evidence that giving people evidence that shows they're wrong can change their mind. Perhaps you no longer have to be careful to whom you show this story. Full Fact research manager Amy Sippett reviewed seven studies that have explored the backfire effect and found that “cases where backfire effects were found tended to be particularly contentious topics, or where the factual claim being asked about was ambiguous.” The studies where a backfire effect was not found also tended to be larger than the studies where it was found. Full Fact cautions that most of the research on the backfire effect has been done in the U.S., and “we still need more evidence to understand how fact-checking content can be most effective.”
  3. Bad News -- a browser game by Cambridge University researchers that seems to inoculate users against misinformation. We conducted a large-scale evaluation of the game with N = 15,000 participants in a pre-post gameplay design. We provide initial evidence that people’s ability to spot and resist misinformation improves after gameplay, irrespective of education, age, political ideology, and cognitive style. (via Cambridge University)
  4. Spoofing Presidential Alerts -- Their research showed that four low-cost USRP or bladeRF TX-capable software-defined radios (SDRs) with 1 watt output power each, combined with open source LTE base station software, could be used to send a fake Presidential Alert to a stadium of 50,000 people (note that this was only simulated; real-world tests were performed responsibly in a controlled environment). The attack works by creating a fake and malicious LTE cell tower on the SDR that nearby cell phones connect to. Once connected, an alert can easily be crafted and sent to all connected phones. There is no way to verify that an alert is legitimate. The article itself is paywalled, though Sci-Hub knows how to reach it.


What’s new on arXiv – Complete List

Studies on the Software Testing Profession
Astra Version 1.0: Evaluating Translations from Alloy to SMT-LIB
Generation of Pseudo Code from the Python Source Code using Rule-Based Machine Translation
Cognitive Knowledge Graph Reasoning for One-shot Relational Learning
A Computational Analysis of Natural Languages to Build a Sentence Structure Aware Artificial Neural Network
Individualized Group Learning
Blockchain Games: A Survey
Topic Modeling via Full Dependence Mixtures
A Computationally Efficient Method for Defending Adversarial Deep Learning Attacks
Training Neural Networks for and by Interpolation
Post-Processing of High-Dimensional Data
KCAT: A Knowledge-Constraint Typing Annotation Tool
Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking
Improved Sentiment Detection via Label Transfer from Monolingual to Synthetic Code-Switched Text
A JIT Compiler for Neural Network Inference
Unsupervised Image Noise Modeling with Self-Consistent GAN
Improving Prediction Accuracy in Building Performance Models Using Generative Adversarial Networks (GANs)
Deep Reinforcement Learning for Cyber Security
Sub-policy Adaptation for Hierarchical Reinforcement Learning
Contrastive Multiview Coding
Reweighted Expectation Maximization
Semantics to Space(S2S): Embedding semantics into spatial space for zero-shot verb-object query inferencing
Learning to Forget for Meta-Learning
Landslide Geohazard Assessment With Convolutional Neural Networks Using Sentinel-2 Imagery Data
Solving Large-Scale 0-1 Knapsack Problems and its Application to Point Cloud Resampling
On Horadam-Lucas sequence
Inverse Problems, Regularization and Applications
Deep Learning-Based Decoding of Constrained Sequence Codes
Robust and interpretable blind image denoising via bias-free convolutional neural networks
S3: A Spectral-Spatial Structure Loss for Pan-Sharpening Networks
Measuring the Gain of a Micro-Channel Plate/Phosphor Assembly Using a Convolutional Neural Network
Enriching Neural Models with Targeted Features for Dementia Detection
On Longest Common Property Preserved Substring Queries
Utilizing Edge Features in Graph Neural Networks via Variational Information Maximization
Interpretable ICD Code Embeddings with Self- and Mutual-Attention Mechanisms
Time scales in stock markets
An image-driven machine learning approach to kinetic modeling of a discontinuous precipitation reaction
Deep Network Approximation Characterized by Number of Neurons
Understanding Human Context in 3D Scenes by Learning Spatial Affordances with Virtual Skeleton Models
Localization in Gaussian disordered systems at low temperature
Fractional cocoloring of graphs
Scalable Community Detection over Geo-Social Network
Character n-gram Embeddings to Improve RNN Language Models
A Meta Approach to Defend Noisy Labels by the Manifold Regularizer PSDR
Binomial edge ideals of cographs
MIMA: MAPPER-Induced Manifold Alignment for Semi-Supervised Fusion of Optical Image and Polarimetric SAR Data
Self-organized avalanches in globally-coupled phase oscillators
Meta-heuristic for non-homogeneous peak density spaces and implementation on 2 real-world parameter learning/tuning applications
Know What You Don’t Know: Modeling a Pragmatic Speaker that Refers to Objects of Unknown Categories
Signed Hultman Numbers and Signed Generalized Commuting Probability in Finite Groups
New constructions of asymptotically optimal codebooks via character sums over a local ring
Illuminant Chromaticity Estimation from Interreflections
Zeroth-Order Stochastic Block Coordinate Type Methods for Nonconvex Optimization
Deep Variational Networks with Exponential Weighting for Learning Computed Tomography
Sparse Approximate Factor Estimation for High-Dimensional Covariance Matrices
Identifying Illicit Accounts in Large Scale E-payment Networks — A Graph Representation Learning Approach
Lattice Transformer for Speech Translation
Mir-BFT: High-Throughput BFT for Blockchains
Associated Learning: Decomposing End-to-end Backpropagation based on Auto-encoders and Target Propagation
Game-Theoretic Mixed $H_2/H_{\infty}$ Control with Sparsity Constraint for Multi-agent Networked Control Systems
A Turing Kernelization Dichotomy for Structural Parameterizations of $\mathcal{F}$-Minor-Free Deletion
Hypotheses testing and posterior concentration rates for semi-Markov processes
Rate Balancing for Multiuser MIMO Systems
Hypercontractivity for global functions and sharp thresholds
Learning Spatio-Temporal Representation with Local and Global Diffusion
Proactive Human-Machine Conversation with Explicit Conversation Goals
The Consensus Number of a Cryptocurrency (Extended Version)
Direct Sampling of Bayesian Thin-Plate Splines for Spatial Smoothing
Spaceland Embedding of Sparse Stochastic Graphs
Cut Selection For Benders Decomposition
Information capacity of a network of spiking neurons
Amur Tiger Re-identification in the Wild
On discrete idempotent paths
Variance Estimation For Online Regression via Spectrum Thresholding
Counting integer points of flow polytopes
On the first fall degree of summation polynomials
$c^+$GAN: Complementary Fashion Item Recommendation
On Edge-Partitioning of Complete Geometric Graphs into Plane Trees
Noether theorem for action-dependent Lagrangian functions: conservation laws for non-conservative systems
A review of available software for adaptive clinical trial design
On Convex Graphs Having Plane Spanning Subgraph of Certain Type
Non-convex optimization via strongly convex majorization-minimization
Densities for piecewise deterministic Markov processes with boundary
Vertex properties of maximum scattered linear sets of $\mathrm{PG}(1,q^n)$
Antonym-Synonym Classification Based on New Sub-space Embeddings
An Asymmetric Random Rado Theorem: 1-statement
Decentralised Multi-Demic Evolutionary Approach to the Dynamic Multi-Agent Travelling Salesman Problem
Dense Deformation Network for High Resolution Tissue Cleared Image Registration
On the Complexity of an Augmented Lagrangian Method for Nonconvex Optimization
Self-organized critical balanced networks: a unified framework
Strategic customer behavior in a queueing system with alternating information structure
Quasi-Stationary Distributions and Resilience: What to get from a sample?
On the 4-color theorem for signed graphs
Nearly all cacti are edge intersection hypergraphs of 3-uniform hypergraphs
Use of Emergency Departments by Frail Elderly Patients: Temporal Patterns and Case Complexity
A stabilized DG cut cell method for discretizing the linear transport equation
Comparative Analysis of Switching Dynamics in Different Memristor Models
A Semi-strong Perfect Digraph Theorem
Grid R-CNN Plus: Faster and Better
Rate of change of frequency under line contingencies in high voltage electric power networks with uncertainties
Smooth digraphs modulo primitive positive constructability
Lower a posteriori error estimates on anisotropic meshes
Modeling and Verifying Cyber-Physical Systems with Hybrid Active Objects
Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues
2D Attentional Irregular Scene Text Recognizer
Hilbert Space Fragmentation and Many-Body Localization
Iterative subtraction method for Feature Ranking
Curriculum Learning for Cumulative Return Maximization
Dynamic Control of Functional Splits for Energy Harvesting Virtual Small Cells: a Distributed Reinforcement Learning Approach
Querying a Matrix through Matrix-Vector Products
Generating and Exploiting Probabilistic Monocular Depth Estimates
Information-theoretic measures for non-linear causality detection: application to social media sentiment and cryptocurrency prices
Distributed High-dimensional Regression Under a Quantile Loss Function
Contrastive Bidirectional Transformer for Temporal Representation Learning
Memory-Efficient Group-by Aggregates over Multi-Way Joins
Nonlinear System Identification via Tensor Completion
Modeling the Dynamics of PDE Systems with Physics-Constrained Deep Auto-Regressive Networks
The iMaterialist Fashion Attribute Dataset
Anderson localisation in stationary ensembles of quasiperiodic operators
Graphs of bounded depth-$2$ rank-brittleness
The rank of sparse random matrices
Efficient calibration for high-dimensional computer model output using basis methods
Semantic Change and Semantic Stability: Variation is Key
Microscopic and macroscopic perspectives on stationary nonequilibrium states
Hypersimplicial subdivisions
Anti dependency distance minimization in short sequences. A graph theoretic approach
Advance gender prediction tool of first names and its use in analysing gender disparity in Computer Science in the UK, Malaysia and China
Knock Intensity Distribution and a Stochastic Control Framework for Knock Control
Deep Unfolding for Communications Systems: A Survey and Some New Directions
Training Image Estimators without Image Ground-Truth
Machine Learning Based Analysis and Quantification of Potential Power Gain from Passive Device Installation
Characteristic Power Series of Graph Limits
A Low-Power Domino Logic Architecture for Memristor-Based Neuromorphic Computing
UCAM Biomedical translation at WMT19: Transfer learning multi-domain ensembles
On the Walks and Bipartite Double Coverings of Graphs with the same Main Eigenspace
Modeling and Control of Combustion Phasing in Dual-Fuel Compression Ignition Engines
Extending Eigentrust with the Max-Plus Algebra
Egocentric affordance detection with the one-shot geometry-driven Interaction Tensor
Topological Data Analysis for Arrhythmia Detection through Modular Neural Networks
The Replica Dataset: A Digital Replica of Indoor Spaces
Modeling and Interpreting Real-world Human Risk Decision Making with Inverse Reinforcement Learning
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index
On bulk deviations for the local behavior of random interlacements
Lower Bounds for Adversarially Robust PAC Learning
On mean-field theories of dynamics in supercooled liquids
Robust Regression for Safe Exploration in Control
Telephonetic: Making Neural Language Models Robust to ASR and Semantic Noise
Solution of the Unconditional Extremal Problem for a Linear-Fractional Integral Functional Depending on the Parameter
Kernel and Deep Regimes in Overparametrized Models
Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models
Robust linear domain decomposition schemes for reduced non-linear fracture flow models
The Communication Complexity of Optimization
On co-minimal pairs in abelian groups
Goal-conditioned Imitation Learning
Fractional Local Dimension
Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards
Concentration estimates for algebraic intersections
Mask2Lesion: Mask-Constrained Adversarial Skin Lesion Image Synthesis
Turing complete mechanical processor via automated nonlinear system design
Multivariate polynomials for generalized permutohedra
Spectra and eigenspaces from regular partitions of Cayley (di)graphs of permutation groups
Detecting Photoshopped Faces by Scripting Photoshop
Show, Match and Segment: Joint Learning of Semantic Matching and Object Co-segmentation
Joint Concept Matching-Space Projection Learning for Zero-Shot Recognition
Report-Sensitive Spot-Checking in Peer-Grading Systems
Can generalised relative pose estimation solve sparse 3D registration?
IntrinSeqNet: Learning to Estimate the Reflectance from Varying Illumination
Learning Instance Occlusion for Panoptic Segmentation
Dynamic PET cardiac and parametric image reconstruction: a fixed-point proximity gradient approach using patch-based DCT and tensor SVD regularization
Hallucinating Bag-of-Words and Fisher Vector IDT terms for CNN-based Action Recognition
Identify treatment effect patterns for personalised decisions
Distributionally Robust Counterfactual Risk Minimization


What’s new on arXiv

Cognitive Knowledge Graph Reasoning for One-shot Relational Learning

Inferring new facts from existing knowledge graphs (KG) with explainable reasoning processes is a significant problem and has received much attention recently. However, few studies have focused on relation types unseen in the original KG, given only one or a few instances for training. To bridge this gap, we propose CogKR for one-shot KG reasoning. The one-shot relational learning problem is tackled through two modules: the summary module summarizes the underlying relationship of the given instances, based on which the reasoning module infers the correct answers. Motivated by the dual process theory in cognitive science, in the reasoning module, a cognitive graph is built by iteratively coordinating retrieval (System 1, collecting relevant evidence intuitively) and reasoning (System 2, conducting relational reasoning over collected information). The structural information offered by the cognitive graph enables our model to aggregate pieces of evidence from multiple reasoning paths and explain the reasoning process graphically. Experiments show that CogKR substantially outperforms previous state-of-the-art models on one-shot KG reasoning benchmarks, with relative improvements of 24.3%-29.7% on MRR. The source code is available at https://…/CogKR.


A Computational Analysis of Natural Languages to Build a Sentence Structure Aware Artificial Neural Network

Natural languages are complexly structured entities. They exhibit characterising regularities that can be exploited to link them to one another. In this work, I compare two morphological aspects of languages: Written Patterns and Sentence Structure. I show how languages spontaneously group by similarity in both analyses and derive an average language distance. Finally, exploiting Sentence Structure, I developed an Artificial Neural Network capable of distinguishing languages, suggesting that not only word roots but also grammatical sentence structure is a characterising trait which alone suffices to identify them.


Individualized Group Learning

Many massive data sets are assembled by collecting information on a large number of individuals in a population. The analysis of such data, especially in the aspect of individualized inferences and solutions, has the potential to create significant value for practical applications. Traditionally, inference for an individual in the data set relies either solely on the information about that individual or on summaries of the whole population. However, with the availability of big data, we have the opportunity, as well as a unique challenge, to make a more effective individualized inference that takes into consideration both the population information and the individual discrepancy. To deal with the possible heterogeneity within the population while providing effective and credible inferences for individuals in a data set, this article develops a new approach called individualized group learning (iGroup). The iGroup approach uses local nonparametric techniques to generate an individualized group by pooling other entities in the population which share similar characteristics with the target individual. Three general cases of iGroup are discussed, and their asymptotic performances are investigated. Both theoretical results and empirical simulations reveal that, by applying iGroup, the performance of statistical inference on the individual level is ensured and can be substantially improved over inference based on either individual information alone or population information alone. The method has a broad range of applications. Two examples in financial statistics and maritime anomaly detection are presented.


Blockchain Games: A Survey

With the support of blockchain systems, cryptocurrency has changed the world of virtual assets. Digital games, especially those with massive multi-player scenarios, will be significantly impacted by this novel technology. However, there are insufficient academic studies on this topic. In this work, we fill this gap by surveying state-of-the-art blockchain games. We discuss blockchain integration for games and then categorize existing blockchain games in terms of their genres and technical platforms. Moreover, by analyzing the industrial trend with a statistical approach, we envision the future of blockchain games from technological and commercial perspectives.


Topic Modeling via Full Dependence Mixtures

We consider the topic modeling problem for large datasets. For this problem, Latent Dirichlet Allocation (LDA) with a collapsed Gibbs sampler optimization is the state-of-the-art approach in terms of topic quality. However, LDA is a slow approach, and running it on large datasets is impractical even with modern hardware. In this paper we propose to fit topics directly to the co-occurrence data of the corpus. In particular, we introduce an extension of a mixture model, the Full Dependence Mixture (FDM), which arises naturally as a model of a second moment under general generative assumptions on the data. While there is some previous work on topic modeling using second moments, we develop a direct stochastic optimization procedure for fitting an FDM with a single Kullback-Leibler objective. While moment methods in general have the benefit that an iteration no longer needs to scale with the size of the corpus, our approach also allows us to leverage standard optimizers and GPUs for the problem of topic modeling. We evaluate the approach on synthetic and semi-synthetic data, as well as on the SOTU and NeurIPS Papers corpora, and show that the approach outperforms LDA, where LDA is run on both full and sub-sampled data.


A Computationally Efficient Method for Defending Adversarial Deep Learning Attacks

The reliance on deep learning algorithms has grown significantly in recent years. Yet, these models are highly vulnerable to adversarial attacks, which introduce visually imperceptible perturbations into testing data to induce misclassifications. The literature has proposed several methods to combat such adversarial attacks, but each method either fails at high perturbation values, requires excessive computing power, or both. This letter proposes a computationally efficient method for defending against the Fast Gradient Sign (FGS) adversarial attack by simultaneously denoising and compressing data. Specifically, our proposed defense relies on training a fully connected multi-layer Denoising Autoencoder (DAE) and using its encoder as a defense against the adversarial attack. Our results show that using this dimensionality reduction scheme is not only highly effective in mitigating the effect of the FGS attack in multiple threat models, but it also provides a 2.43x speedup in comparison to defense strategies providing similar robustness against the same attack.


Training Neural Networks for and by Interpolation

The majority of modern deep learning models are able to interpolate the data: the empirical loss can be driven near zero on all samples simultaneously. In this work, we explicitly exploit this interpolation property for the design of a new optimization algorithm for deep learning. Specifically, we use it to compute an adaptive learning-rate given a stochastic gradient direction. This results in the Adaptive Learning-rates for Interpolation with Gradients (ALI-G) algorithm. ALI-G retains the advantages of SGD, which are low computational cost and provable convergence in the convex setting. But unlike SGD, the learning-rate of ALI-G can be computed inexpensively in closed-form and does not require a manual schedule. We provide a detailed analysis of ALI-G in the stochastic convex setting with explicit convergence rates. In order to obtain good empirical performance in deep learning, we extend the algorithm to use a maximal learning-rate, which gives a single hyper-parameter to tune. We show that employing such a maximal learning-rate has an intuitive proximal interpretation and preserves all convergence guarantees. We provide experiments on a variety of architectures and tasks: (i) learning a differentiable neural computer; (ii) training a wide residual network on the SVHN data set; (iii) training a Bi-LSTM on the SNLI data set; and (iv) training wide residual networks and densely connected networks on the CIFAR data sets. We empirically show that ALI-G outperforms adaptive gradient methods such as Adam, and provides comparable performance with SGD, although SGD benefits from manual learning rate schedules. We release PyTorch and Tensorflow implementations of ALI-G as standalone optimizers that can be used as a drop-in replacement in existing code (code available at https://…/ali-g ).


Post-Processing of High-Dimensional Data

Scientific computations or measurements may result in huge volumes of data. Often these can be thought of representing a real-valued function on a high-dimensional domain, and can be conceptually arranged in the format of a tensor of high degree in some truncated or lossy compressed format. We look at some common post-processing tasks which are not obvious in the compressed format, as such huge data sets can not be stored in their entirety, and the value of an element is not readily accessible through simple look-up. The tasks we consider are finding the location of maximum or minimum, or minimum and maximum of a function of the data, or finding the indices of all elements in some interval — i.e. level sets, the number of elements with a value in such a level set, the probability of an element being in a particular level set, and the mean and variance of the total collection. The algorithms to be described are fixed point iterations of particular functions of the tensor, which will then exhibit the desired result. For this, the data is considered as an element of a high degree tensor space, although in an abstract sense, the algorithms are independent of the representation of the data as a tensor. All that we require is that the data can be considered as an element of an associative, commutative algebra with an inner product. Such an algebra is isomorphic to a commutative sub-algebra of the usual matrix algebra, allowing the use of matrix algorithms to accomplish the mentioned tasks. We allow the actual computational representation to be a lossy compression, and we allow the algebra operations to be performed in an approximate fashion, so as to maintain a high compression level. One such example which we address explicitly is the representation of data as a tensor with compression in the form of a low-rank representation.


KCAT: A Knowledge-Constraint Typing Annotation Tool

Fine-grained Entity Typing is a tough task which suffers from noisy samples extracted from distant supervision. Thousands of manually annotated samples can achieve greater performance than millions of samples generated by the previous distant supervision method. However, it is hard for human beings to differentiate and memorize thousands of types, thus making large-scale human labeling hardly possible. In this paper, we introduce a Knowledge-Constraint Typing Annotation Tool (KCAT), which is efficient for fine-grained entity typing annotation. KCAT reduces the size of candidate types to an acceptable range for human beings through entity linking and provides a Multi-step Typing scheme to revise the entity linking result. Moreover, KCAT provides an efficient Annotator Client to accelerate the annotation process and a comprehensive Manager Module to analyse crowdsourcing annotations. Experiments show that KCAT can significantly improve annotation efficiency, and the time consumption increases slowly as the size of the type set expands.


Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking

This paper focuses on the end-to-end abstractive summarization of a single product review without supervision. We assume that a review can be described as a discourse tree, in which the summary is the root, and the child sentences explain their parent in detail. By recursively estimating a parent from its children, our model learns the latent discourse tree without an external parser and generates a concise summary. We also introduce an architecture that ranks the importance of each sentence on the tree to support summary generation focusing on the main review point. The experimental results demonstrate that our model is competitive with or outperforms other unsupervised approaches. In particular, for relatively long reviews, it achieves a competitive or better performance than supervised models. The induced tree shows that the child sentences provide additional information about their parent, and the generated summary abstracts the entire review.


Improved Sentiment Detection via Label Transfer from Monolingual to Synthetic Code-Switched Text

Multilingual writers and speakers often alternate between two languages in a single discourse, a practice called ‘code-switching’. Existing sentiment detection methods are usually trained on sentiment-labeled monolingual text. Manually labeled code-switched text, especially involving minority languages, is extremely rare. Consequently, the best monolingual methods perform relatively poorly on code-switched text. We present an effective technique for synthesizing labeled code-switched text from labeled monolingual text, which is more readily available. The idea is to replace carefully selected subtrees of constituency parses of sentences in the resource-rich language with suitable token spans selected from automatic translations to the resource-poor language. By augmenting scarce human-labeled code-switched text with plentiful synthetic code-switched text, we achieve significant improvements in sentiment labeling accuracy (1.5%, 5.11%, 7.20%) for three different language pairs (English-Hindi, English-Spanish and English-Bengali). We also get significant gains for hate speech detection: 4% improvement using only synthetic text and 6% if augmented with real text.


A JIT Compiler for Neural Network Inference

This paper describes a C++ library that compiles neural network models at runtime into machine code that performs inference. This approach in general promises to achieve the best performance possible since it is able to integrate statically known properties of the network directly into the code. In our experiments on the NAO V6 platform, it outperforms existing implementations significantly on small networks, while being inferior on large networks. The library was already part of the B-Human code release 2018, but has been extended since and is now available as a standalone version that can be integrated into any C++14 code base.


Unsupervised Image Noise Modeling with Self-Consistent GAN

Noise modeling lies at the heart of many image processing tasks. However, existing deep learning methods for noise modeling generally require clean and noisy image pairs for model training; these image pairs are difficult to obtain in many realistic scenarios. To ameliorate this problem, we propose a self-consistent GAN (SCGAN) that can directly extract noise maps from noisy images, thus enabling unsupervised noise modeling. In particular, the SCGAN introduces three novel self-consistent constraints that are complementary to one another, viz.: the noise model should produce a zero response over a clean input; the noise model should return the same output when fed with a specific pure noise input; and the noise model also should re-extract a pure noise map if the map is added to a clean image. These three constraints are simple yet effective. They jointly facilitate unsupervised learning of a noise model for various noise types. To demonstrate its wide applicability, we deploy the SCGAN on three image processing tasks including blind image denoising, rain streak removal, and noisy image super-resolution. The results demonstrate the effectiveness and superiority of our method over the state-of-the-art methods on a variety of benchmark datasets, even though the noise types vary significantly and paired clean images are not available.


Improving Prediction Accuracy in Building Performance Models Using Generative Adversarial Networks (GANs)

Building performance discrepancies between building design and operation are one of the causes that lead many new designs to fail to achieve their goals and objectives. One of the main factors contributing to the discrepancy is occupant behavior. Occupants responding to a new design are influenced by several factors. Existing building performance models (BPMs) ignore or only partially address those factors (called contextual factors) while developing BPMs. To potentially reduce the discrepancies and improve the prediction accuracy of BPMs, this paper proposes a computational framework for learning mixture models by using Generative Adversarial Networks (GANs) that appropriately combine existing BPMs with knowledge on occupant behaviors related to contextual factors in new designs. Immersive virtual environment (IVE) experiments are used to acquire data on such behaviors. Performance targets are used to guide the appropriate combination of existing BPMs with knowledge on occupant behaviors. The resulting model is called an augmented BPM. Two different experiments related to occupant lighting behaviors are shown as a case study. The results reveal that augmented BPMs significantly outperformed existing BPMs with respect to achieving specified performance targets. The case study confirms the potential of the computational framework for improving the prediction accuracy of BPMs during design.


Deep Reinforcement Learning for Cyber Security

The scale of Internet-connected systems has increased considerably, and these systems are being exposed to cyber attacks more than ever. The complexity and dynamics of cyber attacks require protecting mechanisms to be responsive, adaptive, and large-scale. Machine learning, or more specifically deep reinforcement learning (DRL), methods have been proposed widely to address these issues. By incorporating deep learning into traditional RL, DRL is highly capable of solving complex, dynamic, and especially high-dimensional cyber defense problems. This paper presents a survey of DRL approaches developed for cyber security. We touch on different vital aspects, including DRL-based security methods for cyber-physical systems, autonomous intrusion detection techniques, and multi-agent DRL-based game theory simulations for defense strategies against cyber attacks. Extensive discussions and future research directions on DRL-based cyber security are also given. We expect that this comprehensive review provides the foundations for and facilitates future studies on exploring the potential of emerging DRL to cope with increasingly complex cyber security problems.


Sub-policy Adaptation for Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning is a promising approach to long-horizon decision-making problems with sparse rewards. Unfortunately, most methods still decouple the lower-level skill acquisition process and the training of a higher level that controls the skills in a new task. Treating the skills as fixed can lead to significant sub-optimality in the transfer setting. In this work, we propose a novel algorithm to discover a set of skills, and continuously adapt them along with the higher level even when training on a new task. Our main contributions are two-fold. First, we derive a new hierarchical policy gradient, as well as an unbiased latent-dependent baseline. We introduce Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method to efficiently train all levels of the hierarchy simultaneously. Second, we propose a method of training time-abstractions that improves the robustness of the obtained skills to environment changes. Code and results are available at sites.google.com/view/hippo-rl .


Contrastive Multiview Coding

Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a ‘dog’ can be seen, heard, and felt). We hypothesize that a powerful representation is one that models view-invariant factors. Based on this hypothesis, we investigate a contrastive coding scheme, in which a representation is learned that aims to maximize mutual information between different views but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. The resulting learned representations perform above the state of the art for downstream tasks such as object classification, compared to formulations based on predictive learning or single view reconstruction, and improve as more views are added. Code and reference implementations are released on our project page: http://…/.


Reweighted Expectation Maximization

Training deep generative models with maximum likelihood remains a challenge. The typical workaround is to use variational inference (VI) and maximize a lower bound to the log marginal likelihood of the data. Variational auto-encoders (VAEs) adopt this approach. They further amortize the cost of inference by using a recognition network to parameterize the variational family. Amortized VI scales approximate posterior inference in deep generative models to large datasets. However it introduces an amortization gap and leads to approximate posteriors of reduced expressivity due to the problem known as posterior collapse. In this paper, we consider expectation maximization (EM) as a paradigm for fitting deep generative models. Unlike VI, EM directly maximizes the log marginal likelihood of the data. We rediscover the importance weighted auto-encoder (IWAE) as an instance of EM and propose a new EM-based algorithm for fitting deep generative models called reweighted expectation maximization (REM). REM learns better generative models than the IWAE by decoupling the learning dynamics of the generative model and the recognition network using a separate expressive proposal found by moment matching. We compared REM to the VAE and the IWAE on several density estimation benchmarks and found it leads to significantly better performance as measured by log-likelihood.


Semantics to Space(S2S): Embedding semantics into spatial space for zero-shot verb-object query inferencing

We present a novel deep zero-shot learning (ZSL) model for inferencing human-object-interaction with verb-object (VO) query. While the previous ZSL approaches only use the semantic/textual information to be fed into the query stream, we seek to incorporate and embed the semantics into the visual representation stream as well. Our approach is powered by Semantics-to-Space (S2S) architecture where semantics derived from the residing objects are embedded into a spatial space. This architecture allows the co-capturing of the semantic attributes of the human and the objects along with their location/size/silhouette information. As this is the first attempt to address the zero-shot human-object-interaction inferencing with VO query, we have constructed a new dataset, Verb-Transferability 60 (VT60). VT60 provides 60 different VO pairs with overlapping verbs tailored for testing ZSL approaches with VO query. Experimental evaluations show that our approach not only outperforms the state-of-the-art, but also shows the capability of consistently improving performance regardless of which ZSL baseline architecture is used.


Learning to Forget for Meta-Learning

Few-shot learning is a challenging problem where the system is required to achieve generalization from only few examples. Meta-learning tackles the problem by learning prior knowledge shared across a distribution of tasks, which is then used to quickly adapt to unseen tasks. Model-agnostic meta-learning (MAML) algorithm formulates prior knowledge as a common initialization across tasks. However, forcibly sharing an initialization brings about conflicts between tasks and thus compromises the quality of the initialization. In this work, by observing that the extent of compromise differs among tasks and between layers of a neural network, we propose a new initialization idea that employs task-dependent layer-wise attenuation, which we call selective forgetting. The proposed attenuation scheme dynamically controls how much of prior knowledge each layer will exploit for a given task. The experimental results demonstrate that the proposed method mitigates the conflicts and provides outstanding performance as a result. We further show that the proposed method, named L2F, can be applied and improve other state-of-the-art MAML-based frameworks, illustrating its generalizability.


Magister Dixit

“The hype about the possibilities and possible applications of artificial intelligence (AI) seems currently unlimited. The AI procedures and solutions are praised as true panaceas. However, when viewed soberly, they are just another tool in the toolbox of IT experts.” Marc Botha ( 21.01.2019 18:07 )


Degrees of Freedom Analysis of Unrolled Neural Networks

** Nuit Blanche is now on Twitter: @NuitBlog **

Studying the great convergence!



Unrolled neural networks emerged recently as an effective model for learning inverse maps appearing in image restoration tasks. However, their generalization risk (i.e., test mean-squared-error) and its link to network design and training sample size remain mysterious. Leveraging Stein's Unbiased Risk Estimator (SURE), this paper analyzes the generalization risk with its bias and variance components for recurrent unrolled networks. We particularly investigate the degrees-of-freedom (DOF) component of SURE, the trace of the end-to-end network Jacobian, to quantify the prediction variance. We prove that DOF is well-approximated by the weighted path sparsity of the network under incoherence conditions on the trained weights. Empirically, we examine the SURE components as a function of training sample size for both recurrent and non-recurrent (with many more parameters) unrolled networks. Our key observations indicate that: 1) DOF increases with training sample size and converges to the generalization risk for both recurrent and non-recurrent schemes; 2) the recurrent network converges significantly faster (with fewer training samples) compared with the non-recurrent scheme, hence recurrence serves as a regularization for low sample size regimes.


Follow @NuitBlog or join the CompressiveSensing Reddit, the Facebook page, the Compressive Sensing group on LinkedIn  or the Advanced Matrix Factorization group on LinkedIn

Liked this entry? Subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Other links:
Paris Machine Learning: Meetup.com || @Archives || LinkedIn || Facebook || @ParisMLGroup
About LightOn: Newsletter || @LightOnIO || on LinkedIn || on CrunchBase || our Blog
About myself: LightOn || Google Scholar || LinkedIn || @IgorCarron || Homepage || ArXiv


R Packages worth a look

Leaner Style Sheets (rless)
Converts LESS to CSS. It uses the V8 engine, in which the LESS parser is run. Functions for LESS text, file, or folder conversion are provided.

Orchestrate and Exchange Data with ‘EtherCalc’ Instances (ethercalc)
The ‘EtherCalc’ (https://ethercalc.net ) web application is a multi-user, collaborative spreadsheet t …

Dimensionality Assessment Using Minimum Rank Factor Analysis (EFA.MRFA)
Performs parallel analysis (Timmerman & Lorenzo-Seva, 2011 <doi:10.1037/a0023353>) and hull method (Lorenzo-Seva, Timmerman, & Kiers, 201 …

Finds the Archetypal Analysis of a Data Frame (archetypal)
Performs archetypal analysis by using Convex Hull approximation under a full control of all algorithmic parameters. It contains functions useful for fi …


Document worth reading: “Evolutionary Algorithms”

Evolutionary algorithms (EAs) are population-based metaheuristics, originally inspired by aspects of natural evolution. Modern varieties incorporate a broad mixture of search mechanisms, and tend to blend inspiration from nature with pragmatic engineering concerns; however, all EAs essentially operate by maintaining a population of potential solutions and in some way artificially ‘evolving’ that population over time. Particularly well-known categories of EAs include genetic algorithms (GAs), Genetic Programming (GP), and Evolution Strategies (ES). EAs have proven very successful in practical applications, particularly those requiring solutions to combinatorial problems. EAs are highly flexible and can be configured to address any optimization task, without the requirements for reformulation and/or simplification that would be needed for other techniques. However, this flexibility goes hand in hand with a cost: the tailoring of an EA’s configuration and parameters, so as to provide robust performance for a given class of tasks, is often a complex and time-consuming process. This tailoring process is one of the many ongoing research areas associated with EAs. Evolutionary Algorithms


Distilled News

Data Engineering – Basics of Apache Airflow – Build Your First Pipeline

Extracting Data from Multiple Data Sources. If you’re in tech, chances are you have data to manage. Data grows fast, gets more complex, and becomes harder to manage as your company scales. The management wants to extract insights from the data they have, but they do not have the technical skills to do that. They then hire you, the friendly neighbourhood data scientist/engineer/analyst or whatever title you want to call yourself nowadays (it does not matter, you’ll be doing all the work anyway) to do exactly that. You soon realise that in order to provide insights, you need some kind of visualisation to explain your findings and monitor them over time. For the data to stay up to date, you need to extract, transform, and load it into your preferred database from multiple data sources at a fixed time interval (hourly, daily, weekly, monthly). Companies have workflow management systems for tasks like these. They enable us to create and schedule jobs internally. Everyone has their own preference, but I would say many of the old WMSs tend to be inefficient and hard to scale, lack a good UI and a retry strategy, and produce terrible logs that make troubleshooting and debugging a nightmare. This causes unnecessary waste of effort and time. Here is an example of a traditional crontab used to schedule jobs.


Understanding different Loss Functions for Neural Networks

There are different loss functions available for different objectives. In this mini blog, I will take you through some of the very frequently used loss functions, with a set of examples. This blog is written with the Keras and TensorFlow frameworks in mind.


How to build a custom Dataset for Tensorflow

Tensorflow inspires developers to experiment with their exciting AI ideas in almost any domain that comes to mind. There are three well-known factors in the ML community that make a good Deep Neural Network model do magical things:
• Model Architecture
• High quality training data
• Sufficient Compute Capacity


How Data Stories Help You Solve Analytics Challenges and Drive Impact – by Design

When we recently held a 24h hackathon aimed at helping an NGO fight human trafficking, the composition of the competing teams could not have been more different: There were a number of teams with a heavy background in analytics consulting. And then there was my team: One data engineer, and three designers. The goal of the event was to generate insights from data provided by the client. The game was on. The client brief was the nail. Each team had its hammer. It might have been hard to see what such a design heavy group had to do with what seemed like an obvious data analytics challenge, but we did not despair. Instead, the challenge allowed us to illustrate some valuable lessons about how combining data- and design-driven approaches can generate unique insights that previously had been overlooked. The key to getting there was in looking for the stories that came to the surface at the intersection of analytics and ethnography.


How Google Uses Reinforcement Learning to Train AI Agents in the Most Popular Sport in the World

Football (soccer for Americans) is by far the most popular sport in the world. With over 4 billion fans worldwide, football has proven to transcend generations, geo-political rivalries, and even war conflicts. That passion has transferred into the video game space, in which games like FIFA regularly rank among the most popular videogames worldwide. Despite its popularity, football is one of the games that has proven resilient to artificial intelligence (AI) techniques. The complexity of environments like FIFA often results in a nightmare for AI algorithms. Recently, researchers from the Google Brain team open-sourced Google Research Football, a new environment that leverages reinforcement learning to teach AI agents how to master the most popular sport in the world. The principles behind Google Research Football were outlined in a research paper that accompanied the release.


An Introduction to Bayesian Inference

Foreword: The following post is intended to be an introduction with some math. Although it includes math, I’m by no means an expert. I struggled to learn the basics of probabilistic programming, and hopefully this helps someone else make a little more sense out of the world. If you find any errors, please drop a comment and help us learn. Cheers! In data science we are often interested in understanding how data was generated, mainly because it allows us to answer questions about new and/or incomplete observations. Given a (often noisy) dataset though, there are an infinite number of possible model structures that could have generated the observed data. But not all model structures are made equally – that is, certain model structures are more realistic than others based on our assumptions of the world. It is up to the modeler to choose the one that best describes the world they are observing.


Correlation and Causation – How alcohol affects life expectancy

‘Correlation does not imply causation’. We hear this sentence over and over again. But what does that actually mean? This small analysis uncovers this topic with the help of R and simple regressions, focusing on how alcohol impacts health.
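To make the idea concrete, here is a small hypothetical R sketch of the kind of correlation-and-regression analysis described; the data frame, column names, and numbers below are made up for illustration and are not taken from the article.

# Hypothetical per-country data: alcohol consumption and life expectancy
health <- data.frame(
  alcohol_litres  = c(1.2, 4.5, 6.3, 8.1, 9.7, 11.4, 12.8),
  life_expectancy = c(62, 68, 71, 74, 76, 79, 81)
)
cor(health$alcohol_litres, health$life_expectancy)          # correlation only
fit <- lm(life_expectancy ~ alcohol_litres, data = health)  # a simple regression
summary(fit)  # neither statistic, by itself, establishes causation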


The Future of AI Is in Africa

In the last few years, the machine-learning community has blossomed, applying the technology to challenges like food security and health care.


Understanding Value Of Correlations In Data Science Projects

Every single successful data science project revolves around finding accurate correlations between the input and target variables. However, more often than not, we overlook how crucial correlation analysis is. It is recommended to perform correlation analysis before and after the data gathering and transformation phases of a data science project. This article focuses on the important role correlations play in data science projects and concentrates on real-world FinTech examples. Lastly, it explains how we can model the correlations the right way.


Homemade Machine Learning

This repository contains examples of popular machine learning algorithms implemented in Python, with the mathematics behind them explained. Each algorithm has an interactive Jupyter Notebook demo that allows you to play with training data and algorithm configurations and immediately see the results, charts, and predictions right in your browser. In most cases the explanations are based on this great machine learning course by Andrew Ng. The purpose of this repository is not to implement machine learning algorithms using 3rd party library one-liners, but rather to practice implementing these algorithms from scratch and get a better understanding of the mathematics behind each algorithm. That’s why all algorithm implementations are called ‘homemade’ and are not intended to be used for production.


Demystifying Tensorflow Time Series: Local Linear Trend

In the past couple of years, big companies have hurried to publish their machine learning based time series libraries. For example, Facebook released Prophet, Amazon released Gluon Time Series, Microsoft released Time Series Insights, and Google released Tensorflow time series. This popularity shows that machine learning based time series prediction is in high demand. This article introduces the recently released Tensorflow time series library from Google. This library uses probabilistic models to describe time series. In my 7 years’ experience in algorithmic trading, I’ve established the habit of analysing the inner workings of time series libraries. So I dug into the source code of this library to understand the Tensorflow team’s take on time series modelling.


Open-domain question answering with DeepPavlov

The ability to answer factoid questions is a key feature of any dialogue system. Formally speaking, giving an answer based on a document collection covering a wide range of topics is called open-domain question answering (ODQA). The ODQA task combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer span from those articles). An ODQA system can be used in many applications. Chatbots apply ODQA to answer user requests, while business-oriented Natural Language Processing (NLP) solutions leverage ODQA to answer questions based on internal corporate documentation. The picture below shows a typical dialogue with an ODQA system.


Applications of MCMC for Cryptography and Optimization

MCMC is a pretty hard topic to wrap your head around, but examples do help a lot. Last time I wrote an article explaining MCMC methods intuitively. In that article, I showed how MCMC chains could be used to simulate from a random variable whose distribution is partially known, i.e., we don’t know the normalizing constant. I also described how MCMC can be used to solve problems with a large state space, but didn’t give an example. I will provide some real-world use cases in this post. If you don’t really appreciate MCMC yet, I hope I will be able to pique your interest by the end of this blog post. This post is about understanding MCMC methods with the help of some computer science problems.
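As a refresher on the simulation idea mentioned above, here is a minimal Metropolis-Hastings sketch in R; the target density is made up for illustration and is known only up to its normalizing constant, which cancels in the acceptance ratio.

# Unnormalized target density (normalizing constant unknown and never needed)
unnorm <- function(x) exp(-x^4 / 4 + x^2 / 2)

metropolis <- function(nSteps, stepSd = 1) {
  samples <- numeric(nSteps)
  current <- 0
  for (i in seq_len(nSteps)) {
    proposal <- current + rnorm(1, sd = stepSd)   # symmetric random-walk proposal
    # accept with probability min(1, p(proposal)/p(current)); the constant cancels
    if (runif(1) < unnorm(proposal) / unnorm(current)) current <- proposal
    samples[i] <- current
  }
  samples
}

hist(metropolis(50000), breaks = 100, freq = FALSE)  # histogram approximates the target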


End-to-end learning, the (almost) every purpose ML method

End-to-end (E2E) learning refers to training a possibly complex learning system represented by a single model (specifically a Deep Neural Network) that represents the complete target system, bypassing the intermediate layers usually present in traditional pipeline designs.

Continue Reading…

Collapse

Read More

June 26, 2019

Blackman-Tukey Spectral Estimator in R

(This article was first published on Get Your Data On, and kindly contributed to R-bloggers)

Blackman-Tukey Spectral Estimator in R!

There are two definitions of the power spectral density (PSD). Both definitions are mathematically nearly identical and define a function that describes the distribution of power over the frequency components in our data set. The periodogram PSD estimator is based on the first definition of the PSD (see periodogram post). The Blackman-Tukey spectral estimator (BTSE) is based on the second definition. The second definition says, find the PSD by calculating the Fourier transform of the autocorrelation sequence (ACS). In R this definition is written as

PSD <- function(rxx) {
  fft(rxx)
}

where fft is the R implementation of the fast Fourier transform, rxx is the autocorrelation sequence (ACS), the k’th element of the ACS is rxx[k] = E[x[0]x[k]] for k from -infinity to +infinity, and E is the expectation operator. The xx in rxx[k] is a reminder that r is a correlation between x and itself. The rxx[k]s are sometimes called lags. The ACS has the property that rxx[-k]=rxx[k]*, where * is the complex conjugate. In this post we will only use real numbers, so I’ll drop the * from here forward.

So, to find the PSD we just calculate rxx and take its fft! Unfortunately, in practice, we cannot do this. Calculating the expected value requires the probability density function (PDF) of x, which we don’t know, and an infinite amount of data, which we don’t have. So, we can’t calculate the PSD: we’re doomed!

No, we are not doomed. We can’t calculate the PSD, but we can estimate it! We can derive an estimator for the PSD from the definition of the PSD. First, we replace rxx with an estimate of rxx: the expected value, which gives the exact rxx, is replaced with an average, which gives us an estimate of rxx. That is, E[x[0]x[k]] is replaced with (1/N)(x[1]x[1+k]+x[2]x[2+k]+…+x[N-1-k]x[N-1]+x[N-k]x[N]), where N is the number of data samples. For example, if k=0, then rxx[0]=(1/N)*sum(x*x). In R code the estimate is written as

lagEstimate <- function(x, k, N = length(x)) {
  # estimate the lag-k autocorrelation by averaging products of samples k apart
  (1/N) * sum(x[1:(N-k)] * x[(k+1):N])
}

If we had an infinite amount of data, N=infinity, we could use lagEstimate to estimate the entire infinite ACS. Unfortunately we don’t have an infinite amount of data, and even if we did, it wouldn’t fit into a computer. So, we can only estimate a finite number of ACS elements. The function below calculates lags 0 to kMax.

Lags <- function(x, kMax) {
  # estimate lags 0 through kMax
  sapply(0:kMax, lagEstimate, x = x)
}

Before we can try these functions out we need data. In this case the data came from a random process with the PSD plotted in the figure below. The x-axis is normalized frequency (frequency divided by the sampling rate). So, if the sampling rate were 1000 Hz, you could multiply the normalized frequency by 1000 Hz and the frequency axis would read 0 Hz to 1000 Hz. The y-axis is in dB (10log10(amplitude)). You can see six large sharp peaks in the plot and a gradual dip towards 0 Hz and then back up. Some of the peaks are close together and will be hard to resolve.

The data produced by the random process is plotted below. This is the data we will use throughout this post.
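
The original post does not include the code for this random process, so, to make the examples below runnable, here is a minimal stand-in: six sinusoids (with two close pairs near normalized frequencies 0.4 and 0.6) in white noise. The frequencies, phases, and noise level are illustrative assumptions, so the lag values and plots you get will differ from the ones shown in the original post.

set.seed(1)
N     <- 1024                                   # number of samples (illustrative)
n     <- 0:(N - 1)
freqs <- c(0.10, 0.15, 0.38, 0.42, 0.58, 0.62)  # normalized frequencies (illustrative)
x     <- rowSums(sapply(freqs, function(f) cos(2*pi*f*n + runif(1, 0, 2*pi)))) +
  rnorm(N, sd = 0.5)                            # six sinusoids plus white noise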

Let’s calculate the ACS up to the 5th lag using the data.

Lags(x,kMax=5)
## [1]  6.095786 -1.368336  3.341608  1.738122 -1.737459  3.651765

A kMax of 5 gives us 6 lags: {r[0], r[1], r[2], r[3], r[4], r[5]}. These 6 lags are not an ACS, but are part of an ACS.

We used Lags to estimate the positive lags up to kMax, but the ACS is an even sequence, r[-k]=r[k] for all k. So, let’s write a function to make a sequence consisting of lags from r[-kMax] to r[kMax]. This is a windowed ACS: values outside of +/- kMax are replaced with 0. Where it won’t cause confusion, I’ll refer to the windowed ACS as the ACS.

acsWindowed <- function(x, kMax, Nzero = 0) {
  # mirror the positive lags to build r[-kMax..kMax] in textbook order,
  # optionally padded with Nzero zeros at each end
  rHalf <- c(Lags(x, kMax), rep(0, Nzero))
  c(rev(rHalf[2:length(rHalf)]), rHalf)
}

Let’s try this function out.

acsW <- acsWindowed(x,9)

In the figure below you can see that the r[0] lag, the maximum, is plotted in the middle of the plot.

The ACS in the figure above is plotted the way the ACS is usually plotted in textbooks. In textbooks the sum in the Fourier transform ranges from -N/2 to (N-1)/2, so the r[0] lag should be in the center of the plot. In R the sum in the Fourier transform ranges from 1 to N, so the 0’th lag has to be first. We could just build the sequence in R form, but it is often handy to start in textbook form and switch to R form. We can write a function to make switching from textbook form to R form easy.

Textbook2R <- function(x, N = length(x), foldN = ceiling(N/2)) {
  # rotate a textbook-ordered sequence (lag 0 in the middle) into R order (lag 0 first)
  c(x[foldN:N], x[1:(foldN-1)])
}

Notice in the figure below that the maximum lag, r[0], is plotted at the beginning.

Let’s imagine we have an infinite amount of data and used it to estimate an infinite number of ACS lags. Let’s call that sequence rAll. We make a windowed ACS by setting rW=rAll*W, where W=1 for our 9 lags and 0 everywhere else. W is called the rectangular window because, as you can see in the plot below, its plot looks like a rectangle. By default, when we estimate a finite number of lags we are using a rectangular window.

W <- c(rep(0,9),rep(1,9),rep(0,9))

The reason we cannot use a rectangular window is that its Fourier transform is not always positive. As you can see in the plot below, there are several values below zero, indicated with a dotted line. The Re() function removes some small imaginary numbers due to numerical error: some imaginary dust we have to sweep up.

FFT_W <- Re(fft(Textbook2R(W)))

Even though the fft of the ACS rAll is positive, the fft of the product of rAll and a rectangular window might not be positive! The Bartlett window is a simple window whose fft is positive.

BartlettWindow <- function(N, n = seq(0, N - 1)) {
  # triangular (Bartlett) window of length N
  1 - abs((n - (N-1)/2) / ((N-1)/2))
}
Wb <- BartlettWindow(19)

As you can see in the plot below the Fourier transform of the Bartlett window is positive.

WbFft <- Re(fft(Textbook2R(Wb)))

Calculating the BTSE with R

Now that we can estimate the ACS and window our estimate, we are ready to estimate the PSD of our data. The BTSE is written as

Btse <- function(rHat, Wb) {
  # Blackman-Tukey spectral estimate: fft of the windowed ACS estimate
  Re(fft(rHat * Wb))
}

Note the Re() is correcting for numerical error.

In the first example we use a 19-point ACS lag sequence.

rHat   <- Textbook2R(acsWindowed(x, kMax = 9))
Wb     <- Textbook2R(BartlettWindow(length(rHat)))
Pbtse9 <- Btse(rHat, Wb)

The figure below shows the BTSE calculated with a maximum lag of 9. The dotted lines indicate the locations of the peaks in the PSD we are trying to estimate. With a maximum lag of only 9, the estimate is poor.

We calculate a new estimate with a maximum lag of 18.

rHat    <- Textbook2R(acsWindowed(x, kMax = 18))
Wb      <- Textbook2R(BartlettWindow(length(rHat)))
Pbtse18 <- Btse(rHat, Wb)

This estimate, with a maximum lag of 18, is better, but the peaks around 0.4 and 0.6 are still not resolved. We need to increase the maximum lag further.

Finally we increase the maximum lag to 65 and recalculate the estimate.

rHat    <- Textbook2R(acsWindowed(x, kMax = 65))
Wb      <- Textbook2R(BartlettWindow(length(rHat)))
Pbtse65 <- Btse(rHat, Wb)

This final estimate is very good. All six peaks are resolved, and the locations of our estimated peaks are very close to the true peak locations.
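
The original post’s figures are not reproduced here. As a rough stand-in, the sketch below (plotBtse is just an illustrative helper, not from the post) overlays the three estimates in dB against normalized frequency, assuming Pbtse9, Pbtse18 and Pbtse65 were computed as above, so the effect of increasing the maximum lag can be seen.

plotBtse <- function(P, ...) {
  # each estimate has its own length, so build a matching normalized frequency grid
  f <- (0:(length(P) - 1)) / length(P)
  lines(f, 10 * log10(pmax(abs(P), 1e-6)), ...)
}
allP  <- c(Pbtse9, Pbtse18, Pbtse65)
ylims <- range(10 * log10(pmax(abs(allP), 1e-6)))
plot(NULL, xlim = c(0, 1), ylim = ylims,
     xlab = "Normalized frequency", ylab = "BTSE (dB)")
plotBtse(Pbtse9,  col = "gray70")
plotBtse(Pbtse18, col = "gray40")
plotBtse(Pbtse65, col = "black")
legend("topright", legend = c("kMax = 9", "kMax = 18", "kMax = 65"),
       col = c("gray70", "gray40", "black"), lty = 1)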

Final Thoughts

Could we use 500 lags in the BTSE? In this case we could, since we have a lot of data, but the higher lags are estimated from fewer data points and therefore have more variance. Using these high-variance lags will produce a higher-variance estimate.

Are there other ways to improve the BTSE besides using more lags? Yes, there are a few. For instance, we could zero-pad the lags: basically, add zeros to the end of our lag sequence. This makes the fft in the BTSE estimator evaluate the estimate at more frequencies, so we can see more detail in the estimated PSD; a minimal sketch is shown below.
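
As a minimal sketch of that idea, assuming the data x and the functions defined earlier in the post: window the lags first, convert to R order, and then insert the zeros between the positive and negative lags, which is where padding belongs for an ACS in R order. The lag count and the amount of padding below are illustrative choices, not values from the post.

kMax  <- 65
rWin  <- acsWindowed(x, kMax) * BartlettWindow(2 * kMax + 1)  # Bartlett-windowed lags, textbook order
rR    <- Textbook2R(rWin)                                     # lag 0 first (R order)
Nzero <- 256                                                  # number of zeros to insert (illustrative)
rPad  <- append(rR, rep(0, Nzero), after = kMax + 1)          # zeros go between positive and negative lags
PbtsePad <- Re(fft(rPad))                                     # BTSE evaluated on a denser frequency grid

Padding after windowing keeps the Bartlett taper spanning only the +/- kMax lags that were actually estimated; the extra zeros simply interpolate the spectrum onto a denser frequency grid.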

Also keep in mind that there are other PSD estimation methods that do better on other PSD features. For instance, if you were more interested in finding deep nulls than peaks, a moving-average PSD estimate would be a better choice.


Continue Reading…

Collapse

Read More

Book Memo: “The Experience-Centric Organization”

How to Win Through Customer Experience
Is your organization prepared for the next paradigm of customer experience? Or will you be left behind? This practical book will make you a winner in a market driven by experience and enable you to develop desirable offerings and standout service and attract loyal customers. Author Simon Clatworthy shows you how to transform into an organization that aligns your customers’ experiential journey with platforms, organizational structures, and strategic alliances. Rather than treat customer experience as an add-on to product and service design, you’ll discover how experience-centricity can drive the whole organization.

Continue Reading…

Collapse

Read More

What’s new on arXiv

Gap-Measure Tests with Applications to Data Integrity Verification

In this paper we propose and examine gap statistics for assessing uniform distribution hypotheses. We provide examples relevant to data integrity testing for which max-gap statistics provide greater sensitivity than chi-square (\chi^2), thus allowing the new test to be used in place of or as a complement to \chi^2 testing for purposes of distinguishing a larger class of deviations from uniformity. We establish that the proposed max-gap test has the same sequential and parallel computational complexity as \chi^2 and thus is applicable for Big Data analytics and integrity verification.

Learning Representations by Maximizing Mutual Information Across Views

We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, a context could be an image from ImageNet, and multiple views of the context could be generated by repeatedly applying data augmentation to the image. Following this approach, we develop a new model which maximizes mutual information between features extracted at multiple scales from independently-augmented copies of each input. Our model significantly outperforms prior work on the tasks we consider. Most notably, it achieves over 60% accuracy on ImageNet using the standard linear evaluation protocol. This improves on prior results by over 4% (absolute). On Places205, using the representations learned on ImageNet, our model achieves 50% accuracy. This improves on prior results by 2% (absolute). When we extend our model to use mixture-based representations, segmentation behaviour emerges as a natural side-effect.

Learning Interpretable Shapelets for Time Series Classification through Adversarial Regularization

Time series classification can be successfully tackled by jointly learning a shapelet-based representation of the series in the dataset and classifying the series according to this representation. However, although the learned shapelets are discriminative, they are not always similar to pieces of a real series in the dataset. This makes it difficult to interpret the decision, i.e. difficult to analyze if there are particular behaviors in a series that triggered the decision. In this paper, we make use of a simple convolutional network to tackle the time series classification task and we introduce an adversarial regularization to constrain the model to learn more interpretable shapelets. Our classification results on all the usual time series benchmarks are comparable with the results obtained by similar state-of-the-art algorithms but our adversarially regularized method learns shapelets that are, by design, interpretable.

Big-Data Clustering: K-Means or K-Indicators?

The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some ‘feature spaces’, as is in spectral clustering. Highly sensitive to initializations, however, K-means encounters a scalability bottleneck with respect to the number of clusters K as this number grows in big data applications. In this work, we promote a closely related model called K-indicators model and construct an efficient, semi-convex-relaxation algorithm that requires no randomized initializations. We present extensive empirical results to show advantages of the new algorithm when K is large. In particular, using the new algorithm to start the K-means algorithm, without any replication, can significantly outperform the standard K-means with a large number of currently state-of-the-art random replications.

Mining YouTube – A dataset for learning fine-grained action concepts from webly supervised video data

Action recognition is so far mainly focusing on the problem of classification of hand selected preclipped actions and reaching impressive results in this field. But with the performance even ceiling on current datasets, it also appears that the next steps in the field will have to go beyond this fully supervised classification. One way to overcome those problems is to move towards less restricted scenarios. In this context we present a large-scale real-world dataset designed to evaluate learning techniques for human action recognition beyond hand-crafted datasets. To this end we put the process of collecting data on its feet again and start with the annotation of a test set of 250 cooking videos. The training data is then gathered by searching for the respective annotated classes within the subtitles of freely available videos. The uniqueness of the dataset is attributed to the fact that the whole process of collecting the data and training does not involve any human intervention. To address the problem of semantic inconsistencies that arise with this kind of training data, we further propose a semantical hierarchical structure for the mined classes.

Terminal Brain Damage: Exposing the Graceless Degradation in Deep Neural Networks Under Hardware Fault Attacks

Deep neural networks (DNNs) have been shown to tolerate ‘brain damage’: cumulative changes to the network’s parameters (e.g., pruning, numerical perturbations) typically result in a graceful degradation of classification accuracy. However, the limits of this natural resilience are not well understood in the presence of small adversarial changes to the DNN parameters’ underlying memory representation, such as bit-flips that may be induced by hardware fault attacks. We study the effects of bitwise corruptions on 19 DNN models—six architectures on three image classification tasks—and we show that most models have at least one parameter that, after a specific bit-flip in their bitwise representation, causes an accuracy loss of over 90%. We employ simple heuristics to efficiently identify the parameters likely to be vulnerable. We estimate that 40-50% of the parameters in a model might lead to an accuracy drop greater than 10% when individually subjected to such single-bit perturbations. To demonstrate how an adversary could take advantage of this vulnerability, we study the impact of an exemplary hardware fault attack, Rowhammer, on DNNs. Specifically, we show that a Rowhammer enabled attacker co-located in the same physical machine can inflict significant accuracy drops (up to 99%) even with single bit-flip corruptions and no knowledge of the model. Our results expose the limits of DNNs’ resilience against parameter perturbations induced by real-world fault attacks. We conclude by discussing possible mitigations and future research directions towards fault attack-resilient DNNs.

NodeDrop: A Condition for Reducing Network Size without Effect on Output

Determining an appropriate number of features for each layer in a neural network is an important and difficult task. This task is especially important in applications on systems with limited memory or processing power. Many current approaches to reduce network size either utilize iterative procedures, which can extend training time significantly, or require very careful tuning of algorithm parameters to achieve reasonable results. In this paper we propose NodeDrop, a new method for eliminating features in a network. With NodeDrop, we define a condition to identify and guarantee which nodes carry no information, and then use regularization to encourage nodes to meet this condition. We find that NodeDrop drastically reduces the number of features in a network while maintaining high performance, reducing the number of parameters by a factor of 114x for a VGG like network on CIFAR10 without a drop in accuracy.

A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation

Action recognition has become a rapidly developing research field within the last decade. But with the increasing demand for large scale data, the need of hand annotated data for the training becomes more and more impractical. One way to avoid frame-based human annotation is the use of action order information to learn the respective action classes. In this context, we propose a hierarchical approach to address the problem of weakly supervised learning of human actions from ordered action labels by structuring recognition in a coarse-to-fine manner. Given a set of videos and an ordered list of the occurring actions, the task is to infer start and end frames of the related action classes within the video and to train the respective action classifiers without any need for hand labeled frame boundaries. We address this problem by combining a framewise RNN model with a coarse probabilistic inference. This combination allows for the temporal alignment of long sequences and thus, for an iterative training of both elements. While this system alone already generates good results, we show that the performance can be further improved by approximating the number of subactions to the characteristics of the different action classes as well as by the introduction of a regularizing length prior. The proposed system is evaluated on two benchmark datasets, the Breakfast and the Hollywood extended dataset, showing a competitive performance on various weak learning tasks such as temporal action segmentation and action alignment.

Correctness Verification of Neural Networks

We present the first verification that a neural network produces a correct output within a specified tolerance for every input of interest. We define correctness relative to a specification which identifies 1) a state space consisting of all relevant states of the world and 2) an observation process that produces neural network inputs from the states of the world. Tiling the state and input spaces with a finite number of tiles, obtaining ground truth bounds from the state tiles and network output bounds from the input tiles, then comparing the ground truth and network output bounds delivers an upper bound on the network output error for any input of interest. Results from a case study highlight the ability of our technique to deliver tight error bounds for all inputs of interest and show how the error bounds vary over the state and input spaces.

Transforming Complex Sentences into a Semantic Hierarchy

We present an approach for recursively splitting and rephrasing complex English sentences into a novel semantic hierarchy of simplified sentences, with each of them presenting a more regular structure that may facilitate a wide variety of artificial intelligence tasks, such as machine translation (MT) or information extraction (IE). Using a set of hand-crafted transformation rules, input sentences are recursively transformed into a two-layered hierarchical representation in the form of core sentences and accompanying contexts that are linked via rhetorical relations. In this way, the semantic relationship of the decomposed constituents is preserved in the output, maintaining its interpretability for downstream applications. Both a thorough manual analysis and automatic evaluation across three datasets from two different domains demonstrate that the proposed syntactic simplification approach outperforms the state of the art in structural text simplification. Moreover, an extrinsic evaluation shows that when applying our framework as a preprocessing step the performance of state-of-the-art Open IE systems can be improved by up to 346% in precision and 52% in recall. To enable reproducible research, all code is provided online.

Neural networks grown and self-organized by noise

Living neural networks emerge through a process of growth and self-organization that begins with a single cell and results in a brain, an organized and functional computational device. Artificial neural networks, however, rely on human-designed, hand-programmed architectures for their remarkable performance. Can we develop artificial computational devices that can grow and self-organize without human intervention? In this paper, we propose a biologically inspired developmental algorithm that can ‘grow’ a functional, layered neural network from a single initial cell. The algorithm organizes inter-layer connections to construct a convolutional pooling layer, a key constituent of convolutional neural networks (CNN’s). Our approach is inspired by the mechanisms employed by the early visual system to wire the retina to the lateral geniculate nucleus (LGN), days before animals open their eyes. The key ingredients for robust self-organization are an emergent spontaneous spatiotemporal activity wave in the first layer and a local learning rule in the second layer that ‘learns’ the underlying activity pattern in the first layer. The algorithm is adaptable to a wide-range of input-layer geometries, robust to malfunctioning units in the first layer, and so can be used to successfully grow and self-organize pooling architectures of different pool-sizes and shapes. The algorithm provides a primitive procedure for constructing layered neural networks through growth and self-organization. Broadly, our work shows that biologically inspired developmental algorithms can be applied to autonomously grow functional ‘brains’ in-silico.

Episodic Memory in Lifelong Language Learning

We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier. We propose an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in this setup. Experiments on text classification and question answering demonstrate the complementary benefits of sparse experience replay and local adaptation to allow the model to continuously learn from new datasets. We also show that the space complexity of the episodic memory module can be reduced significantly (~50-90%) by randomly choosing which examples to store in memory with a minimal decrease in performance. We consider an episodic memory component as a crucial building block of general linguistic intelligence and see our model as a first step in that direction.

A Dyadic IRT Model

We propose a dyadic Item Response Theory (dIRT) model for measuring interactions of pairs of individuals when the responses to items represent the actions (or behaviors, perceptions, etc.) of each individual (actor) made within the context of a dyad formed with another individual (partner). Examples of its use include the assessment of collaborative problem solving, or the evaluation of intra-team dynamics. The dIRT model generalizes both Item Response Theory (IRT) models for measurement and the Social Relations Model (SRM) for dyadic data. The responses of an actor when paired with a partner are modeled as a function of not only the actor’s inclination to act and the partner’s tendency to elicit that action, but also the unique relationship of the pair, represented by two directional, possibly correlated, interaction latent variables. Generalizations are discussed, such as accommodating triads or larger groups. Estimation is performed using Markov-chain Monte Carlo implemented in Stan, making it straightforward to extend the dIRT model in various ways. Specifically, we show how the basic dIRT model can be extended to accommodate latent regressions, multilevel settings with cluster-level random effects, as well as joint modeling of dyadic data and a distal outcome. A simulation study demonstrates that estimation performs well. We apply our proposed approach to speed-dating data and find new evidence of pairwise interactions between participants, describing a mutual attraction that is inadequately characterized by individual properties alone.

MEMe: An Accurate Maximum Entropy Method for Efficient Approximations in Large-Scale Machine Learning

Efficient approximation lies at the heart of large-scale machine learning problems. In this paper, we propose a novel, robust maximum entropy algorithm, which is capable of dealing with hundreds of moments and allows for computationally efficient approximations. We showcase the usefulness of the proposed method, its equivalence to constrained Bayesian variational inference and demonstrate its superiority over existing approaches in two applications, namely, fast log determinant estimation and information-theoretic Bayesian optimisation.

Random Path Selection for Incremental Learning

Incremental life-long learning is a main challenge towards the long-standing goal of Artificial General Intelligence. In real-life settings, learning tasks arrive in a sequence and machine learning models must continually learn to increment already acquired knowledge. Existing incremental learning approaches, fall well below the state-of-the-art cumulative models that use all training classes at once. In this paper, we propose a random path selection algorithm, called RPSnet, that progressively chooses optimal paths for the new tasks while encouraging parameter sharing and reuse. Our approach avoids the overhead introduced by computationally expensive evolutionary and reinforcement learning based path selection strategies while achieving considerable performance gains. As an added novelty, the proposed model integrates knowledge distillation and retrospection along with the path selection strategy to overcome catastrophic forgetting. In order to maintain an equilibrium between previous and newly acquired knowledge, we propose a simple controller to dynamically balance the model plasticity. Through extensive experiments, we demonstrate that the proposed method surpasses the state-of-the-art performance on incremental learning and by utilizing parallel computation this method can run in constant time with nearly the same efficiency as a conventional deep convolutional neural network.

A Case for Backward Compatibility for Human-AI Teams

AI systems are being deployed to support human decision making in high-stakes domains. In many cases, the human and AI form a team, in which the human makes decisions after reviewing the AI’s inferences. A successful partnership requires that the human develops insights into the performance of the AI system, including its failures. We study the influence of updates to an AI system in this setting. While updates can increase the AI’s predictive performance, they may also lead to changes that are at odds with the user’s prior experiences and confidence in the AI’s inferences, hurting therefore the overall team performance. We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams. Empirical results on three high-stakes domains show that current machine learning algorithms do not produce compatible updates. We propose a re-training objective to improve the compatibility of an update by penalizing new errors. The objective offers full leverage of the performance/compatibility tradeoff, enabling more compatible yet accurate updates.

Towards Fair and Decentralized Privacy-Preserving Deep Learning with Blockchain

In collaborative deep learning, current learning frameworks follow either a centralized architecture or a distributed architecture. Whilst centralized architecture deploys a central server to train a global model over the massive amount of joint data from all parties, distributed architecture aggregates parameter updates from participating parties’ local model training, via a parameter server. These two server-based architectures present security and robustness vulnerabilities such as single-point-of-failure, single-point-of-breach, privacy leakage, and lack of fairness. To address these problems, we design, implement, and evaluate a purely decentralized privacy-preserving deep learning framework, called DPPDL. DPPDL makes the first investigation on the research problem of fairness in collaborative deep learning, and simultaneously provides fairness and privacy by proposing two novel algorithms: initial benchmarking and privacy-preserving collaborative deep learning. During initial benchmarking, each party trains a local Differentially Private Generative Adversarial Network (DPGAN) and publishes the generated privacy-preserving artificial samples for other parties to label, based on the quality of which to initialize local credibility list for other parties. The local credibility list reflects how much one party contributes to another party, and it is used and updated during collaborative learning to ensure fairness. To protect gradients transaction during privacy-preserving collaborative deep learning, we further put forward a three-layer onion-style encryption scheme. We experimentally demonstrate, on benchmark image datasets, that accuracy, privacy and fairness in collaborative deep learning can be effectively addressed at the same time by our proposed DPPDL framework. Moreover, DPPDL provides a viable solution to detect and isolate the cheating party in the system.

Back Attention Knowledge Transfer for Low-resource Named Entity Recognition

In recent years, great success has been achieved in the field of natural language processing (NLP), thanks in part to the considerable amount of annotated resources. For named entity recognition (NER), most languages do not have such an abundance of labeled data, so the performances of those languages are comparatively lower. To improve the performance, we propose a general approach called Back Attention Network (BAN). BAN uses translation system to translate other language sentences into English and utilizes the pre-trained English NER model to get task-specific information. After that, BAN applies a new mechanism named back attention knowledge transfer to improve the semantic representation, which aids in generation of the result. Experiments on three different language datasets indicate that our approach outperforms other state-of-the-art methods.

Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this effect, our paper proposes a novel attention based feature embedding that captures both entity and relation features in any given entity’s neighborhood. Additionally, we also encapsulate relation clusters and multihop relations in our model. Our empirical study offers insights into the efficacy of our attention based model and we show marked performance gains in comparison to state of the art methods on all datasets.

Robust Mean Estimation with the Bayesian Median of Means

The sample mean is often used to aggregate different unbiased estimates of a parameter, producing a final estimate that is unbiased but possibly high-variance. This paper introduces the Bayesian median of means, an aggregation rule that roughly interpolates between the sample mean and median, resulting in estimates with much smaller variance at the expense of bias. While the procedure is non-parametric, its squared bias is asymptotically negligible relative to the variance, similar to maximum likelihood estimators. The Bayesian median of means is consistent, and concentration bounds for the estimator’s bias and L_1 error are derived, as well as a fast non-randomized approximating algorithm. The performances of both the exact and the approximate procedures match that of the sample mean in low-variance settings, and exhibit much better results in high-variance scenarios. The empirical performances are examined in real and simulated data, and in applications such as importance sampling, cross-validation and bagging.

Attributed Graph Clustering via Adaptive Graph Convolution

Attributed graph clustering is challenging as it requires joint modelling of graph structures and node attributes. Recent progress on graph convolutional networks has proved that graph convolution is effective in combining structural and content information, and several recent methods based on it have achieved promising clustering performance on some real attributed networks. However, there is limited understanding of how graph convolution affects clustering performance and how to properly use it to optimize performance for different graphs. Existing methods essentially use graph convolution of a fixed and low order that only takes into account neighbours within a few hops of each node, which underutilizes node relations and ignores the diversity of graphs. In this paper, we propose an adaptive graph convolution method for attributed graph clustering that exploits high-order graph convolution to capture global cluster structure and adaptively selects the appropriate order for different graphs. We establish the validity of our method by theoretical analysis and extensive experiments on benchmark datasets. Empirical results show that our method compares favourably with state-of-the-art methods.

Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis

In aspect-level sentiment classification (ASC), it is prevalent to equip dominant neural models with attention mechanisms, for the sake of acquiring the importance of each context word on the given aspect. However, such a mechanism tends to excessively focus on a few frequent words with sentiment polarities, while ignoring infrequent ones. In this paper, we propose a progressive self-supervised attention learning approach for neural ASC models, which automatically mines useful attention supervision information from a training corpus to refine attention mechanisms. Specifically, we iteratively conduct sentiment predictions on all training instances. Particularly, at each iteration, the context word with the maximum attention weight is extracted as the one with active/misleading influence on the correct/incorrect prediction of every instance, and then the word itself is masked for subsequent iterations. Finally, we augment the conventional training objective with a regularization term, which enables ASC models to continue equally focusing on the extracted active context words while decreasing weights of those misleading ones. Experimental results on multiple datasets show that our proposed approach yields better attention mechanisms, leading to substantial improvements over the two state-of-the-art neural ASC models. Source code and trained models are available at https://…/PSSAttention.

Toward Building Conversational Recommender Systems: A Contextual Bandit Approach

Contextual bandit algorithms have gained increasing popularity in recommender systems, because they can learn to adapt recommendations by making exploration-exploitation trade-off. Recommender systems equipped with traditional contextual bandit algorithms are usually trained with behavioral feedback (e.g., clicks) from users on items. The learning speed can be slow because behavioral feedback by nature does not carry sufficient information. As a result, extensive exploration has to be performed. To address the problem, we propose conversational recommendation in which the system occasionally asks questions to the user about her interest. We first generalize contextual bandit to leverage not only behavioral feedback (arm-level feedback), but also verbal feedback (users’ interest on categories, topics, etc.). We then propose a new UCB- based algorithm, and theoretically prove that the new algorithm can indeed reduce the amount of exploration in learning. We also design several strategies for asking questions to further optimize the speed of learning. Experiments on synthetic data, Yelp data, and news recommendation data from Toutiao demonstrate the efficacy of the proposed algorithm.

An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem

This paper introduces a new learning-based approach for approximately solving the Travelling Salesman Problem on 2D Euclidean graphs. We use deep Graph Convolutional Networks to build efficient TSP graph representations and output tours in a non-autoregressive manner via highly parallelized beam search. Our approach outperforms all recently proposed autoregressive deep learning techniques in terms of solution quality, inference speed and sample efficiency for problem instances of fixed graph sizes. In particular, we reduce the average optimality gap from 0.52% to 0.01% for 50 nodes, and from 2.26% to 1.39% for 100 nodes. Finally, despite improving upon other learning-based approaches for TSP, our approach falls short of standard Operations Research solvers.

An interpretable machine learning framework for modelling human decision behavior

Machine learning has recently been widely adopted to address the managerial decision making problems. However, there is a trade-off between performance and interpretability. Full complexity models (such as neural network-based models) are non-traceable black-box, whereas classic interpretable models (such as logistic regression) are usually simplified with lower accuracy. This trade-off limits the application of state-of-the-art machine learning models in management problems, which requires high prediction performance, as well as the understanding of individual attributes’ contributions to the model outcome. Multiple criteria decision aiding (MCDA) is a family of interpretable approaches to depicting the rationale of human decision behavior. It is also limited by strong assumptions (e.g. preference independence). In this paper, we propose an interpretable machine learning approach, namely Neural Network-based Multiple Criteria Decision Aiding (NN-MCDA), which combines an additive MCDA model and a fully-connected multilayer perceptron (MLP) to achieve good performance while preserving a certain degree of interpretability. NN-MCDA has a linear component (in an additive form of a set of polynomial functions) to capture the detailed relationship between individual attributes and the prediction, and a nonlinear component (in a standard MLP form) to capture the high-order interactions between attributes and their complex nonlinear transformations. We demonstrate the effectiveness of NN-MCDA with extensive simulation studies and two real-world datasets. To the best of our knowledge, this research is the first to enhance the interpretability of machine learning models with MCDA techniques. The proposed framework also sheds light on how to use machine learning techniques to free MCDA from strong assumptions.

Universal Boosting Variational Inference

Boosting variational inference (BVI) approximates an intractable probability density by iteratively building up a mixture of simple component distributions one at a time, using techniques from sparse convex optimization to provide both computational scalability and approximation error guarantees. But the guarantees have strong conditions that do not often hold in practice, resulting in degenerate component optimization problems; and we show that the ad-hoc regularization used to prevent degeneracy in practice can cause BVI to fail in unintuitive ways. We thus develop universal boosting variational inference (UBVI), a BVI scheme that exploits the simple geometry of probability densities under the Hellinger metric to prevent the degeneracy of other gradient-based BVI methods, avoid difficult joint optimizations of both component and weight, and simplify fully-corrective weight optimizations. We show that for any target density and any mixture component family, the output of UBVI converges to the best possible approximation in the mixture family, even when the mixture family is misspecified. We develop a scalable implementation based on exponential family mixture components and standard stochastic optimization techniques. Finally, we discuss statistical benefits of the Hellinger distance as a variational objective through bounds on posterior probability, moment, and importance sampling errors. Experiments on multiple datasets and models show that UBVI provides reliable, accurate posterior approximations.

Kinetic Market Model: An Evolutionary Algorithm

This research proposes the econophysics kinetic market model as an evolutionary algorithm’s instance. The immediate results from this proposal is a new replacement rule for family competition genetic algorithms. It also represents a starting point to adding evolvable entities to kinetic market models.

A Novel Hyperparameter-free Approach to Decision Tree Construction that Avoids Overfitting by Design

Decision trees are an extremely popular machine learning technique. Unfortunately, overfitting in decision trees still remains an open issue that sometimes prevents achieving good performance. In this work, we present a novel approach for the construction of decision trees that avoids the overfitting by design, without losing accuracy. A distinctive feature of our algorithm is that it requires neither the optimization of any hyperparameters, nor the use of regularization techniques, thus significantly reducing the decision tree training time. Moreover, our algorithm produces much smaller and shallower trees than traditional algorithms, facilitating the interpretability of the resulting models.

The Extended Dawid-Skene Model: Fusing Information from Multiple Data Schemas

While label fusion from multiple noisy annotations is a well understood concept in data wrangling (tackled for example by the Dawid-Skene (DS) model), we consider the extended problem of carrying out learning when the labels themselves are not consistently annotated with the same schema. We show that even if annotators use disparate, albeit related, label-sets, we can still draw inferences for the underlying full label-set. We propose the Inter-Schema AdapteR (ISAR) to translate the fully-specified label-set to the one used by each annotator, enabling learning under such heterogeneous schemas, without the need to re-annotate the data. We apply our method to a mouse behavioural dataset, achieving significant gains (compared with DS) in out-of-sample log-likelihood (-3.40 to -2.39) and F1-score (0.785 to 0.864).

Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts

Emotion cause extraction (ECE), the task aimed at extracting the potential causes behind certain emotions in text, has gained much attention in recent years due to its wide applications. However, it suffers from two shortcomings: 1) the emotion must be annotated before cause extraction in ECE, which greatly limits its applications in real-world scenarios; 2) the way to first annotate emotion and then extract the cause ignores the fact that they are mutually indicative. In this work, we propose a new task: emotion-cause pair extraction (ECPE), which aims to extract the potential pairs of emotions and corresponding causes in a document. We propose a 2-step approach to address this new ECPE task, which first performs individual emotion extraction and cause extraction via multi-task learning, and then conduct emotion-cause pairing and filtering. The experimental results on a benchmark emotion cause corpus prove the feasibility of the ECPE task as well as the effectiveness of our approach.

Privacy-preserving Crowd-guided AI Decision-making in Ethical Dilemmas

With the rapid development of artificial intelligence (AI), ethical issues surrounding AI have attracted increasing attention. In particular, autonomous vehicles may face moral dilemmas in accident scenarios, such as staying the course resulting in hurting pedestrians or swerving leading to hurting passengers. To investigate such ethical dilemmas, recent studies have adopted preference aggregation, in which each voter expresses her/his preferences over decisions for the possible ethical dilemma scenarios, and a centralized system aggregates these preferences to obtain the winning decision. Although a useful methodology for building ethical AI systems, such an approach can potentially violate the privacy of voters since moral preferences are sensitive information and their disclosure can be exploited by malicious parties. In this paper, we report a first-of-its-kind privacy-preserving crowd-guided AI decision-making approach in ethical dilemmas. We adopt the notion of differential privacy to quantify privacy and consider four granularities of privacy protection by taking voter-/record-level privacy protection and centralized/distributed perturbation into account, resulting in four approaches VLCP, RLCP, VLDP, and RLDP. Moreover, we propose different algorithms to achieve these privacy protection granularities, while retaining the accuracy of the learned moral preference model. Specifically, VLCP and RLCP are implemented with the data aggregator setting a universal privacy parameter and perturbing the averaged moral preference to protect the privacy of voters’ data. VLDP and RLDP are implemented in such a way that each voter perturbs her/his local moral preference with a personalized privacy parameter. Extensive experiments on both synthetic and real data demonstrate that the proposed approach can achieve high accuracy of preference aggregation while protecting individual voter’s privacy.

The Secrets of Machine Learning: Ten Things You Wish You Had Known Earlier to be More Effective at Data Analysis

Despite the widespread usage of machine learning throughout organizations, there are some key principles that are commonly missed. In particular: 1) There are at least four main families for supervised learning: logical modeling methods, linear combination methods, case-based reasoning methods, and iterative summarization methods. 2) For many application domains, almost all machine learning methods perform similarly (with some caveats). Deep learning methods, which are the leading technique for computer vision problems, do not maintain an edge over other methods for most problems (and there are reasons why). 3) Neural networks are hard to train and weird stuff often happens when you try to train them. 4) If you don’t use an interpretable model, you can make bad mistakes. 5) Explanations can be misleading and you can’t trust them. 6) You can pretty much always find an accurate-yet-interpretable model, even for deep neural networks. 7) Special properties such as decision making or robustness must be built in, they don’t happen on their own. 8) Causal inference is different than prediction (correlation is not causation). 9) There is a method to the madness of deep neural architectures, but not always. 10) It is a myth that artificial intelligence can do anything.

CCMI : Classifier based Conditional Mutual Information Estimation

Conditional Mutual Information (CMI) is a measure of conditional dependence between random variables X and Y, given another random variable Z. It can be used to quantify conditional dependence among variables in many data-driven inference problems such as graphical models, causal learning, feature selection and time-series analysis. While k-nearest neighbor (kNN) based estimators as well as kernel-based methods have been widely used for CMI estimation, they suffer severely from the curse of dimensionality. In this paper, we leverage advances in classifiers and generative models to design methods for CMI estimation. Specifically, we introduce an estimator for KL-Divergence based on the likelihood ratio by training a classifier to distinguish the observed joint distribution from the product distribution. We then show how to construct several CMI estimators using this basic divergence estimator by drawing ideas from conditional generative models. We demonstrate that the estimates from our proposed approaches do not degrade in performance with increasing dimension and obtain significant improvement over the widely used KSG estimator. Finally, as an application of accurate CMI estimation, we use our best estimator for conditional independence testing and achieve superior performance than the state-of-the-art tester on both simulated and real data-sets.

Continue Reading…

Collapse

Read More

Document worth reading: “Big Data Meet Cyber-Physical Systems: A Panoramic Survey”

The world is witnessing an unprecedented growth of cyber-physical systems (CPS), which are foreseen to revolutionize our world via creating new services and applications in a variety of sectors such as environmental monitoring, mobile-health systems, intelligent transportation systems and so on. The information and communication technology (ICT) sector is experiencing a significant growth in data traffic, driven by the widespread usage of smartphones, tablets and video streaming, along with the significant growth of sensors deployments that are anticipated in the near future. It is expected to outstandingly increase the growth rate of raw sensed data. In this paper, we present the CPS taxonomy via providing a broad overview of data collection, storage, access, processing and analysis. Compared with other survey papers, this is the first panoramic survey on big data for CPS, where our objective is to provide a panoramic summary of different CPS aspects. Furthermore, CPS require cybersecurity to protect them against malicious attacks and unauthorized intrusion, which become a challenge with the enormous amount of data that is continuously being generated in the network. Thus, we also provide an overview of the different security solutions proposed for CPS big data storage, access and analytics. We also discuss big data meeting green challenges in the contexts of CPS. Big Data Meet Cyber-Physical Systems: A Panoramic Survey

Continue Reading…

Collapse

Read More

What’s new on arXiv – Complete List

There is no general AI: Why Turing machines cannot pass the Turing test
Sionnx: Automatic Unit Test Generator for ONNX Conformance
Boosting Few-Shot Visual Learning with Self-Supervision
Task Agnostic Continual Learning via Meta Learning
Warping Resilient Time Series Embeddings
Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension
Functional Singular Spectrum Analysis
Better Code, Better Sharing:On the Need of Analyzing Jupyter Notebooks
Representation Learning for Words and Entities
GluonTS: Probabilistic Time Series Models in Python
Learning Curves for Deep Neural Networks: A Gaussian Field Theory Perspective
COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
Pairwise Fairness for Ranking and Regression
Tensor Canonical Correlation Analysis
Dynamic Time Scan Forecasting
Linear Distillation Learning
Random Tessellation Forests
A Brief Introduction to Manifold Optimization
Factorized Mutual Information Maximization
Reinforcement Learning of Spatio-Temporal Point Processes
EKT: Exercise-aware Knowledge Tracing for Student Performance Prediction
Automatically Evaluating Balance: A Machine Learning Approach
Support Vector Machine-Based Fire Outbreak Detection System
Tackling Climate Change with Machine Learning
Deep Two-path Semi-supervised Learning for Fake News Detection
Generating Long and Informative Reviews with Aspect-Aware Coarse-to-Fine Decoding
Traffic signal control optimization under severe incident conditions using Genetic Algorithm
A Focus on Neural Machine Translation for African Languages
Calibration, Entropy Rates, and Memory in Language Models
Towards Resilient UAV: Escape Time in GPS Denied Environment with Sensor Drift
Focal Loss based Residual Convolutional Neural Network for Speech Emotion Recognition
Deep Learning based Emotion Recognition System Using Speech Features and Transcriptions
Cued@wmt19:ewc&lms
Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation
Temporally-Biased Sampling Schemes for Online Model Management
The NMF problem and lattice-subspaces
Understanding artificial intelligence ethics and safety
Privacy-Preserving Deep Visual Recognition: An Adversarial Learning Framework and A New Dataset
The complexity of the vertex-minor problem
Hysteresis, neural avalanches and critical behaviour near a first-order transition of a spiking neural network
Parameterized Structured Pruning for Deep Neural Networks
Detection and Correction of Cardiac MR Motion Artefacts during Reconstruction from K-space
Model Order Reduction by Proper Orthogonal Decomposition
Global optimization using Sobol indices
Vispi: Automatic Visual Perception and Interpretation of Chest X-rays
On the joint distribution of cyclic valleys and excedances over conjugacy classes of $\mathfrak{S}_{n}$
Voronoi conjecture for five-dimensional parallelohedra
Tackling Partial Domain Adaptation with Self-Supervision
Manifold Graph with Learned Prototypes for Semi-Supervised Image Classification
Towards Real-Time Head Pose Estimation: Exploring Parameter-Reduced Residual Networks on In-the-wild Datasets
Model-Free Practical Cooperative Control for Diffusively Coupled Systems
Sorted Top-k in Rounds
Unsupervised Monocular Depth and Ego-motion Learning with Structure and Semantics
Is Deep Learning an RG Flow?
A Multiscale Visualization of Attention in the Transformer Model
Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
Choosing agile or plan-driven enterprise resource planning (ERP) implementations — A study on 21 implementations from 20 companies
A Model to Search for Synthesizable Molecules
Estimation of the Shapley value by ergodic sampling
Continual and Multi-Task Architecture Search
Flying far and fast: the distribution of distant hypervelocity star candidates from Gaia DR2 data
Handwritten Text Segmentation via End-to-End Learning of Convolutional Neural Network
Nonparametric Identification and Estimation with Independent, Discrete Instruments
Reinforcement Knowledge Graph Reasoning for Explainable Recommendation
Understanding Vulnerability of Communities in Complex Networks
When to use parametric models in reinforcement learning?
A Bayesian Hierarchical Model for Evaluating Forensic Footwear Evidence
Tensor train optimization for mathematical model of social networks
Bootstrapping Upper Confidence Bound
Multitask Learning for Network Traffic Classification
Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
A Simple Text Mining Approach for Ranking Pairwise Associations in Biomedical Applications
Higher extensions for gentle algebras
Image-Adaptive GAN based Reconstruction
LAEO-Net: revisiting people Looking At Each Other in videos
Multicolor Ramsey numbers of cycles in Gallai colorings
The Tandem Duplication Distance is NP-hard
Visual Wake Words Dataset
Developing an improved Crystal Graph Convolutional Neural Network framework for accelerated materials discovery
Differential Imaging Forensics
Artificial Intelligence Enabled Material Behavior Prediction
Does Learning Require Memorization? A Short Tale about a Long Tail
Presence-Only Geographical Priors for Fine-Grained Image Classification
Critical Point Finding with Newton-MR by Analogy to Computing Square Roots
Efficient Exploration via State Marginal Matching
Keeping Notes: Conditional Natural Language Generation with a Scratchpad Mechanism
Matrix Mittag–Leffler distributions and modeling heavy-tailed risks
MOPED: Efficient priors for scalable variational inference in Bayesian deep neural networks
Neural Network Models for Stock Selection Based on Fundamental Analysis
Equality and difference of quenched and averaged large deviation rate functions for random walks in random environments without ballisticity
Sub-Goal Trees — a Framework for Goal-Directed Trajectory Prediction and Optimization
HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-scale Point Clouds
Identifying and Predicting Parkinson’s Disease Subtypes through Trajectory Clustering via Bipartite Networks
The effect of dead time on randomly sampled power spectral estimates
Reduction of noise and bias in randomly sampled power spectra
Optimizing Redundancy Levels in Master-Worker Compute Clusters for Straggler Mitigation
Optimal low rank tensor recovery
Permutation-based uncertainty quantification about a mixing distribution
Data Conversion in Area-Constrained Applications: the Wireless Network-on-Chip Case
Uncovering Dominant Social Class in Neighborhoods through Building Footprints: A Case Study of Residential Zones in Massachusetts using Computer Vision
Conditional Monte Carlo for Reaction Networks
GANPOP: Generative Adversarial Network Prediction of Optical Properties from Single Snapshot Wide-field Images
Opportunistic Beamforming in Wireless Network-on-Chip
Competing Bandits in Matching Markets
Work Design and Job Rotation in Software Engineering: Results from an Industrial Study
Linking geospatial data with Geo-L — analysis and experiments of big data readiness of common technologies
A steady-state stability analysis of uniform synchronous power grid topologies
Brouwer’s conjecture holds asymptotically almost surely
Modeling functional resting-state brain networks through neural message passing on the human connectome
Neural Graph Evolution: Towards Efficient Automatic Robot Design
The Herbarium Challenge 2019 Dataset
E3: Entailment-driven Extracting and Editing for Conversational Machine Reading
Meta-Learning via Learned Loss
Eye Contact Correction using Deep Neural Networks
Compositional generalization through meta sequence-to-sequence learning
Loop Programming Practices that Simplify Quicksort Implementations
Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors
Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
Neural Arabic Question Answering
Nonintrusive proper generalised decomposition for parametrised incompressible flow problems in OpenFOAM
Topology-Preserving Deep Image Segmentation
Analyzing the Limitations of Cross-lingual Word Embedding Mappings
A Countrywide Traffic Accident Dataset
A Joint Graph Based Coding Scheme for the Unsourced Random Access Gaussian Channel
Flexible Modeling of Diversity with Strongly Log-Concave Distributions
Fast, reliable and unrestricted iterative computation of Gauss–Hermite and Gauss–Laguerre quadratures
Synthetic QA Corpora Generation with Roundtrip Consistency
Efficient Evaluation-Time Uncertainty Estimation by Improved Distillation
From asymptotic properties of general point processes to the ranking of financial agents
Memory Augmented Neural Network Adaptive Controller for Strict Feedback Nonlinear Systems
Lower Bounds for the Happy Coloring Problems
Copulas as High-Dimensional Generative Models: Vine Copula Autoencoders
Folding Bilateral Backstepping Output-Feedback Control Design For an Unstable Parabolic PDE
Jacobian Policy Optimizations
CoopSubNet: Cooperating Subnetwork for Data-Driven Regularization of Deep Networks under Limited Training Budgets
Factors for the Generalisation of Identity Relations by Neural Networks
N-dimensional Heisenberg’s uncertainty principle for fractional Fourier transform
Coordinated Path Following Control of Fixed-wing Unmanned Aerial Vehicles
Money Cannot Buy Everything: Trading Infinite Location Data Streams with Bounded Individual Privacy Loss
Fixed-Parameter Tractability of Graph Deletion Problems over Data Streams
Efficiency of maximum likelihood estimation for a multinomial distribution with known probability sums
Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training
Combinatorially equivalent hyperplane arrangements
Figurative Usage Detection of Symptom Words to Improve Personal Health Mention Detection
A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics
On Feasibility and Flexibility Operating Regions of Virtual Power Plants and TSO/DSO interfaces
Selective prediction-set models with coverage guarantees
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets

Continue Reading…

Collapse

Read More

What's new on arXiv

There is no general AI: Why Turing machines cannot pass the Turing test

Since 1950, when Alan Turing proposed what has since come to be called the Turing test, the ability of a machine to pass this test has established itself as the primary hallmark of general AI. To pass the test, a machine would have to be able to engage in dialogue in such a way that a human interrogator could not distinguish its behaviour from that of a human being. AI researchers have attempted to build machines that could meet this requirement, but they have so far failed. To pass the test, a machine would have to meet two conditions: (i) react appropriately to the variance in human dialogue and (ii) display a human-like personality and intentions. We argue, first, that it is for mathematical reasons impossible to program a machine which can master the enormously complex and constantly evolving pattern of variance which human dialogues contain. And second, that we do not know how to make machines that possess personality and intentions of the sort we find in humans. Since a Turing machine cannot master human dialogue behaviour, we conclude that a Turing machine also cannot possess what is called “general” Artificial Intelligence. We do, however, acknowledge the potential of Turing machines to master dialogue behaviour in highly restricted contexts, where what is called “narrow” AI can still be of considerable utility.


Sionnx: Automatic Unit Test Generator for ONNX Conformance

Open Neural Network Exchange (ONNX) is an open format to represent AI models and is supported by many machine learning frameworks. While ONNX defines unified and portable computation operators across various frameworks, the conformance tests for those operators are insufficient, which makes it difficult to verify whether an operator’s behavior in an ONNX backend implementation complies with the ONNX standard. In this paper, we present the first automatic unit test generator, named Sionnx, for verifying the compliance of ONNX implementations. First, we propose a compact yet complete set of rules to describe the operator’s attributes and the properties of its operands. Second, we design an Operator Specification Language (OSL) to provide a high-level description of the operator’s syntax. Finally, through this easy-to-use specification language, we are able to build a full testing specification which leverages LLVM TableGen to automatically generate unit tests for ONNX operators with much larger coverage. Sionnx is lightweight and flexible enough to support cross-framework verification. The Sionnx framework is open-sourced in the GitHub repository (https://…/Sionnx).


Boosting Few-Shot Visual Learning with Self-Supervision

Few-shot learning and self-supervised learning address different facets of the same problem: how to train a model with little or no labeled data. Few-shot learning aims for optimization methods and models that can learn efficiently to recognize patterns in the low data regime. Self-supervised learning focuses instead on unlabeled data and looks into it for the supervisory signal to feed high capacity deep neural networks. In this work we exploit the complementarity of these two domains and propose an approach for improving few-shot learning through self-supervision. We use self-supervision as an auxiliary task in a few-shot learning pipeline, enabling feature extractors to learn richer and more transferable visual representations while still using few annotated samples. Through self-supervision, our approach can be naturally extended towards using diverse unlabeled data from other datasets in the few-shot setting. We report consistent improvements across an array of architectures, datasets and self-supervision techniques.


Task Agnostic Continual Learning via Meta Learning

While neural networks are powerful function approximators, they suffer from catastrophic forgetting when the data distribution is not stationary. One particular formalism that studies learning under non-stationary distributions is provided by continual learning, where the non-stationarity is imposed by a sequence of distinct tasks. Most methods in this space assume, however, knowledge of task boundaries, and focus on alleviating catastrophic forgetting. In this work, we depart from this view and move the focus towards faster remembering — i.e., measuring how quickly the network recovers performance rather than measuring the network’s performance without any adaptation. We argue that in many settings this can be more effective and that it opens the door to combining meta-learning and continual learning techniques, leveraging their complementary advantages. We propose a framework specific to the scenario where no information about task boundaries or task identity is given. It relies on a separation of concerns into what task is being solved and how the task should be solved. This framework is implemented by differentiating task-specific parameters from task-agnostic parameters, where the latter are optimized in a continual meta-learning fashion, without access to multiple tasks at the same time. We showcase this framework in a supervised learning scenario and discuss the implications of the proposed formalism.


Warping Resilient Time Series Embeddings

Time series are ubiquitous in real-world problems, and computing distance between two time series is often required in several learning tasks. Computing similarity between time series while ignoring variations in speed or warping is often encountered, and dynamic time warping (DTW) is the state of the art. However, DTW is not applicable in algorithms which require kernels or vectors. In this paper, we propose a mechanism named WaRTEm to generate vector embeddings of time series such that distance measures in the embedding space exhibit resilience to warping. Therefore, WaRTEm is more widely applicable than DTW. WaRTEm is based on a twin auto-encoder architecture and a training strategy involving warping operators for generating warping-resilient embeddings for time series datasets. We evaluate the performance of WaRTEm and observe more than 20% improvement over DTW on multiple real-world datasets.


Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension

Multi-hop reading comprehension requires the model to explore and connect relevant information from multiple sentences/documents in order to answer the question about the context. To achieve this, we propose an interpretable 3-module system called Explore-Propose-Assemble reader (EPAr). First, the Document Explorer iteratively selects relevant documents and represents divergent reasoning chains in a tree structure so as to allow assimilating information from all chains. The Answer Proposer then proposes an answer from every root-to-leaf path in the reasoning tree. Finally, the Evidence Assembler extracts a key sentence containing the proposed answer from every path and combines them to predict the final answer. Intuitively, EPAr approximates the coarse-to-fine-grained comprehension behavior of human readers when facing multiple long documents. We jointly optimize our 3 modules by minimizing the sum of losses from each stage conditioned on the previous stage’s output. On two multi-hop reading comprehension datasets WikiHop and MedHop, our EPAr model achieves significant improvements over the baseline and competitive results compared to the state-of-the-art model. We also present multiple reasoning-chain-recovery tests and ablation studies to demonstrate our system’s ability to perform interpretable and accurate reasoning.


Functional Singular Spectrum Analysis

In this paper, we introduce a new extension of Singular Spectrum Analysis (SSA), called functional SSA, to analyze functional time series. The new methodology is developed by integrating ideas from functional data analysis and univariate SSA. We explore the advantages of functional SSA in terms of simulation results and with an application to call center data. We compare the proposed approach with Multivariate SSA (MSSA) and Functional Principal Component Analysis (FPCA). The results suggest that further improvement to MSSA is possible and that the new method provides an attractive alternative to the novel extensions of FPCA for correlated functions. We have also developed an efficient and user-friendly R package and a Shiny web application to allow interactive exploration of the results.


Better Code, Better Sharing:On the Need of Analyzing Jupyter Notebooks

By bringing together code, text, and examples, Jupyter notebooks have become one of the most popular means to produce scientific results in a productive and reproducible way. As many of the notebook authors are experts in their scientific fields, but laymen with respect to software engineering, one may ask questions about the quality of notebooks and their code. In a preliminary study, we experimentally demonstrate that Jupyter notebooks are inundated with poor-quality code, e.g., not respecting recommended coding practices, or containing unused variables and deprecated functions. Considering the educational nature of Jupyter notebooks, these poor coding practices, as well as the lack of quality control, might be propagated into the next generation of developers. Hence, we argue that there is a strong need to programmatically analyze Jupyter notebooks, calling on our community to pay more attention to the reliability of Jupyter notebooks.


Representation Learning for Words and Entities

This thesis presents new methods for unsupervised learning of distributed representations of words and entities from text and knowledge bases. The first algorithm presented in the thesis is a multi-view algorithm for learning representations of words called Multiview Latent Semantic Analysis (MVLSA). By incorporating up to 46 different types of co-occurrence statistics for the same vocabulary of English words, I show that MVLSA outperforms other state-of-the-art word embedding models. Next, I focus on learning entity representations for search and recommendation and present the second method of this thesis, Neural Variational Set Expansion (NVSE). NVSE is also an unsupervised learning method, but it is based on the Variational Autoencoder framework. Evaluations with human annotators show that NVSE can facilitate better search and recommendation of information gathered from noisy, automatic annotation of unstructured natural language corpora. Finally, I move from unstructured data and focus on structured knowledge graphs. I present novel approaches for learning embeddings of vertices and edges in a knowledge graph that obey logical constraints.


GluonTS: Probabilistic Time Series Models in Python

We introduce Gluon Time Series (GluonTS)\footnote{\url{https://gluon-ts.mxnet.io}}, a library for deep-learning-based time series modeling. GluonTS simplifies the development of and experimentation with time series models for common tasks such as forecasting or anomaly detection. It provides all necessary components and tools that scientists need for quickly building new models, for efficiently running and analyzing experiments and for evaluating model accuracy.


Learning Curves for Deep Neural Networks: A Gaussian Field Theory Perspective

A series of recent works suggest that deep neural networks (DNNs), of fixed depth, are equivalent to certain Gaussian Processes (NNGP/NTK) in the highly over-parameterized regime (width or number-of-channels going to infinity). Other works suggest that this limit is relevant for real-world DNNs. These results invite further study into the generalization properties of Gaussian Processes of the NNGP and NTK type. Here we make several contributions along this line. First, we develop a formalism, based on field theory tools, for calculating learning curves perturbatively in one over the dataset size. For the case of NNGPs, this formalism naturally extends to finite width corrections. Second, in cases where one can diagonalize the covariance-function of the NNGP/NTK, we provide analytic expressions for the asymptotic learning curves of any given target function. These go beyond the standard equivalence kernel results. Last, we provide closed analytic expressions for the eigenvalues of NNGP/NTK kernels of depth 2 fully-connected ReLU networks. For datasets on the hypersphere, the eigenfunctions of such kernels, at any depth, are hyperspherical harmonics. A simple coherent picture emerges wherein fully-connected DNNs have a strong entropic bias towards functions which are low order polynomials of the input.


COMET: Commonsense Transformers for Automatic Knowledge Graph Construction

We present the first comprehensive study on automatic knowledge base construction for two prevalent commonsense knowledge graphs: ATOMIC (Sap et al., 2019) and ConceptNet (Speer et al., 2017). Contrary to many conventional KBs that store knowledge with canonical templates, commonsense KBs only store loosely structured open-text descriptions of knowledge. We posit that an important step toward automatic commonsense completion is the development of generative models of commonsense knowledge, and propose COMmonsEnse Transformers (COMET) that learn to generate rich and diverse commonsense descriptions in natural language. Despite the challenges of commonsense modeling, our investigation reveals promising results when implicit knowledge from deep pre-trained language models is transferred to generate explicit knowledge in commonsense knowledge graphs. Empirical results demonstrate that COMET is able to generate novel knowledge that humans rate as high quality, with up to 77.5% (ATOMIC) and 91.7% (ConceptNet) precision at top 1, which approaches human performance for these resources. Our findings suggest that using generative commonsense models for automatic commonsense KB completion could soon be a plausible alternative to extractive methods.


Pairwise Fairness for Ranking and Regression

We present pairwise metrics of fairness for ranking and regression models that form analogues of statistical fairness notions such as equal opportunity or equal accuracy, as well as statistical parity. Our pairwise formulation supports both discrete protected groups, and continuous protected attributes. We show that the resulting training problems can be efficiently and effectively solved using constrained optimization and robust optimization techniques based on two player game algorithms developed for fair classification. Experiments illustrate the broad applicability and trade-offs of these methods.


Tensor Canonical Correlation Analysis

In many applications, such as classification of images or videos, it is of interest to develop a framework for tensor data instead of the ad-hoc way of transforming data to vectors, due to computational and under-sampling issues. In this paper, we study canonical correlation analysis by extending the framework of two-dimensional analysis (Lee and Choi, 2007) to tensor-valued data. Instead of adopting the iterative algorithm provided in Lee and Choi (2007), we propose an efficient algorithm, called the higher-order power method, which is commonly used in tensor decomposition and more efficient for large-scale settings. Moreover, we carefully examine theoretical properties of our algorithm and establish a local convergence property via the theory of Lojasiewicz’s inequalities. Our results fill a missing, but crucial, part in the literature on tensor data. For practical applications, we further develop (a) an inexact updating scheme which allows us to use the state-of-the-art stochastic gradient descent algorithm, (b) an effective initialization scheme which alleviates the problem of local optima in non-convex optimization, and (c) an extension for extracting several canonical components. Empirical analyses on challenging data, including gene expression, air pollution indexes in Taiwan, and electricity demand in Australia, show the effectiveness and efficiency of the proposed methodology.


Dynamic Time Scan Forecasting

The dynamic time scan forecasting method relies on the premise that the most important pattern in a time series precedes the forecasting window, i.e., the last observed values. Thus, a scan procedure is applied to identify similar patterns, or best matches, throughout the time series. As opposed to Euclidean distance, or any other distance function, a similarity function is dynamically estimated in order to match previous values to the last observed values. Goodness-of-fit statistics are used to find the best matches. Using the respective similarity functions, the observed values following the best matches are used to create a forecasting pattern, as well as forecasting intervals. Remarkably, the proposed method outperformed statistical and machine learning approaches in a real-case wind speed forecasting problem.


Linear Distillation Learning

Deep Linear Networks do not have expressive power but they are mathematically tractable. In our work, we found an architecture in which they are expressive. This paper presents Linear Distillation Learning (LDL), a simple remedy to improve the performance of linear networks through distillation. In deep learning models, distillation often allows the smaller/shallower network to mimic the larger model in a much more accurate way, while a network of the same size trained on the one-hot targets cannot achieve comparable results to the cumbersome model. In our method, we train students to distill the teacher separately for each class in the dataset. The most striking result to emerge from the data is that neural networks without activation functions can achieve a high classification score on a small amount of data on the MNIST and Omniglot datasets. Due to their tractability, linear networks can be used to explain some phenomena observed experimentally in deep non-linear networks. The suggested approach could become a simple and practical instrument, while further studies in the field of linear networks and distillation are yet to be undertaken.


Random Tessellation Forests

Space partitioning methods such as random forests and the Mondrian process are powerful machine learning methods for multi-dimensional and relational data, and are based on recursively cutting a domain. The flexibility of these methods is often limited by the requirement that the cuts be axis aligned. The Ostomachion process and the self-consistent binary space partitioning-tree process were recently introduced as generalizations of the Mondrian process for space partitioning with non-axis aligned cuts in the two dimensional plane. Motivated by the need for a multi-dimensional partitioning tree with non-axis aligned cuts, we propose the Random Tessellation Process (RTP), a framework that includes the Mondrian process and the binary space partitioning-tree process as special cases. We derive a sequential Monte Carlo algorithm for inference, and provide random forest methods. Our process is self-consistent and can relax axis-aligned constraints, allowing complex inter-dimensional dependence to be captured. We present a simulation study, and analyse gene expression data of brain tissue, showing improved accuracies over other methods.


A Brief Introduction to Manifold Optimization

Manifold optimization is ubiquitous in computational and applied mathematics, statistics, engineering, machine learning, physics, and chemistry. One of the main challenges is usually the non-convexity of the manifold constraints. By utilizing the geometry of the manifold, a large class of constrained optimization problems can be viewed as unconstrained optimization problems on the manifold. From this perspective, intrinsic structures, optimality conditions, and numerical algorithms for manifold optimization are investigated. Some recent progress on the theoretical results of manifold optimization is also presented.


Factorized Mutual Information Maximization

We investigate the sets of joint probability distributions that maximize the average multi-information over a collection of margins. These functionals serve as proxies for maximizing the multi-information of a set of variables or the mutual information of two subsets of variables, at a lower computation and estimation complexity. We describe the maximizers and their relations to the maximizers of the multi-information and the mutual information.


Reinforcement Learning of Spatio-Temporal Point Processes

Spatio-temporal event data is ubiquitous in various applications, such as social media, crime events, and electronic health records. Spatio-temporal point processes offer a versatile framework for modeling such event data, as they can jointly capture spatial and temporal dependency. A key question is to estimate the generative model for such point processes, which enables the subsequent machine learning tasks. Existing works mainly focus on parametric models for the conditional intensity function, such as the widely used multi-dimensional Hawkes processes. However, parametric models tend to lack flexibility in tackling real data. On the other hand, non-parametric models for spatio-temporal point processes tend to be less interpretable. We introduce a novel and flexible semi-parametric spatio-temporal point process model, by combining spatial statistical models based on heterogeneous Gaussian mixture diffusion kernels whose parameters are represented using neural networks. We learn the model using a reinforcement learning framework, where the reward function is defined via the maximum mean discrepancy (MMD) of the empirical processes generated by the model and the real data. Experiments based on real data show the superior performance of our method relative to the state of the art.

Continue Reading…

Collapse

Read More

What’s new with MLflow? On-Demand Webinar and FAQs now available!

On June 6th, our team hosted a live webinar—Managing the Complete Machine Learning Lifecycle: What’s new with MLflow—with Clemens Mewald, Director of Product Management at Databricks.

Machine learning development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models.

To solve for these challenges, last June, we unveiled MLflow, an open source platform to manage the complete machine learning lifecycle. Most recently, we announced the General Availability of Managed MLflow on Databricks and the MLflow 1.0 Release.

In this webinar, we reviewed new and existing MLflow capabilities that allow you to:

  • Keep track of experiment runs and results across frameworks (a minimal tracking sketch follows this list).
  • Execute projects remotely on a Databricks cluster, and quickly reproduce your runs.
  • Quickly productionize models using Databricks production jobs, Docker containers, Azure ML, or Amazon SageMaker.
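
To make the first item concrete, here is a minimal sketch of experiment tracking with the open source MLflow Python API; the run name, parameter names, and metric values below are made up for illustration.

    import mlflow

    # Everything logged inside this block is grouped under one run.
    with mlflow.start_run(run_name="example-run"):
        mlflow.log_param("learning_rate", 0.01)   # hypothetical hyperparameters
        mlflow.log_param("batch_size", 32)

        for epoch in range(3):
            fake_loss = 1.0 / (epoch + 1)         # stand-in for a real training loss
            mlflow.log_metric("loss", fake_loss, step=epoch)

        # Attach any local file (plots, configs, notes) to the run as an artifact.
        with open("notes.txt", "w") as f:
            f.write("trained with made-up settings")
        mlflow.log_artifact("notes.txt")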

We demonstrated these concepts using notebooks and tutorials from our public documentation so that you can practice at your own pace. If you’d like free access to the Databricks Unified Analytics Platform to try our notebooks on it, you can start a free trial here.

Toward the end, we held a Q&A and below are the questions and answers.

Q: Apart from the trouble of doing all the setup, are there any missing features or disadvantages of using MLflow on-premises rather than in the cloud on Databricks?

Databricks is very committed to the open source community. Our founders are the original creators of Apache Spark™ – a widely adopted open source unified analytics engine – and our company still actively maintains and contributes to the open source Spark code. Similarly, for both Delta Lake and MLflow, we’re equally committed to helping the open source community benefit from these products, as well as providing an out-of-the-box managed version of them.

When we think about features to provide in the open source or the managed version of Delta Lake or MLflow, we don’t think about whether we should hold back a feature on one version or another. We think about what additional features we can provide that only make sense in a hosted and managed version for enterprise users. The benefits you get from Managed MLflow on Databricks are that you don’t need to worry about the setup or about managing servers, and you get integrations with the Databricks Unified Analytics Platform that make it work seamlessly with the rest of your workflow. Visit http://databricks.com/mlflow to learn more.

Q: Does MLflow 1.0 support Windows?

Yes, we added support for running the MLflow client on Windows. Please see our release notes here.

Q: Does MLflow complement or compete with TensorFlow?

It’s a perfect complement. You can train TensorFlow models and log the metrics and models with MLflow.
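
As a hedged sketch of that complement (the toy data and model below are invented for illustration, not taken from the webinar), training a small tf.keras model and logging a metric plus the fitted model might look roughly like this:

    import mlflow
    import mlflow.keras
    import numpy as np
    import tensorflow as tf

    # Toy binary-classification data, purely for illustration.
    X = np.random.rand(100, 4).astype("float32")
    y = (X.sum(axis=1) > 2.0).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    with mlflow.start_run():
        history = model.fit(X, y, epochs=3, verbose=0)
        mlflow.log_metric("final_loss", float(history.history["loss"][-1]))
        mlflow.keras.log_model(model, artifact_path="model")  # store the trained model with the run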

Q: How many different metrics can we track using MLflow? Are there any restrictions imposed on it?

MLflow doesn’t impose any limits on the number of metrics you can track. The only limitations are in the backend that is used to store those metrics.

Q: How do you parallelize model training with MLflow?

MLflow is agnostic to the ML framework you use to train the model. If you use TensorFlow or PyTorch, you can distribute your training jobs with, for example, HorovodRunner, and use MLflow to log your experiments, runs, and models.
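
A very rough sketch of that pattern on Databricks follows; the HorovodRunner usage is an assumption based on Databricks' documented API, the training body is elided, and the returned metric is a placeholder.

    from sparkdl import HorovodRunner
    import mlflow

    def train():
        # Runs on each Horovod worker; only rank 0's return value is propagated back.
        import horovod.tensorflow.keras as hvd
        hvd.init()
        # ... build, compile, and fit a tf.keras model here, scaled by hvd.size() ...
        return {"val_loss": 0.12}  # placeholder metric

    hr = HorovodRunner(np=2)   # np = number of parallel worker processes (assumed parameter)
    result = hr.run(train)     # assumed to return the rank-0 return value

    with mlflow.start_run():
        mlflow.log_metric("val_loss", result["val_loss"])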

Q: Is there a way to bulk extract the MLflow info to perform operational analytics (e.g., how many training runs were there in the last quarter, how many people are training models, etc.)?

We are working on a way to more easily extract the MLflow tracking metadata into a format that you can do data science with, e.g., into a pandas DataFrame.
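
In the meantime, one workaround is to walk the runs with the MlflowClient and build a DataFrame yourself; the sketch below assumes the default tracking store and a hypothetical experiment ID of "0".

    import pandas as pd
    from mlflow.tracking import MlflowClient

    client = MlflowClient()  # uses the configured tracking URI
    rows = []
    for info in client.list_run_infos("0"):   # "0" is a hypothetical experiment ID
        run = client.get_run(info.run_id)
        rows.append({
            "run_id": info.run_id,
            "status": info.status,
            **{"param." + k: v for k, v in run.data.params.items()},
            **{"metric." + k: v for k, v in run.data.metrics.items()},
        })

    runs_df = pd.DataFrame(rows)
    print(runs_df.head())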

Q: Is it possible to train and build an MLflow model using one platform (e.g., Databricks using TensorFlow with PySpark) and then reuse that MLflow model on another platform (for example in R using RStudio) to score any input?

The MLflow Model format and abstraction allows you to use any MLflow model from anywhere you can load it. For example, you can use the Python function flavor to call the model from any Python library, or the R function flavor to call it as an R function. MLflow doesn’t rewrite models into a new format, but you can always expose an MLflow model as a REST endpoint and then call it in a language-agnostic way.
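
For instance, a minimal sketch of scoring with the Python function flavor (the model path and feature columns below are placeholders): load the model, hand it a pandas DataFrame, and read back the predictions; for a REST endpoint instead, the mlflow models serve -m <model-path> CLI can front the same model.

    import mlflow.pyfunc
    import pandas as pd

    # Placeholder path/URI; point this at a model previously logged or saved with MLflow.
    model = mlflow.pyfunc.load_model("path/to/mlflow/model")

    batch = pd.DataFrame({"feature_1": [0.1, 0.5], "feature_2": [1.2, 0.3]})
    print(model.predict(batch))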

Q: To serve a model, what are the options to deploy outside of Databricks, e.g., SageMaker? Do you have any plans to support deploying as AWS Lambdas?

We provide several ways you can deploy MLflow models, including Amazon SageMaker, Microsoft Azure ML, Docker containers, Spark UDFs, and more; see this page for a list. To give one example of how to use MLflow models with AWS Lambda, you can use the Python function flavor, which enables you to call the model from anywhere you can call a Python function.
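
To illustrate the AWS Lambda idea (everything here is an assumption: the handler name, the bundled model path, and the event shape), a handler wrapping a pyfunc model could be as small as the sketch below.

    import json
    import pandas as pd
    import mlflow.pyfunc

    # Loaded once per container, so warm invocations reuse the model.
    # "model/" is a placeholder path to MLflow model files packaged with the function.
    MODEL = mlflow.pyfunc.load_model("model/")

    def handler(event, context):
        records = json.loads(event["body"])        # assumes a JSON list of feature records
        preds = MODEL.predict(pd.DataFrame(records))
        return {"statusCode": 200, "body": json.dumps([float(p) for p in preds])}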

Q: Can MLflow be used with python programs outside of Databricks?

Yes, MLflow is an open source product and can be found on GitHub and PyPI.

Q: What is the pricing model for Databricks?

Please see https://databricks.com/product/pricing

Q: How do you see MLflow evolving in relation to Airflow?

We are looking into ways to support multi-step workflows. One way we could do this is by using Airflow. We haven’t made these decisions yet.

Q: Any suggestions for deploying multi-step models, for example an ensemble of several base models?

Right now you can deploy those as MLflow models by writing code that ensembles other models, e.g., similar to how the multi-step workflow example is implemented.
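
One hedged way to write such ensembling code is a custom pyfunc model; in the sketch below the sub-model locations and the averaging rule are placeholders, and the two base models are assumed to have already been saved with MLflow.

    import mlflow.pyfunc

    class AveragingEnsemble(mlflow.pyfunc.PythonModel):
        # Averages the predictions of two base models; illustrative only.
        def load_context(self, context):
            self.model_a = mlflow.pyfunc.load_model(context.artifacts["model_a"])
            self.model_b = mlflow.pyfunc.load_model(context.artifacts["model_b"])

        def predict(self, context, model_input):
            return (self.model_a.predict(model_input) + self.model_b.predict(model_input)) / 2.0

    # Placeholder locations of the two already-saved base models.
    mlflow.pyfunc.save_model(
        path="averaging_ensemble",
        python_model=AveragingEnsemble(),
        artifacts={"model_a": "path/to/model_a", "model_b": "path/to/model_b"},
    )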

Q: Does MLflow provide a framework to do feature engineering on data?

Not specifically, but you can use any other framework together with MLflow.

To get started with MLflow, follow the instructions at mlflow.org or check out the code on GitHub. We’ve also recently created a Slack channel for MLflow for real-time questions, and you can follow @MLflowOrg on Twitter. We are excited to hear your feedback!

--

Try Databricks for free. Get started today.

The post What’s new with MLflow? On-Demand Webinar and FAQs now available! appeared first on Databricks.

Continue Reading…

Collapse

Read More

Top KDnuggets Tweets, Jun 19 – 25: Learn how to efficiently handle large amounts of data using #Pandas; The biggest mistake while learning #Python for #datascience

Also: Data Science Jobs Report 2019; Harvard CS109 #DataScience Course, Resources #Free and Online; Google launches TensorFlow; Mastering SQL for Data Science

Continue Reading…

Collapse

Read More

If you did not already know

zoNNscan google
The training of deep neural network classifiers results in decision boundaries whose geometry is still not well understood. This is in direct relation with classification problems such as so-called adversarial examples. We introduce zoNNscan, an index that is intended to inform on the boundary uncertainty (in terms of the presence of other classes) around one given input datapoint. It is based on confidence entropy, and is implemented through sampling in the multidimensional ball surrounding that input. We detail the zoNNscan index, give an algorithm for approximating it, and finally illustrate its benefits on four applications, including two important problems for the adoption of deep networks in critical systems: adversarial examples and corner-case inputs. We highlight that zoNNscan exhibits significantly higher values in those two problem classes than for standard inputs. …

Compositional Coding for Collaborative Filtering google
Efficiency is crucial to online recommender systems. Representing users and items as binary vectors for Collaborative Filtering (CF) can achieve fast user-item affinity computation in the Hamming space, and in recent years we have witnessed an emerging research effort in exploiting binary hashing techniques for CF methods. However, CF with binary codes naturally suffers from low accuracy due to the limited representation capability of each bit, which impedes it from modeling the complex structure of the data. In this work, we attempt to improve the efficiency without hurting model performance by utilizing both the accuracy of real-valued vectors and the efficiency of binary codes to represent users/items. In particular, we propose the Compositional Coding for Collaborative Filtering (CCCF) framework, which not only gains better recommendation efficiency than the state-of-the-art binarized CF approaches but also achieves even higher accuracy than the real-valued CF method. Specifically, CCCF innovatively represents each user/item with a set of binary vectors, which are associated with a sparse real-valued weight vector. Each value of the weight vector encodes the importance of the corresponding binary vector to the user/item. The continuous weight vectors greatly enhance the representation capability of binary codes, and their sparsity guarantees the processing speed. Furthermore, an integer weight approximation scheme is proposed to further accelerate the speed. Based on the CCCF framework, we design an efficient discrete optimization algorithm to learn its parameters. Extensive experiments on three real-world datasets show that our method outperforms the state-of-the-art binarized CF methods (and even achieves better performance than the real-valued CF method) by a large margin in terms of both recommendation accuracy and efficiency. …

Coarse-to-Fine Network (C2F Net) google
Deep neural networks have seen tremendous success for different modalities of data including images, videos, and speech. This success has led to their deployment in mobile and embedded systems for real-time applications. However, making repeated inferences using deep networks on embedded systems poses significant challenges due to constrained resources (e.g., energy and computing power). To address these challenges, we develop a principled co-design approach. Building on prior work, we develop a formalism referred to as Coarse-to-Fine Networks (C2F Nets) that allow us to employ classifiers of varying complexity to make predictions. We propose a principled optimization algorithm to automatically configure C2F Nets for a specified trade-off between accuracy and energy consumption for inference. The key idea is to select a classifier on-the-fly whose complexity is proportional to the hardness of the input example: simple classifiers for easy inputs and complex classifiers for hard inputs. We perform comprehensive experimental evaluation using four different C2F Net architectures on multiple real-world image classification tasks. Our results show that optimized C2F Net can reduce the Energy Delay Product (EDP) by 27 to 60 percent with no loss in accuracy when compared to the baseline solution, where all predictions are made using the most complex classifier in C2F Net. …

Warping Resilient Time Series Embedding google
Time series are ubiquitous in real-world problems, and computing distance between two time series is often required in several learning tasks. Computing similarity between time series while ignoring variations in speed or warping is often encountered, and dynamic time warping (DTW) is the state of the art. However, DTW is not applicable in algorithms which require kernels or vectors. In this paper, we propose a mechanism named WaRTEm to generate vector embeddings of time series such that distance measures in the embedding space exhibit resilience to warping. Therefore, WaRTEm is more widely applicable than DTW. WaRTEm is based on a twin auto-encoder architecture and a training strategy involving warping operators for generating warping-resilient embeddings for time series datasets. We evaluate the performance of WaRTEm and observe more than 20% improvement over DTW on multiple real-world datasets. …

Continue Reading…

Collapse

Read More

Book Review: Good to Go, by Christine Aschwanden

This is a book review. It is by Phil Price. It is not by Andrew.

The book is Good To Go: What the athlete in all of us can learn from the strange science of recovery. By Christine Aschwanden, published by W.W. Norton and Company. The publisher offered a copy to Andrew to review, and Andrew offered it to me as this blog’s unofficial sports correspondent.

tldr: This book argues persuasively that when it comes to optimizing the recovery portion of the exercise-recover-exercise cycle, nobody knows nuthin’ and most people who claim to know sumthin’ are wrong. It’s easy to read and has some nice anecdotes. Worth reading if you have a special interest in the subject, otherwise not. Full review follows.

The book is about ‘recovery’. In the context of the book, recovery is what you do between bouts of exercise; or, if you prefer, exercise is what you do between periods of recovery. The book has great blurbs. “A tour de force of great science journalism”, writes Nate Silver (!). “…a definitive tour through a bewildering jungle of scientific and pseudoscientific claims…”, writes David Epstein. “…Aschwanden makes the mind-boggling world of sports recovery a hilarious adventure”, says Olympic gold medal skier Jessie Diggins. With blurbs like these I was expecting a lot…although once I realized Aschwanden works at FiveThirtyEight, I downweighted the Silver blurb appropriately. Even so, I expected too much: the book is fine but ultimately rather unsatisfying. It is fairly interesting and sometimes amusing, but there’s only so much any author can do with the subject given the current state of knowledge, which is this: other than getting enough sleep and eating enough calories, nobody knows for sure what helps athletes recover between events or training sessions better than just living a normal life. The book is mostly just 300 pages of elucidating and amplifying that disappointing state of knowledge.

The author, Aschwanden, went to a lot of trouble, conducting hundreds of interviews, reading hundreds of scientific or quasi-scientific or pseudo-scientific papers, and in some cases subjecting herself to treatments in the interest of journalism (a sensory deprivation tank! Tom Brady’s magic pajamas! A cryogenic chamber!…) If the subject of athletic recovery is especially interesting to you then hey, it’s a fine book, plenty of good stuff in there, $30 well spent for two or three hours of information and amusement.

For readers of this blog — and maybe for everybody — the first couple of chapters are the best ones, because they provide some insights that can apply to many areas of science and statistical analysis. The first chapter explains what happened when Aschwanden became interested in whether beer is good, bad, or indifferent as a ‘recovery drink.’ She has a friend who was a researcher at a lab that researches human performance and when she brought the question to him he was enthusiastic about studying this issue, so they did. They designed and performed a study that is typical (all too typical) of studies that address this kind of issue: only 10 participants, with tests spanning a couple of days. Do some hard exercise, then drink regular beer or non-alcoholic beer. The next day “run to exhaustion” (following a standard protocol) and afterwards drink whichever beverage you didn’t drink the previous day. The next day, run to exhaustion again. Quantify the time to run to exhaustion at the specified level of effort. The study found no ‘statistically significant’ difference between real beer and fake beer for the contestants as a whole, or for male participants, but for women there was a statistically significant difference, with performance better after real beer! And for men there was a difference large enough to be substantively important if true, but not statistically significant. Fortunately, Aschwanden is no dummy. She doesn’t mention the ‘garden of forking paths’, but does recognize some other major methodological problems with the study. As she puts it: “There was only one problem: I didn’t believe it. Trust me — I wanted our study to show that beer was great for runners, really, I did. Yet my experience as a participant… left me feeling skeptical of our result, and the episode helped me understand and recognize some pitfalls that I’ve found to be common among sports performance studies.” And then she gives a few paragraphs that do a great job of illustrating why it is really hard to get objective measures of human performance for a study like this, and why it matters. The upshot is that in this study the researchers are fitting noise. And the problems that came up in this study are common, indeed nearly ubiquitous, in this sort of research. Disappointingly, even this chapter doesn’t show any data or any hard numbers. There’s not a plot or table in the book.

The second chapter discusses hydration (and over-hydration), starting off with a discussion of the creation and marketing of Gatorade and going on from there. As with every chapter, Aschwanden mixes anecdotes, history, and results from scientific studies, and pulls everything together with her own evaluation. It’s a good formula and makes for a readable book. The hydration chapter is typical in that it illustrates the extent to which marketing and a smattering of scientific research led to a widespread perception among athletes that later turned out either not to be true or to be more nuanced than was first thought. In fact, according to Aschwanden and backed up by many studies she cites, in contrast to what many athletes and coaches have believed over the past thirty years or so, our bodies can tolerate moderate dehydration with very little problem, and optimal hydration for many athletes and many activities turns out to involve a lot less drinking than most people (including most athletes and coaches) thought for decades. And it’s probably better to be rather dehydrated than to be rather over-hydrated.

I can’t resist adding my own little hydration story. A couple of years ago, on a very hot day I rode my bike on a hilly route to our local mountain (Mount Diablo), rode up it and back down, stopped at the bottom for food, and then rode back home. The ride was about 100 miles and the temperature was in the high nineties. Each time I stopped for water, I filled and chugged one of my water bottles, then filled both of them and continued on, draining both bottles by the time I got to the next water stop. Knowing the capacity of my bottles and the number of times I stopped, it’s easy to count how much I drank that day. I also had a large milkshake and a coke at my lunch stop, as well as something like a pound of food. On that day I drank 17 pounds of fluid. I weighed myself when I got home and found that I had lost 8 pounds. I had not urinated during the day, and didn’t do so for several hours after I got home. What’s the point of telling you this? I dunno; I just think it’s really interesting. In one long day I sweated or exhaled more than 25 pounds of water! I still find it hard to believe, although it does jibe with one of Gatorade’s early marketing campaigns, which promoted the idea that athletes should drink 40 ounces per hour, and not necessarily on a brutally hot day. But Aschwanden has both anecdotes and studies in which successful athletes drank much less, as well as stories about athletes getting into bad medical trouble by drinking too much. The point isn’t that endurance athletes shouldn’t drink, it’s that they shouldn’t obsess about drinking as long as they don’t get too thirsty. Aschwanden says it has long been conventional wisdom that in an athletic event you should drink before you’re thirsty, and drink enough that you never become thirsty, but there’s actually no evidence that that leads to better performance than simply drinking when you feel like it.

Another chapter covers the current fad for ice baths, cryogenic chambers, ice-water compression boots, and so on. No real evidence they help, no real evidence they hurt.

Another chapter covers the current fad for infrared treatments (heat baths, saunas, ‘infrared’ saunas, Tom Brady’s magic thermal underwear, etc.) No real evidence they help, no real evidence they hurt. Oh, and not only have the claims about thermal underwear not been evaluated by the Food and Drug Administration, they’ve apparently never been evaluated by a physicist either, because they’re ridiculous. If you buy the underwear you deserve to be mocked, and you should be. If no one else will do it for you, send me an email and I’ll do the mocking.

Massage? No real evidence it helps, no real evidence it hurts. That said, I intend to continue to get occasional massages from my next door neighbor, Cyrus Poitier, who is an elite sports masseur. He travels with the men’s national wrestling team and the women’s swim team, and is one of the US Olympic Team’s masseurs. Like most of Cyrus’s clients, I don’t go to Cyrus for feel-good massages — in fact they are usually quite painful — but instead I go when I have some soreness or tightness that I haven’t been able to get rid of on my own, and I do think his massages help. But do they really, in the sense of helping me perform better athletically, and, if so, how much? According to Aschwanden there’s no evidence, or only weak evidence, that they help at all. But I would swear they help me! And he has many elite athletes as clients. So are all of us wrong? Well, maybe we are, or maybe we’re right that the massages help but the effect is rather small. Or maybe they help the performance of those of us with some musculo-skeletal issues but harm the performance of people with other issues. The right way to answer this is with data, and according to Aschwanden the existing data aren’t adequate to the task.

Every ‘recovery modality’ in the book has a bunch of proponents, including some elite athletes who swear by it. Every one of the modalities has a bunch of individuals or companies promoting it and telling people it works, usually buttressed by questionable studies like Aschwanden’s beer study.  And just about every one of the recovery methods or substances has some skeptics who think it’s all hype.

And ultimately that’s the problem with Aschwanden’s book, though it’s not her fault: at the moment it’s impossible to know what works, and how well. She says this herself, towards the end of the book: “After exploring a seemingly endless array of recovery aids, I’ve come to think of them as existing on a sort of evidence continuum. At one end you’ve got sleep — the most potent recovery tool ever recovered (and one that money can’t buy). At the other end lies a pile of faddish products like hydrogen water and oxygen inhalers, which an ounce of common sense can tell you are mostly useless… Most things, however, lie somewhere in the vast middle — promising but unproven.” For someone like me, that’s a good reason to ignore just about all of the unproven stuff: even if something would improve my performance fairly substantially — let’s say a 5% increase in speed on my hardest bike rides — that wouldn’t change my life in a noticeable way. But for a competitive athlete, even 0.5% could be the difference between a gold medal and being off the podium, or being a pro vs an amateur who never quite breaks through. So there are always going to be people promoting this stuff, and there will always be athletes willing to give it a try.

Although firm conclusions about effectiveness are hard to come by, there’s plenty of interesting stuff in the book. For example, one of the many anecdotes concerns sprinter Usain Bolt. At the 2008 Olympics in Beijing, Bolt wasn’t happy with any of the unfamiliar food available to him at the athletes’ cafeteria, so he went to McDonalds and ate Chicken McNuggets. Every day. For lunch and dinner. (He also ate a small amount of greens drenched in salad dressing). According to Bolt’s memoir, he ate about 100 nuggets every 24 hours, adding up to about 1000 chicken nuggets over the course of the ten days he competed in the 100m, 200m, and 4x100m relay (with multiple heats in each, plus the finals). He won gold medals in all of them. As Aschwanden says, “Those chicken nuggets were adequate, if not ideal, fuel to power him through his nine heats, and to help him recover his energy in between them. Feeling satiated and not worrying about gastrointestinal issues are surely worth a lot to an athlete preparing for his most important events of the season. Would Bolt have performed better eating some other recovery foods? Maybe. The better question is: How much difference would it make?”

By the way, a popular saying among the kind of people who read this blog is “The plural of ‘anecdote’ is not ‘data.'” I liked that saying too, the first time I heard it, but the more I think about it the less I agree with it. Of course it’s literally true that ‘data’ is not the plural of ‘anecdote’, since the plural of ‘anecdote’ is ‘anecdotes.’ But each (true) anecdote does provide a data point of sorts. A sprinter won three gold medals on a diet consisting almost entirely of Chicken McNuggets, and his 100m time was a world record even though he didn’t run all the way through the tape. That really does set an upper limit on how deleterious a week of Chicken McNugget consumption is, at least to Usain Bolt. As far as data go, that anecdote is probably more informative than the quantitative results of Aschwanden’s 10-participant beer study, no matter how carefully the study was conducted.

One of the good things about Aschwanden’s book is that she puts the pieces together for us. She’s smart, she’s a former elite athlete herself (a professional cross-country skier), she talked to hundreds of people, she read lots of scientific studies, and she formed well-informed beliefs about everything she writes about. Only a tiny portion of those interviews and studies can fit in the book, but I trust her judgment well enough to think I’d probably reach most of the same conclusions she did, so I appreciate the fact that she does summarize her beliefs. A few key ones are: (1) ‘recovery’ involves both mind and body, and stress of all kinds — physical, mental, and emotional — hurts recovery of both mind and body. (2) Sleep is especially important to recovery; relaxation is too. If an athlete’s recovery routine is itself a source of stress, it’s counterproductive. (3) Under-eating is bad, and is worse than eating non-optimally. (4) The timing of food intake is unimportant unless you have a short break between events. If you finish an event and you have another one in a few hours, eating the right thing at the right time is critical. But if you aren’t competing again for 24 hours or more, there is no ‘nutrition window’, there’s a nutrition ‘barn door’, in the words of one researcher she quotes. (5) Other than getting enough sleep and enough relaxation, and eating enough to replenish glycogen supplies and calories in time for your next event, nearly nothing else is definitively known to be beneficial compared to just living an ordinary life between events. (6) Overtraining is a real thing, with both physical and mental components, and overtraining can be worse than undertraining. (7) With regard to specific ‘recovery modalities’: Massage might or might not help; ice baths might or might not help (and in fact might harm recovery a little); various food supplements might or might not help; heat in various forms might or might not help; ibuprofen and other anti-inflammatories probably do a little physical harm in most people, but most athletes refuse to believe it; stretching probably doesn’t help most people. (8) Different things work differently for different people, so following the same recovery routine as your sports idol might not work for you; (9) Some recovery methods, maybe a lot of them, really do help some people simply due to the ‘placebo effect’, and there’s nothing wrong with that: if it helps, it helps.

If any of these points seem odd or wrong or questionable to you, then I suggest reading the book, because Aschwanden explains why she has adopted her viewpoint. If you agree with all of them but want support for them, that’s another reason to read the book. If you agree with them all, shrug, and say “yeah, that’s pretty much what I figured,” then you can skip the book unless you are interested in some interesting stories like the one about Bolt.

Continue Reading…

Collapse

Read More

Brickster Spotlight: Meet Alexandra

At Databricks, we build platforms to enable data teams to solve the world’s toughest problems and we couldn’t do that without our wonderful Databricks Team. “Teamwork makes the dreamwork” is not only a central function of our product but it is also central to our culture. Learn more about Alexandra Cong, one of our Software Engineers, and what drew her to Databricks!

Tell us a little bit about yourself.

I’m a software engineer on the Identity and Access Management team. I joined Databricks almost 3 years ago after graduating from Caltech, and I’ve been here ever since!

What were you looking for in your next opportunity, and why did you choose Databricks?

Coming out of college, I was looking for a smaller company where I could not only learn and grow, but make an impact. As a math major, I didn’t have all of the software engineering basics, but interviewing at Databricks reassured me that as long as I was willing and excited to learn from the unknown, I could be successful. Being able to help solve a wide scope of challenges sounded really exciting, as opposed to being at a more established company, where they may have already solved a lot of their big problems. Finally, every person I met during my interviews at Databricks was not only extremely smart, but more importantly, humble and nice – which made me really excited to join the team!

What gets you excited to come to work every day?

It’s really important to me to be always learning and developing new skills. At Databricks, each team owns their services end-to-end and covers such a wide breadth that this is always the case. It’s an additional bonus that any feature you work on is mission-critical and will have a big impact – we don’t have the bandwidth to work on anything that isn’t!

One of our core values at Databricks is to be an owner. What is your most memorable experience at Databricks when you owned it?

I’m part of our diversity committee because I’m passionate about creating an inclusive and welcoming environment for everyone here. We recently sponsored an organization at UC Berkeley that runs a hackathon for under-resourced high school students. Databricks provided mentorship, sponsored prizes, and I got to teach students how to use Databricks to do their data analysis. It was really rewarding to give back to the community, see high school students get excited about coding and data, and be able to encourage even just a handful of students to study Computer Science.

What has been the biggest challenge you’ve faced, and what is a lesson you learned from it?

The biggest challenge I've faced so far has been overcoming the mental hurdles of growing into a senior software engineer role. Upon first understanding the expectations, I felt overwhelmed and the challenges seemed insurmountable, to the point where I became unmotivated and unhappy. Slowly I came to terms with the fact that I would have to take on uncomfortable tasks that would challenge me, and that I would inevitably make mistakes in the process. It was a necessary part of my growth, and I would just have to tackle these challenges one at a time. This was difficult for me because I hate failing and would rather only do things when I know I will be successful. However, through this process, I've learned that I'll grow so much more if I'm willing to make mistakes and learn from them.

Databricks has grown tremendously in the last few years. How do you see the future of Databricks evolving and what are you most excited to see us accomplish?

I see Databricks being used more and more by companies across many different domains. In an ideal world, Databricks will become the standard for doing data analysis. It might even be a qualification that data analysts list on their resumes! Of course, we have a lot of work to do if we want to get to that point, but I think the market opportunity is huge and I hope that we’ll be able to execute well enough to see that become a reality.

What advice would you give to women in tech who are starting their careers?

Advocate for yourself. This comes in various forms – negotiations, promotions, mentorship, leading projects, or even just talking with your manager about furthering your career growth. At times, I fell into the trap of assuming that my work would speak for itself, and that I didn’t need to do anything on top of that. I’ve since learned that even if it feels outside my comfort zone, I need to actively ask for more if and when I think I deserve it, because no one will be a better advocate for me than myself.

Want to work with Alexandra? Check out our Careers Page.

--

Try Databricks for free. Get started today.

The post Brickster Spotlight: Meet Alexandra appeared first on Databricks.

Continue Reading…

Collapse

Read More

Octoparse: A Revolutionary Web Scraping Software

Octoparse is the ultimate tool for data extraction (web crawling, data crawling and data scraping), which lets you turn the whole internet into a structured format. The newly launched Web Scraping Template makes it very easy even for people with no technical training.

Continue Reading…

Collapse

Read More

Announcing Trial and Domino 3.5: Control Center for Data Science Leaders

Even the most sophisticated data science organizations struggle to keep track of their data science projects. Data science leaders want to know, at any given moment, not just how many data science projects are in flight but what the latest updates and roadblocks are when it comes to model development and what projects need their immediate attention. 

But while there is a legion of tools for individual data scientists, the needs of data science leaders have not been well served. For example, a VP of Analytics at a wealth management company recently told us he had to walk around the office, pen and notepad in hand, going from person to person to get an actual count of projects in flight, because their traditional task-tracking tools didn't quite align with the workflow used by data science teams. It turned out that the final count was way off from the initial estimate provided to the CEO.

Data science leaders face a common set of challenges around visibility and governance: 

  • They need help tracking projects
  • They need help tracking models in production
  • They need help building a culture of following best practices

Given the potential repercussions of inaccurate information (from mis-set expectations and funding mismatches to project delays), it didn't surprise us that data science leaders packed the room at the Rev 2 Data Science Leaders Summit in New York for a live demo of our new “Control Center” functionality, designed specifically for them.

P.s. If you missed Rev this year, session presentations and recordings can be found here

Last fall, we delivered the Domino Control Center, which gives IT stakeholders visibility into compute usage and spend. Today we are announcing a significant expansion of the Control Center, with new features for data science leaders in Domino 3.5.

Domino 3.5 allows data science leaders to define their own data science project life cycle. A new addition to the Control Center, Projects Portfolio Dashboard, allows data science leaders to easily track and manage projects with a holistic understanding of the latest developments. It also surfaces projects that need immediate attention in real time by showing the projects that are blocked.

Project Portfolio Dashboard

A data science leader can start their day in the Project Portfolio Dashboard, which shows a summary of in-flight projects broken down by configurable life cycle stages with immediate status update of all projects.

Project Stage Configuration 

Every organization has its own data science life cycle that meets its business needs. In Domino 3.5, we enable data science leaders and managers to define their own project life cycle and implement it within their teams.

Data scientists can update their project stages as they progress through the lifecycle which notifies their collaborators via email.

Project owners and contributors can use the project stage menu to flag a project as blocked with a description of the blocker. Once resolved, the project can be unblocked. On the flip side, when data scientists mark a project as complete with a description of the project conclusion, Domino also captures this metadata for project tracking and future reference. All of this captured metadata is useful for organizational learning, organizing projects, and avoiding similar issues in the future.

All of this information powers Domino’s new Projects Portfolio Dashboard. Data science leads can click through to gain more context on any of the inflight projects and discover blocked projects that need attention.

In the hypothetical project below, our Chief Data Scientist Josh sees that one of the blocked projects is Avinash and Niole’s Customer Churn project. Although he doesn’t recall the details of this project, he can see that it is in the R&D phase and has a hard stop in a few weeks. Diving into the project, he can see that the remaining goal is to get a classification model with AUC above 0.8. 

Josh can turn to the Activity Feed to get details on the blocker and its causes, and suggest a course of action. In this example, he will ask the Customer Churn team to try a deep neural net. He can tag Sushmitha, a deep learning expert working on another team, and ask her to mentor this effort.

Managing projects, tracking production assets, and monitoring organizational health require new tools. These unique features were custom-built for data science leaders. At Domino, we are excited to see these benefits come to you as you use them with your teams.

All of this is just some of what's new in Domino; we also have a few other enhancements to existing features in the 3.5 release. For example, the Activity Feed has been enhanced to show a preview of the files being commented on. It also shows project stage updates and whether any blockers have been raised by collaborators. Users can also filter by the type of activity. Combined with email notifications, this ensures situational awareness of projects at all times.

Datasets

Domino 3.5 gives users the option to create large Dataset Snapshots directly from data sitting on their computers. The upload limits on the CLI have been increased to 50 GB and up to 50,000 files. With the same upload limits, users can also upload files directly through the browser. The CLI and browser uploads offer a seamless way to migrate and contribute data from your laptop into a single place for data science work. Teams can leverage shared, curated data, eliminate potentially redundant data wrangling work, and ensure fully reproducible experiments.

License Usage Reporting 

To complement the new Control Center features for data science leaders, we are also launching user activity analysis enhancements that facilitate license reporting and compliance. They offer a detailed view of the level of Domino activity for each team member so that data science and IT leaders can manage their allocation of Domino licenses and have visibility and predictability for their costs. Domino administrators can quickly identify active and inactive users and decide whether they should be allocated a license. The ability to track user activity and growth during budget planning and contract renewal makes it much easier to plan for future spending.

Trial 

In addition to the exciting new features for data science leaders, we are also launching a new Trial Environment to make Domino more accessible. It's perfect for those who want to try Domino out and evaluate whether it would be useful to their work. The new features in this latest release will be in our trial environment too! This is a quick and easy way to get access to Domino and start experiencing the secret sauce that companies like Dealer Tire and Red Hat leverage in their data science organizations.

More detailed release notes for Domino 3.5 can be found here. Domino 3.5 is now generally available – be sure to check out our 3.5 release webinar or try Domino to see the latest platform capabilities.

Continue Reading…

Collapse

Read More

Optimization with Python: How to make the most amount of money with the least amount of risk?

Learn how to apply Python data science libraries to develop a simple optimization problem based on a Nobel-prize winning economic theory for maximizing investment profits while minimizing risk.
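
The Nobel-prize winning theory in question is Markowitz's mean-variance portfolio theory. As a rough, hedged sketch of the idea (not the code from the linked post), using synthetic return data and scipy's SLSQP solver; the 4-asset universe, target return, and long-only bounds are illustrative assumptions:

import numpy as np
from scipy.optimize import minimize

np.random.seed(0)
returns = np.random.normal(0.001, 0.02, size=(250, 4))  # synthetic daily returns for 4 assets
mu, cov = returns.mean(axis=0), np.cov(returns.T)

def portfolio_variance(w):
    return w @ cov @ w  # risk proxy: variance of the portfolio's return

constraints = [
    {"type": "eq", "fun": lambda w: w.sum() - 1.0},      # fully invested: weights sum to 1
    {"type": "ineq", "fun": lambda w: w @ mu - 0.0005},  # expected return at least the target
]
result = minimize(portfolio_variance, x0=np.full(4, 0.25),
                  bounds=[(0.0, 1.0)] * 4, constraints=constraints, method="SLSQP")
print(result.x)  # minimum-risk weights that meet the return target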

Continue Reading…

Collapse

Read More

Monash University: Research Fellow or Sr Research Fellow in Information Technology [Suzhou, China]

Seeking an individual passionate about undertaking research in an area of Information Technology (IT), being one of Software Engineering and Cybersecurity, Machine Learning and Artificial Intelligence, or Human-Centred Computing, as well as multidisciplinary research through the application of IT to problems, and to be responsible for conducting research in this area of IT.

Continue Reading…

Collapse

Read More

Wikipedia views and every line of Billy Joel’s “We Didn’t Start the Fire”

In the biggest crossover event of the century, Tom Lum used the Wikipedia API to chart the number of views for every reference in Billy Joel’s We Didn’t Start the Fire. Yes. [via @waxpancake]
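
For readers curious how such counts can be pulled, here is a minimal sketch (not Tom Lum's code) using the Wikimedia Pageviews REST API; the article title and date range are placeholder examples:

import requests

article = "Billy_Joel"  # placeholder article title
url = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       f"en.wikipedia/all-access/all-agents/{article}/daily/20190101/20190131")
resp = requests.get(url, headers={"User-Agent": "pageviews-example/0.1"})
views = [item["views"] for item in resp.json()["items"]]  # one entry per day
print(sum(views))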


Continue Reading…

Collapse

Read More

Q&A with 2019 Innovator Under 35 Noam Brown

The MIT Technology Review has named Facebook AI Research Scientist Noam Brown one of this year’s Innovators Under 35 for his research on AI and games. The award recognizes talented young innovators by region for their potential to transform the world through their contributions to science and technology. Previous winners have included Mark Zuckerberg in 2007 and Google cofounders Larry Page and Sergey Brin in 2002.

Noam Brown is best known for his work on the poker-playing AI system Libratus, which he developed at Carnegie Mellon University with his PhD adviser Tuomas Sandholm in 2017. Libratus was the first AI to defeat top poker players in two-player no-limit Texas Hold’em.

Brown took a moment to share thoughts about his past, current, and future research on game-playing bots, as well as the possible applications of his research. He also shared advice to anyone looking to pursue research in AI and game theory.

Q: Describe some key highlights from your research work. What projects are you proudest of?

Noam Brown: The project I am proudest of is definitely Libratus, which beat top human poker professionals in two-player no-limit poker. This was a challenging problem that had existed for decades. Poker involves hidden information, which makes it resistant to other AI techniques that were successful in games like chess and Go. The first four years of my PhD were focused on figuring out how to crack this problem by building off of decades of research from previous researchers in the field. Eventually it became clear to me what the path to victory was, but it still took another year of implementation to get it all working and to actually beat top humans in the game. It was very rewarding to see all the pieces come together into an actual agent that could beat top humans.

Q: What led you to focus on AI and game theory?

NB: My original goal was to get a PhD in economics, but after spending a couple of years working in the economics research field, I realized I also wanted to build things. Economics doesn’t provide that opportunity as much as computer science, and especially AI, does. I had always been interested in both AI and game theory, and the economics angle of game theory seemed like a natural fit for me given my background.

Q: What does research in this area look like five years from now? What problems still need to be solved?

NB: There has been tremendous progress in recent years on AI for purely adversarial zero-sum games like checkers, chess, Go, poker, Starcraft, and Dota 2. But the real world isn’t zero-sum. Researchers still don’t know how to tackle AI for partly cooperative and partly adversarial settings, like negotiations. The state of the art in this area is way behind human performance. I think this will be a major area of AI research in the next five years, and it is an area that can have tremendous real-world impact.

Q: How does research on game-playing bots tie to real-world applications? How does it tie to other fields of AI research?

NB: While Libratus plays poker, the techniques are not limited to poker. Poker is just a benchmark that allows us to compare the performance of these techniques with the peak of human ability. That’s true for other AI milestones in games as well. The research I’m doing is really about developing AI techniques that can handle strategic reasoning and hidden information in multi-agent settings. This is very important because most real-world strategic interactions involve some amount of hidden information. If an AI agent is to act and to help people in the real world, it must be able to cope with hidden information.

Q: What surprised you most about how the research in this field has evolved? What’s been harder or easier than you might have expected?

NB: As with most research, it was very hard to predict what the “magic ingredient” would be that would lead to superhuman performance. My early PhD research was focused on techniques that seemed like good ideas at the time but ultimately didn’t make a huge difference in performance. But if you keep making good shots, eventually you’ll score a goal.

Q: What would you say to other AI researchers or students who are considering focusing on game theory and AI?

NB: This is a very exciting time to be in this research area. Research on imperfect-information games was historically a bit outside of the mainstream of AI, but recent results have shown convincingly that it holds answers to questions that have vexed AI researchers for decades. This is an underexplored field with a lot left to be done. But most important, I think the key to doing good research is loving what you do.

The post Q&A with 2019 Innovator Under 35 Noam Brown appeared first on Facebook Research.

Continue Reading…

Collapse

Read More

Distributed Learning with Random Features

** Nuit Blanche is now on Twitter: @NuitBlog **




Distributed learning and random projections are the most common techniques in large scale nonparametric statistical learning. In this paper, we study the generalization properties of kernel ridge regression using both distributed methods and random features. Theoretical analysis shows the combination remarkably reduces computational cost while preserving the optimal generalization accuracy under standard assumptions. In a benign case, O(N) partitions and O(N) random features are sufficient to achieve an O(1/N) learning rate, where N is the labeled sample size. Further, we derive more refined results by using additional unlabeled data to enlarge the number of partitions and by generating features in a data-dependent way to reduce the number of random features.
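
As a rough illustrative sketch of the general recipe the abstract describes (not the paper's implementation): partition the data, fit ridge regression on random Fourier features within each partition, and average the resulting predictors. This assumes scikit-learn's RBFSampler as the random feature map and a toy one-dimensional regression problem:

import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(2000)

rbf = RBFSampler(gamma=1.0, n_components=100, random_state=0)  # shared random feature map
Z = rbf.fit_transform(X)

# Fit one ridge regressor per data partition, then average their predictions.
models = [Ridge(alpha=1e-3).fit(Zp, yp)
          for Zp, yp in zip(np.array_split(Z, 10), np.array_split(y, 10))]

X_test = np.linspace(-3, 3, 200).reshape(-1, 1)
Z_test = rbf.transform(X_test)
y_pred = np.mean([m.predict(Z_test) for m in models], axis=0)  # averaged (distributed) estimator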



Follow @NuitBlog or join the CompressiveSensing Reddit, the Facebook page, the Compressive Sensing group on LinkedIn  or the Advanced Matrix Factorization group on LinkedIn

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.


Continue Reading…

Collapse

Read More

data.table is Much Better Than You Have Been Told

There is interest in converting relational query languages (that work both over SQL databases and on local data) into data.table commands, to take advantage of data.table's superior performance. Obviously, if one wants to use data.table, it is best to learn data.table. But if we want code that can run in multiple places, a translation layer may be in order.

In this note we look at how this translation is commonly done.

The dtplyr developers recently announced they are making changes to dtplyr to support two operation modes:

Note that there are two ways to use dtplyr:

  • Eagerly [WIP]. When you use a dplyr verb directly on a data.table object, it
    eagerly converts the dplyr code to data.table code, runs it, and returns a
    new data.table. This is not very efficient because it can’t take advantage
    of many of data.table’s best features.
  • Lazily. In this form, triggered by using lazy_dt(), no computation is
    performed until you explicitly request it with as.data.table(),
    as.data.frame() or as_tibble(). This allows dtplyr to inspect the
    full sequence of operations to figure out the best translation.

(reference, and recently completely deleted)

This is a bit confusing, but we can unroll it a bit.

  • The first “eager” method is how dplyr (and later dtplyr) has always converted dplyr pipelines into data.table realizations.
    It is odd to mark this as “WIP” (work in progress?), as this has been dplyr's strategy since the first released version of dplyr (version 0.1.1, 2014-01-29).
  • The second “lazy” method is the proper way to call data.table. Our own rqdatatable package has been calling data.table this way for over a year (ref). It is very odd that dplyr didn’t use this good strategy for the data.table adaptor, as it is the strategy dplyr uses in its SQL adaptor.

Let's take a look at the current published version of dtplyr (0.0.3) and how its eager evaluation works. Consider the following 4 trivial functions, each of which adds one to a data.frame column multiple times.

base_r_fn <- function(df) {
  dt <- df
  for(i in seq_len(nstep)) {
    dt$x1 <- dt$x1 + 1
  }
  dt
}

dplyr_fn <- function(df) {
  dt <- df
  for(i in seq_len(nstep)) {
    dt <- mutate(dt, x1 = x1 + 1)
  }
  dt
}

dtplyr_fn <- function(df) {
  dt <- as.data.table(df)
  for(i in seq_len(nstep)) {
    dt <- mutate(dt, x1 = x1 + 1)
  }
  dt
}

data.table_fn <- function(df) {
  dt <- as.data.table(df)
  for(i in seq_len(nstep)) {
    dt[, x1 := x1 + 1]
  }
  dt[]
}

base_r_fn() is idiomatic R code, dplyr_fn() is idiomatic dplyr code, dtplyr_fn() is idiomatic dplyr code operating over a data.table object (hence using dtplyr), and data.table_fn() is idiomatic data.table code.

When we time running all of these functions operating on a 100000 row by 100 column data frame for 1000 steps we see each of them takes the following time to complete the task on average:

        method mean_seconds
 1:     base_r    0.8367011
 2: data.table    1.5592681
 3:      dplyr    2.6420171
 4:     dtplyr  151.0217646

The “eager” dtplyr system is about 100 times slower than data.table. This trivial task is one of the few times that data.table isn’t by far the fastest implementation (in tasks involving grouped summaries, joins, and other non-trivial operations data.table typically has a large performance advantage, ref).

Here is the same data presented graphically.

[Figure: the timing results above, plotted]

This is why we don't consider "eager" the proper way to call data.table: it artificially makes data.table appear slow. This is the negative impression of data.table that the dplyr/dtplyr adaptors have been falsely giving dplyr users for the last five years. dplyr users either felt they were getting the performance of data.table through dplyr (if they didn't check timings), or got a (false) negative impression of data.table (if they did check timings).

Details of the timings can be found here.

As we have said: the "don't force so many extra copies" methodology has been in rqdatatable for quite some time, and in fact works well. Some timings on a similar problem are shared here.

[Figure: timings for a similar problem, including rqdatatable]

Notice the two rqdatatable timings have some translation overhead. This is why using data.table directly is, in general, going to be a superior methodology.

Continue Reading…

Collapse

Read More

The Psychology of Flame Wars

(This article was first published on That’s so Random, and kindly contributed to R-bloggers)

I have been meaning to write this for a while, but with the dplyr vs data.table feud rising to new levels on Twitter over the last couple of days, it all of a sudden seems more relevant. For those who don't know what I am talking about, there are different ways of doing data science. There are the two major languages R and python, with their own implementations for analysing data. Then within R there are the different flavours of using the base language or applying the functions of the tidyverse. Within the tidyverse there is the dplyr package for data wrangling, whose functionality greatly overlaps with that of the data.table package. Each of these choices is hotly contested by users of both options. Oftentimes, these debates are first presented as objective comparisons between two options in which the favoured option clearly stands out. This then evokes fierce responses from the other camp, and before you know it we are down to the point where we call each other's choices inelegant or even ugly.

I hope to convince you that these debates are useless by looking into some of the underlying psychological principles that make us vulnerable to this type of quarreling.

Cognitive Ease

The main reason I think the discussion of the merits of different implementations is fruitless is that the two camps can never fully understand each other. Sure, objective comparisons can be made in computation speed and, to some extent, functionality. We can read from the authors and maintainers what the motivation was for implementing something in a certain way. But the true proof of the pudding remains in the eating, that is, enabling the user to put it to use effectively. Before you can effectively and joyously apply a complex system (which all these implementations are), you need to spend countless hours sweating and swearing, reading documents and googling error messages. Day after day, hour after hour, you will fight yourself into mastery of one of the systems. To be most effective and consistent, almost everybody will have a go-to system in which day-to-day tasks are done. You don't roll a die in the morning to decide whether this is going to be an R or a python day. You don't switch from data.table to dplyr in the middle of an analysis without a very good reason. You stick with what you know, because it will give you the answer you are after the quickest. There is a major path-dependency here. You start with one of the systems, and each time you use it your understanding and appreciation grow. Because of this you will keep using the system, and so your love affair begins. Before you know it, you have a cognitive lock-in on your weapon of choice.

A big part of the appreciation for a system comes from your understanding of it. This understanding relieves a little of the large cognitive strain of doing data science. In the phenomenal Thinking, Fast and Slow by Daniel Kahneman, a full chapter is devoted to the topic of cognitive ease. It shows that you get a good feeling from things that are relatively easy for you. There is a very good evolutionary explanation for this: your brain consumes massive amounts of energy, so parsimonious use of it is rewarded by feeling great. It is your body telling you, keep doing this not-so-hard thing, you are going to last a long time this way. Because of the time you spent developing skills in your favourite system, looking at code from this system will give you a lot more cognitive ease than looking at code from a system you are far less familiar with. Understanding code from the unfamiliar system at the same level as the familiar one will require spending a lot of cognitive resources. This will be accompanied by negative emotions such as frustration, tiredness, and even anger. It is then a very understandable but also very silly mistake to take these emotions as an indication that the unfamiliar system is poorly implemented or even ugly. It is ignorance, not the software being bad, that caused these emotions.

Social Identity

From cognitive psychology we turn to social psychology. In the seventies, Henri Tajfel developed the theory of social identity. Part of the way we view ourselves is determined by the groups we are part of. If a group we belong to does well by some measure, we will personally start to feel better about ourselves, even when we did not have any part in the achievement. Just look at the crowds that celebrate at the victory parade of a local sports team. They did not spend a minute on the field, they might not even have been at the stadium supporting the players, and still they experience the victory as theirs as well. Using a software system for a good part of your waking hours, day after day, will inevitably lead to that system becoming part of your social identity. Gradually you are not just using the software, you are becoming the software as well.

On its own this is not a bad thing; as humans we need this sense of belonging. However, it will also change your behaviour, and oftentimes not for the good. In order to boost your self-esteem you want your "team" to be on the winning end. Each year when Stack Overflow shares the rises and falls in the use of software languages, users of both R and python jubilantly share the results (both are growing year after year). While this is an objective development in which you had no part other than being one of the users, discussions about the merits of software systems have active user involvement. Probably the easiest way to make your team look better is to make the other team look worse. Just think of the tiring debates between fans of different sports teams about which is the best, or the endless mud-throwing in political races. Flame wars are no different: by mocking the other system we celebrate our own, and we are part of the winning team.

Now, here is the good news. In sports and politics, the different parties are objectively in a competition. They play a zero-sum game, in which one's victory must mean the other's defeat. We as a data science community are not in a zero-sum game, and we often seem to forget it. Even when the 'competing' system is on the rise, you can do your job effectively and with joy in the one you prefer. Instead of mocking each other we should be thankful for the wealth of options we have to do our jobs. When our primary system does not offer the functionality we are looking for, we might find it in the other. The different systems can also influence each other positively: functionality in one system might inspire authors to make theirs more complete.

Conclusion

I have looked into two psychological mechanisms that I think stir up flame wars. I hope this will make you think again before posting a comment on Twitter or starting a heated discussion with a colleague that leads nowhere. We have several options for doing our daily jobs, and each of them has proven itself in practice. Each is used by at least tens of thousands of analysts and programmers, who use them to bring real value to real organisations. Mocking one of them is not only harmful, it is disrespectful. Only the brightest and most determined of our peers could create systems that are so complex, complete and fault-free. They have committed thousands of hours, often unpaid, to serve the community because they care. Mocking their labour because you don't properly understand the system they designed, or because you want to feel better about yourself, is ignorant, and you should refrain from it.

To leave a comment for the author, please follow the link and comment on their blog: That’s so Random.


Continue Reading…

Collapse

Read More

Monash University: (Senior) Research Fellow in AI [Suzhou, China]

Seeking an individual passionate about undertaking research in an area of Artificial Intelligence (AI), as well as multidisciplinary research through the application of AI to problems, and to be responsible for conducting research in areas of AI.

Continue Reading…

Collapse

Read More

Scaling Genomic Workflows with Spark SQL BGEN and VCF Readers

In the past decade, the amount of available genomic data has exploded as the price of genome sequencing has dropped. Researchers are now able to scan for associations between genetic variation and diseases across cohorts of hundreds of thousands of individuals from projects such as the UK Biobank. These analyses will lead to a deeper understanding of the root causes of disease that will lead to treatments for some of today’s most important health problems. However, the tools to analyze these data sets have not kept pace with the growth in data.

Many users are accustomed to using command line tools like plink or single-node Python and R scripts to work with genomic data. However, single node tools will not suffice at terabyte scale and beyond. The Hail project from the Broad Institute builds on top of Spark to distribute computation to multiple nodes, but it requires users to learn a new API in addition to Spark and encourages that data to be stored in a Hail-specific file format. Since genomic data holds value not in isolation but as one input to analyses that combine disparate sources such as medical records, insurance claims, and medical images, a separate system can cause serious complications.

We believe that Spark SQL, which has become the de facto standard for working with massive datasets of all different flavors, represents the most direct path to simple, scalable genomic workflows. Spark SQL is used for extracting, transforming, and loading (ETL) big data in a distributed fashion. ETL is 90% of the effort involved in bioinformatics, from extracting mutations, annotating them with external data sources, to preparing them for downstream statistical and machine learning analysis. Spark SQL contains high-level APIs in languages such as Python or R that are simple to learn and result in code that is easier to read and maintain than more traditional bioinformatics approaches. In this post, we will introduce the readers and writers that provide a robust, flexible connection between genomic data and Spark SQL.

Reading data

Our readers are implemented as Spark SQL data sources, so VCF and BGEN can be read into a Spark DataFrame as simply as any other file type. In Python, reading a directory of VCF files looks like this:

spark.read\
  .format("com.databricks.vcf")\
  .option("includeSampleIds", True)\
  .option("flattenInfoFields", True)\
  .load("/databricks-datasets/genomics/1kg-vcfs")

The data types defined in the VCF header are translated to a schema for the output DataFrame. The VCF files in this example contain a number of annotations that become queryable fields:

The contents of a VCF file in a Spark SQL DataFrame

Fields that apply to each sample in a cohort—like the called genotype—are stored in an array, which enables fast aggregation for all samples at each site.

The array of per-sample genotype fields

As those who work with VCF files know all too well, the VCF specification leaves room for ambiguity in data formatting that can cause tools to fail in unexpected ways. We aimed to create a robust solution that was by default accepting of malformed records and then allow our users to choose filtering criteria. For instance, one of our customers used our reader to ingest problematic files where some probability values were stored as “nan” instead of “NaN”, which most Java-based tools require. Handling these simple issues automatically allows our users to focus on understanding what their data mean, not whether they are properly formatted. To verify the robustness of our reader, we have tested it against VCF files generated by common tools such as GATK and Edico Genomics as well as files from data sharing initiatives.

 

BGEN files such as those distributed by the UK Biobank initiative can be handled similarly. The code to read a BGEN file looks nearly identical to our VCF example:

spark.read.format("com.databricks.bgen").load(bgen_path)

These file readers produce compatible schemas that allow users to write pipelines that work for different sources of variation data and enable merging of different genomic datasets. For instance, the VCF reader can take a directory of files with differing INFO fields and return a single DataFrame that contains the common fields. The following commands read in data from BGEN and VCF files and merge them to create a single dataset:

vcf_df = spark.read.format("com.databricks.vcf").load(vcf_path)
bgen_df = spark.read.format("com.databricks.bgen")\
   .schema(vcf_df.schema).load(bgen_path)
big_df = vcf_df.union(bgen_df) # All my genotypes!!

Since our file readers return vanilla Spark SQL DataFrames, you can ingest variant data using any of the programming languages supported by Spark, like Python, R, Scala, Java, or pure SQL. Specialized frontend APIs such as Koalas, which implements the pandas dataframe API on Apache Spark, and sparklyr work seamlessly as well.

Manipulating genomic data

Since each variant-level annotation (the INFO fields in a VCF) corresponds to a DataFrame column, queries can easily access these values. For example, we can count the number of biallelic variants with minor allele frequency less than 0.05:
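
A minimal sketch of what that query might look like in Python (the original post shows it as a screenshot, so the exact column names here are assumptions; alternateAlleles is a guessed field name, while INFO_AF is the flattened allele-frequency field used later in this post, and df is the DataFrame read above):

from pyspark.sql import functions as fx

biallelic_maf_count = (
    df
    .where(fx.size("alternateAlleles") == 1)  # biallelic sites only (assumed column name)
    .where(fx.expr("INFO_AF[0] < 0.05"))      # minor allele frequency below 5%
    .count()
)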

Spark 2.4 introduced higher-order functions that simplify queries over array data. We can take advantage of this feature to manipulate the array of genotypes. To filter the genotypes array so that it only contains samples with at least one variant allele, we can write a query like this:

Manipulating the genotypes array with higher order functions
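
A hedged sketch of such a higher-order-function query (shown as a screenshot in the original; the genotypes column and its calls field are assumed names, with reference alleles encoded as 0):

from pyspark.sql import functions as fx

# Keep, per site, only the genotypes that carry at least one non-reference allele.
carriers = df.withColumn(
    "genotypes",
    fx.expr("filter(genotypes, g -> exists(g.calls, c -> c > 0))")
)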

 

If you have tabix indexes for your VCF files, our data source will push filters on genomic locus to the index and minimize I/O costs. Even as datasets grow beyond the size that a single machine can support, simple queries still complete at interactive speeds.

As we mentioned when we discussed ingesting variation data, any language that Spark supports can be used to write queries. The above statements can be combined into a single SQL query:

Querying a VCF file with SQL
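
A rough sketch of a combined SQL version (again, the post shows this as a screenshot; the genotypes and calls names follow the assumptions above, while contigName, start, and INFO_AF appear elsewhere in this post):

df.createOrReplaceTempView("variants")
spark.sql("""
  SELECT contigName, start,
         filter(genotypes, g -> exists(g.calls, c -> c > 0)) AS carriers
  FROM variants
  WHERE INFO_AF[0] < 0.05
""").show()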

 

Exporting data

We believe that in the near future, organizations will store and manage their genomic data just as they do with other data types, using technologies like Delta Lake. However, we understand that it’s important to have backward compatibility with familiar file formats for sharing with collaborators or working with legacy tools.

We can build on our filtering example to create a block gzipped VCF file that contains all variants with allele frequency less than 5%:

df.where(fx.expr("INFO_AF[0] < 0.05"))\
    .orderBy("contigName", "start")\
    .write.format("com.databricks.bigvcf")\
    .save("output.vcf.bgz")

This command sorts, serializes, and uploads each segment of the output VCF in parallel, so you can safely output cohort-scale VCFs. It’s also possible to export one VCF per chromosome or on even smaller granularities.

Saving the same data to a BGEN file requires only one small modification to the code:

df.where(fx.expr("INFO_AF[0] < 0.05"))\
    .orderBy("contigName", "start")\
    .write.format("com.databricks.bigbgen")\
    .save("output.bgen")

What’s next

Ingesting data into Spark is the first step of most big data pipelines, but it’s hardly the end of the journey. In the next few weeks, we’ll have more blog posts that demonstrate how features built on top of these readers and writers can scale and simplify genomic workloads. Stay tuned!

Try it!

Our Spark SQL readers make it easy to ingest large variation datasets with a small amount of code (Azure | AWS). Learn more about our genomics solutions in the Databricks Unified Analytics for Genomics and try out a preview today.

 

--

Try Databricks for free. Get started today.

The post Scaling Genomic Workflows with Spark SQL BGEN and VCF Readers appeared first on Databricks.

Continue Reading…

Collapse

Read More

10 Gradient Descent Optimisation Algorithms + Cheat Sheet

Gradient descent is an optimization algorithm used for minimizing the cost function in various ML algorithms. Here are some common gradient descent optimisation algorithms used in popular deep learning frameworks such as TensorFlow and Keras.
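
As a tiny, self-contained illustration of the basic update rule that all of these optimisers build on (not taken from the linked post), here is vanilla gradient descent minimising a one-dimensional quadratic cost:

def grad(w):
    return 2.0 * (w - 3.0)  # derivative of the cost f(w) = (w - 3)^2

w, lr = 0.0, 0.1            # initial parameter and learning rate
for _ in range(100):
    w -= lr * grad(w)       # update: w <- w - lr * df/dw

print(round(w, 4))          # converges to ~3.0, the minimiser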

Continue Reading…

Collapse

Read More

Linear Algebra for Machine Learning and Deep Learning in R

(This article was first published on R Programming – DataScience+, and kindly contributed to R-bloggers)

In this article, you learn how to do linear algebra for Machine Learning and Deep Learning in R. In particular, I will discuss: Matrix Multiplication, Solve System of Linear Equations, Identity Matrix, Matrix Inverse, Solve System of Linear Equations Revisited, Finding the Determinant, Matrix Norm, Frobenius Norm, Special Matrices and Vectors, Eigendecomposition, Singular Value Decomposition, Moore-Penrose Pseudoinverse, and Matrix Trace.

Introduction

Linear algebra is a branch of mathematics that is widely used throughout data science. Yet because linear algebra is a form of continuous rather than discrete mathematics, many data scientists have little experience with it. A good understanding of linear algebra is essential for understanding and working with machine learning and deep learning algorithms. This article is particularly aimed at linear algebra for these two econometrical/statistical disciplines. Let us dive into the world of linear algebra for machine learning and deep learning with R:

Matrix Multiplication

Let us start by creating a Matrix Multiplication in R:

A <- matrix(data = 1:36, nrow = 6)
A
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    7   13   19   25   31
 [2,]    2    8   14   20   26   32
 [3,]    3    9   15   21   27   33
 [4,]    4   10   16   22   28   34
 [5,]    5   11   17   23   29   35
 [6,]    6   12   18   24   30   36

B <- matrix(data = 1:30, nrow = 6)
B
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    7   13   19   25
 [2,]    2    8   14   20   26
 [3,]    3    9   15   21   27
 [4,]    4   10   16   22   28
 [5,]    5   11   17   23   29
 [6,]    6   12   18   24   30

A %*% B
[,1] [,2] [,3] [,4] [,5]
 [1,]  441 1017 1593 2169 2745
 [2,]  462 1074 1686 2298 2910
 [3,]  483 1131 1779 2427 3075
 [4,]  504 1188 1872 2556 3240
 [5,]  525 1245 1965 2685 3405
 [6,]  546 1302 2058 2814 3570

Hadamard Multiplication

Let us try creating a Hadamard Multiplication in R:

A <- matrix(data = 1:36, nrow = 6)
A
[,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    7   13   19   25   31
 [2,]    2    8   14   20   26   32
 [3,]    3    9   15   21   27   33
 [4,]    4   10   16   22   28   34
 [5,]    5   11   17   23   29   35
 [6,]    6   12   18   24   30   36

B <- matrix(data = 11:46, nrow = 6)
B
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]   11   17   23   29   35   41
 [2,]   12   18   24   30   36   42
 [3,]   13   19   25   31   37   43
 [4,]   14   20   26   32   38   44
 [5,]   15   21   27   33   39   45
 [6,]   16   22   28   34   40   46

A * B
[,1] [,2] [,3] [,4] [,5] [,6]
 [1,]   11  119  299  551  875 1271
 [2,]   24  144  336  600  936 1344
 [3,]   39  171  375  651  999 1419
 [4,]   56  200  416  704 1064 1496
 [5,]   75  231  459  759 1131 1575
 [6,]   96  264  504  816 1200 1656

Dot Product

Let us now create Dot Product in R:

X <- matrix(data = 1:10, nrow = 10)
X
[,1]
  [1,]    1
  [2,]    2
  [3,]    3
  [4,]    4
  [5,]    5
  [6,]    6
  [7,]    7
  [8,]    8
  [9,]    9
 [10,]   10

Y <- matrix(data = 11:20, nrow = 10)
Y

       [,1]
  [1,]   11
  [2,]   12
  [3,]   13
  [4,]   14
  [5,]   15
  [6,]   16
  [7,]   17
  [8,]   18
  [9,]   19
 [10,]   20

Let us now create Dot Product function in R:

dotProduct <- function(X, Y) {
    as.vector(t(X) %*% Y)
}
dotProduct(X, Y)

 [1] 935

Properties of Matrix Multiplication

Let us look at Properties of Matrix Multiplication in R:

#1 Matrix Property: Matrix Multiplication is Distributive


A <- matrix(data = 1:25, nrow = 5)
B <- matrix(data = 26:50, nrow = 5)
C <- matrix(data = 51:75, nrow = 5)

A %*% (B + C)

      [,1] [,2] [,3] [,4] [,5]
 [1,] 4555 5105 5655 6205 6755
 [2,] 4960 5560 6160 6760 7360
 [3,] 5365 6015 6665 7315 7965
 [4,] 5770 6470 7170 7870 8570
 [5,] 6175 6925 7675 8425 9175

A %*% B + A %*% C

      [,1] [,2] [,3] [,4] [,5]
 [1,] 4555 5105 5655 6205 6755
 [2,] 4960 5560 6160 6760 7360
 [3,] 5365 6015 6665 7315 7965
 [4,] 5770 6470 7170 7870 8570
 [5,] 6175 6925 7675 8425 9175

#2 Matrix Property: Matrix Multiplication is Associative

A <- matrix(data = 1:25, nrow = 5)
B <- matrix(data = 26:50, nrow = 5)
C <- matrix(data = 51:75, nrow = 5)
 
(A %*% B) %*% C

        [,1]   [,2]   [,3]   [,4]    [,5]
 [1,] 569850 623350 676850 730350  783850
 [2,] 620450 678700 736950 795200  853450
 [3,] 671050 734050 797050 860050  923050
 [4,] 721650 789400 857150 924900  992650
 [5,] 772250 844750 917250 989750 1062250

 
A %*% (B %*% C)

        [,1]   [,2]   [,3]   [,4]    [,5]
 [1,] 569850 623350 676850 730350  783850
 [2,] 620450 678700 736950 795200  853450
 [3,] 671050 734050 797050 860050  923050
 [4,] 721650 789400 857150 924900  992650
 [5,] 772250 844750 917250 989750 1062250

#3 Matrix Property: Matrix Multiplication is Not Commutative

A <- matrix(data = 1:25, nrow = 5)
B <- matrix(data = 26:50, nrow = 5)
A %*% B

      [,1] [,2] [,3] [,4] [,5]
 [1,] 1590 1865 2140 2415 2690
 [2,] 1730 2030 2330 2630 2930
 [3,] 1870 2195 2520 2845 3170
 [4,] 2010 2360 2710 3060 3410
 [5,] 2150 2525 2900 3275 3650

B %*% A

      [,1] [,2] [,3] [,4] [,5]
 [1,]  590 1490 2390 3290 4190
 [2,]  605 1530 2455 3380 4305
 [3,]  620 1570 2520 3470 4420
 [4,]  635 1610 2585 3560 4535
 [5,]  650 1650 2650 3650 4650

Matrix Transpose

Let us look at Matrix Transpose in R:

A <- matrix(data = 1:25, nrow = 5, ncol = 5, byrow = TRUE)
A

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    2    3    4    5
 [2,]    6    7    8    9   10
 [3,]   11   12   13   14   15
 [4,]   16   17   18   19   20
 [5,]   21   22   23   24   25

t(A)

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    6   11   16   21
 [2,]    2    7   12   17   22
 [3,]    3    8   13   18   23
 [4,]    4    9   14   19   24
 [5,]    5   10   15   20   25

Let us look at Matrix Transpose Property in R:

A <- matrix(data = 1:25, nrow = 5)
B <- matrix(data = 25:49, nrow = 5)
t(A %*% B)

      [,1] [,2] [,3] [,4] [,5]
 [1,] 1535 1670 1805 1940 2075
 [2,] 1810 1970 2130 2290 2450
 [3,] 2085 2270 2455 2640 2825
 [4,] 2360 2570 2780 2990 3200
 [5,] 2635 2870 3105 3340 3575

t(B) %*% t(A)

      [,1] [,2] [,3] [,4] [,5]
 [1,] 1535 1670 1805 1940 2075
 [2,] 1810 1970 2130 2290 2450
 [3,] 2085 2270 2455 2640 2825
 [4,] 2360 2570 2780 2990 3200
 [5,] 2635 2870 3105 3340 3575

Solve System of Linear Equations

Now let us Solve System of Linear Equations in R:

Ax = B
A <- matrix(data = c(1, 3, 2, 4, 2, 4, 3, 5, 1, 6, 7, 2, 1, 5, 6, 7), nrow = 4, byrow = TRUE)
A

      [,1] [,2] [,3] [,4]
 [1,]    1    3    2    4
 [2,]    2    4    3    5
 [3,]    1    6    7    2
 [4,]    1    5    6    7

B <- matrix(data = c(1, 2, 3, 4), nrow = 4)
B

      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4

solve(a = A, b = B)

            [,1]
 [1,]  0.6153846
 [2,] -0.8461538
 [3,]  1.0000000
 [4,]  0.2307692

Identity Matrix

Now let us construct an Identity Matrix in R:

I <- diag(x = 1, nrow = 5, ncol = 5)
I

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    0    0    0
 [2,]    0    1    0    0    0
 [3,]    0    0    1    0    0
 [4,]    0    0    0    1    0
 [5,]    0    0    0    0    1

A <- matrix(data = 1:25, nrow = 5)
A %*% I

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    6   11   16   21
 [2,]    2    7   12   17   22
 [3,]    3    8   13   18   23
 [4,]    4    9   14   19   24
 [5,]    5   10   15   20   25

I %*% A

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    6   11   16   21
 [2,]    2    7   12   17   22
 [3,]    3    8   13   18   23
 [4,]    4    9   14   19   24
 [5,]    5   10   15   20   25

Matrix Inverse

Now let us compute the Matrix Inverse in R (here, the generalized inverse via MASS::ginv):

A <- matrix(data = c(1, 2, 3, 1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 3), nrow = 5)
A

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    3    3    8    4
 [2,]    2    4    4    9    5
 [3,]    3    5    5    1    6
 [4,]    1    6    6    2    7
 [5,]    2    2    7    3    3

library(MASS)
ginv(A)

            [,1]       [,2]       [,3]          [,4]          [,5]
 [1,] -0.3333333  0.3333333  0.3333333 -3.333333e-01  1.040834e-16
 [2,] -4.0888889  3.6444444 -1.2222222  8.666667e-01 -2.000000e-01
 [3,] -0.3555556  0.2444444 -0.2222222  1.333333e-01  2.000000e-01
 [4,] -0.1111111  0.2222222 -0.1111111 -6.938894e-18  2.602085e-18
 [5,]  3.8888889 -3.4444444  1.2222222 -6.666667e-01 -2.664535e-15

ginv(A) %*% A

               [,1]          [,2]          [,3]          [,4]          [,5]
 [1,]  1.000000e+00 -6.800116e-16 -1.595946e-16  9.020562e-17 -1.020017e-15
 [2,]  8.881784e-16  1.000000e+00 -5.329071e-15 -1.287859e-14 -2.464695e-14
 [3,] -1.665335e-16 -1.387779e-15  1.000000e+00 -1.332268e-15 -1.998401e-15
 [4,] -2.237793e-16 -8.135853e-16 -8.005749e-16  1.000000e+00 -1.262011e-15
 [5,]  0.000000e+00  1.953993e-14  6.217249e-15  1.265654e-14  1.000000e+00

A %*% ginv(A)

               [,1]         [,2]          [,3]         [,4]          [,5]
 [1,]  1.000000e+00 1.776357e-15 -1.776357e-15 2.220446e-15 -1.200429e-15
 [2,] -7.105427e-15 1.000000e+00 -1.776357e-15 1.776357e-15 -5.316927e-16
 [3,] -3.552714e-15 0.000000e+00  1.000000e+00 1.776357e-15  1.136244e-16
 [4,]  0.000000e+00 0.000000e+00  0.000000e+00 1.000000e+00  5.204170e-18
 [5,] -5.329071e-15 5.329071e-15 -8.881784e-16 1.998401e-15  1.000000e+00

Solve System of Linear Equations Revisited

Let us look at Solve System of Linear Equations Revisited in R:

A <- matrix(data = c(1, 3, 2, 4, 2, 4, 3, 5, 1, 6, 7, 2, 1, 5, 6, 7), nrow = 4, byrow = TRUE)
A

      [,1] [,2] [,3] [,4]
 [1,]    1    3    2    4
 [2,]    2    4    3    5
 [3,]    1    6    7    2
 [4,]    1    5    6    7

B <- matrix(data = c(1, 2, 3, 4), nrow = 4)
B

      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4

library(MASS)
X <- ginv(A) %*% B
X

            [,1]
 [1,]  0.6153846
 [2,] -0.8461538
 [3,]  1.0000000
 [4,]  0.2307692

Determinant

Let us find the Determinant of a Matrix in R:

A <- matrix(data = c(1, 3, 2, 4, 2, 4, 3, 5, 1, 6, 7, 2, 1, 5, 6, 7), nrow = 4, byrow = TRUE)
A

      [,1] [,2] [,3] [,4]
 [1,]    1    3    2    4
 [2,]    2    4    3    5
 [3,]    1    6    7    2
 [4,]    1    5    6    7

det(A)

 [1] -39

Matrix Norm

Let us now look at the Matrix Norm:

lpNorm <- function(A, p) {
    if (p >= 1 & dim(A)[[2]] == 1 && is.infinite(p) == FALSE) {
        sum((apply(X = A, MARGIN = 1, FUN = abs)) ** p) ** (1 / p)
    } else if (p >= 1 & dim(A)[[2]] == 1 & is.infinite(p)) {
        max(apply(X = A, MARGIN = 1, FUN = abs))  # Max Norm
    } else {
        invisible(NULL)
    }
}
lpNorm(A = matrix(data = 1:10), p = 1)

 [1] 55

lpNorm(A = matrix(data = 1:10), p = 2)  # Euclidean Distance

 [1] 19.62142

lpNorm(A = matrix(data = 1:10), p = 3)

 [1] 14.46245

lpNorm(A = matrix(data = -100:10), p = Inf)

 [1] 100

Let us verify some properties of the Matrix Norm (definiteness, the triangle inequality, and absolute homogeneity) in R:

lpNorm(A = matrix(data = rep(0, 10)), p = 1) == 0

 [1] TRUE

lpNorm(A = matrix(data = 1:10) + matrix(data = 11:20), p = 1) <= lpNorm(A = matrix(data = 1:10), p = 1) + lpNorm(A = matrix(data = 11:20), p = 1)

 [1] TRUE

tempFunc <- function(i) {
    lpNorm(A = i * matrix(data = 1:10), p = 1) == abs(i) * lpNorm(A = matrix(data = 1:10), p = 1)   
}
all(sapply(X = -10:10, FUN = tempFunc))

 [1] TRUE

Frobenius Norm

The Frobenius norm, sometimes also called the Euclidean norm (a term unfortunately also used for the vector L^2 norm), is the matrix norm of an m×n matrix A defined as the square root of the sum of the absolute squares of its elements. Among the common matrix norms, the Frobenius norm is unitarily invariant, i.e., it is conserved under a unitary transformation.
Let us compute the Frobenius Norm of a Matrix in R:

frobeniusNorm <- function(A) {
    (sum((as.numeric(A)) ** 2)) ** (1 / 2)
}
frobeniusNorm(A = matrix(data = 1:25, nrow = 5))

 [1] 74.33034

Special Matrices and Vectors

Let us look at Special Matrices and Vectors in R:

#1 Special Matrix: The Diagonal Matrix

A <- diag(x = c(1:5, 6, 1, 2, 3, 4), nrow = 10)

A
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
  [1,]    1    0    0    0    0    0    0    0    0     0
  [2,]    0    2    0    0    0    0    0    0    0     0
  [3,]    0    0    3    0    0    0    0    0    0     0
  [4,]    0    0    0    4    0    0    0    0    0     0
  [5,]    0    0    0    0    5    0    0    0    0     0
  [6,]    0    0    0    0    0    6    0    0    0     0
  [7,]    0    0    0    0    0    0    1    0    0     0
  [8,]    0    0    0    0    0    0    0    2    0     0
  [9,]    0    0    0    0    0    0    0    0    3     0
 [10,]    0    0    0    0    0    0    0    0    0     4

X <- matrix(data = 21:30)
X

       [,1]
  [1,]   21
  [2,]   22
  [3,]   23
  [4,]   24
  [5,]   25
  [6,]   26
  [7,]   27
  [8,]   28
  [9,]   29
 [10,]   30

A %*% X

       [,1]
  [1,]   21
  [2,]   44
  [3,]   69
  [4,]   96
  [5,]  125
  [6,]  156
  [7,]   27
  [8,]   56
  [9,]   87
 [10,]  120

library(MASS)
ginv(A)

       [,1] [,2]      [,3] [,4] [,5]      [,6] [,7] [,8]      [,9] [,10]
  [1,]    1  0.0 0.0000000 0.00  0.0 0.0000000    0  0.0 0.0000000  0.00
  [2,]    0  0.5 0.0000000 0.00  0.0 0.0000000    0  0.0 0.0000000  0.00
  [3,]    0  0.0 0.3333333 0.00  0.0 0.0000000    0  0.0 0.0000000  0.00
  [4,]    0  0.0 0.0000000 0.25  0.0 0.0000000    0  0.0 0.0000000  0.00
  [5,]    0  0.0 0.0000000 0.00  0.2 0.0000000    0  0.0 0.0000000  0.00
  [6,]    0  0.0 0.0000000 0.00  0.0 0.1666667    0  0.0 0.0000000  0.00
  [7,]    0  0.0 0.0000000 0.00  0.0 0.0000000    1  0.0 0.0000000  0.00
  [8,]    0  0.0 0.0000000 0.00  0.0 0.0000000    0  0.5 0.0000000  0.00
  [9,]    0  0.0 0.0000000 0.00  0.0 0.0000000    0  0.0 0.3333333  0.00
 [10,]    0  0.0 0.0000000 0.00  0.0 0.0000000    0  0.0 0.0000000  0.25

#2 Special Matrix: The Symmetric Matrix

A <- matrix(data = c(1, 2, 2, 1), nrow = 2)
A

      [,1] [,2]
 [1,]    1    2
 [2,]    2    1

all(A == t(A))

 [1] TRUE

#3 Special Matrix: The Unit Vector

lpNorm(A = matrix(data = c(1, 0, 0, 0)), p = 2)

 [1] 1

#4 Special Matrix: Orthogonal Vectors

X <- matrix(data = c(11, 0, 0, 0))
Y <- matrix(data = c(0, 11, 0, 0))
all(t(X) %*% Y == 0)

 [1] TRUE

#4 Special Matrix: More Orthogonal Vectors

X <- matrix(data = c(1, 0, 0, 0))
Y <- matrix(data = c(0, 1, 0, 0))
lpNorm(A = X, p = 2) == 1

 [1] TRUE

lpNorm(A = Y, p = 2) == 1

 [1] TRUE

all(t(X) %*% Y == 0)

 [1] TRUE

#4 Special Matrix: Still More Orthogonal Vectors

A <- matrix(data = c(1, 0, 0, 0, 1, 0, 0, 0, 1), nrow = 3, byrow = TRUE)
A

      [,1] [,2] [,3]
 [1,]    1    0    0
 [2,]    0    1    0
 [3,]    0    0    1

all(t(A) %*% A == A %*% t(A))

 [1] TRUE

all(t(A) %*% A == diag(x = 1, nrow = 3))

 [1] TRUE

library(MASS)
all(t(A) == ginv(A))

 [1] TRUE

Eigendecomposition

Let us Now look at Eigendecomposition in R:

A <- matrix(data = 1:25, nrow = 5, byrow = TRUE)
A

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    2    3    4    5
 [2,]    6    7    8    9   10
 [3,]   11   12   13   14   15
 [4,]   16   17   18   19   20
 [5,]   21   22   23   24   25

y <- eigen(x = A)
library(MASS)
all.equal(y$vectors %*% diag(y$values) %*% ginv(y$vectors), A)

 [1] TRUE

Singular Value Decomposition

Now let us look at Singular Value Decomposition in R:

A <- matrix(data = 1:36, nrow = 6, byrow = TRUE)
A

      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    2    3    4    5    6
 [2,]    7    8    9   10   11   12
 [3,]   13   14   15   16   17   18
 [4,]   19   20   21   22   23   24
 [5,]   25   26   27   28   29   30
 [6,]   31   32   33   34   35   36

y <- svd(x = A)
y

 $d
 [1] 1.272064e+02 4.952580e+00 1.068280e-14 3.258502e-15 9.240498e-16
 [6] 6.865073e-16
 
 $u
             [,1]        [,2]       [,3]        [,4]        [,5]
 [1,] -0.06954892 -0.72039744  0.6716423 -0.11924367  0.08965916
 [2,] -0.18479698 -0.51096788 -0.6087484 -0.06762569  0.44007566
 [3,] -0.30004504 -0.30153832 -0.3722328 -0.09266448 -0.41109295
 [4,] -0.41529310 -0.09210875  0.0313011  0.21692481 -0.67511264
 [5,] -0.53054116  0.11732081  0.1308779  0.71086492  0.37490569
 [6,] -0.64578922  0.32675037  0.1471598 -0.64825589  0.18156508
             [,6]
 [1,] -0.05319067
 [2,]  0.36871061
 [3,] -0.70915885
 [4,]  0.56145739
 [5,] -0.20432734
 [6,]  0.03650885
 
 $v
            [,1]        [,2]        [,3]        [,4]       [,5]       [,6]
 [1,] -0.3650545  0.62493577  0.54215504  0.08199306 -0.1033873 -0.4060131
 [2,] -0.3819249  0.38648609 -0.23874067 -0.40371901  0.5758949  0.3913066
 [3,] -0.3987952  0.14803642 -0.75665994  0.29137287 -0.2722608 -0.2957858
 [4,] -0.4156655 -0.09041326  0.14938782 -0.18121587 -0.6652579  0.5668542
 [5,] -0.4325358 -0.32886294  0.21539167  0.69322385  0.3606549  0.2184882
 [6,] -0.4494062 -0.56731262  0.08846608 -0.48165491  0.1043562 -0.4748500

all.equal(y$u %*% diag(y$d) %*% t(y$v), A)

 [1] TRUE

Moore-Penrose Pseudoinverse

Now let us look at Moore-Penrose Pseudoinverse in R:

A <- matrix(data = 1:25, nrow = 5)
A

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    6   11   16   21
 [2,]    2    7   12   17   22
 [3,]    3    8   13   18   23
 [4,]    4    9   14   19   24
 [5,]    5   10   15   20   25

B <- ginv(A)
B

        [,1]  [,2]         [,3]   [,4]   [,5]
 [1,] -0.152 -0.08 -8.00000e-03  0.064  0.136
 [2,] -0.096 -0.05 -4.00000e-03  0.042  0.088
 [3,] -0.040 -0.02 -9.97466e-18  0.020  0.040
 [4,]  0.016  0.01  4.00000e-03 -0.002 -0.008
 [5,]  0.072  0.04  8.00000e-03 -0.024 -0.056

y <- svd(A)
all.equal(y$v %*% ginv(diag(y$d)) %*% t(y$u), B)

 [1] TRUE
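
As a quick illustrative check (my own aside, not from the post), B satisfies the defining Moore-Penrose condition A %*% B %*% A == A even though A is singular and has no ordinary inverse:

all.equal(A %*% B %*% A, A)   # should return TRUE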

The Matrix Trace in R

Lastly, let us look at the trace of a matrix in R:

A <- diag(x = 1:10)
A

       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
  [1,]    1    0    0    0    0    0    0    0    0     0
  [2,]    0    2    0    0    0    0    0    0    0     0
  [3,]    0    0    3    0    0    0    0    0    0     0
  [4,]    0    0    0    4    0    0    0    0    0     0
  [5,]    0    0    0    0    5    0    0    0    0     0
  [6,]    0    0    0    0    0    6    0    0    0     0
  [7,]    0    0    0    0    0    0    7    0    0     0
  [8,]    0    0    0    0    0    0    0    8    0     0
  [9,]    0    0    0    0    0    0    0    0    9     0
 [10,]    0    0    0    0    0    0    0    0    0    10

library(psych)
tr(A)

 [1] 55

We can also code a function that computes the Frobenius norm from the trace and check that it agrees with the frobeniusNorm() function defined earlier:

alternativeFrobeniusNorm <- function(A) {
    sqrt(tr(t(A) %*% A))
}
alternativeFrobeniusNorm(A)

 [1] 19.62142

frobeniusNorm(A)

 [1] 19.62142

all.equal(tr(A), tr(t(A)))

 [1] TRUE

Next, let us verify that the trace is invariant under cyclic permutations of a matrix product. First, a diagonal matrix with entries 1:5:

A <- diag(x = 1:5)
A

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    0    0    0
 [2,]    0    2    0    0    0
 [3,]    0    0    3    0    0
 [4,]    0    0    0    4    0
 [5,]    0    0    0    0    5

Then, a second diagonal matrix with entries 6:10:

B <- diag(x = 6:10)
B

      [,1] [,2] [,3] [,4] [,5]
 [1,]    6    0    0    0    0
 [2,]    0    7    0    0    0
 [3,]    0    0    8    0    0
 [4,]    0    0    0    9    0
 [5,]    0    0    0    0   10

And a third diagonal matrix with entries 11:15:

C <- diag(x = 11:15)
C

      [,1] [,2] [,3] [,4] [,5]
 [1,]   11    0    0    0    0
 [2,]    0   12    0    0    0
 [3,]    0    0   13    0    0
 [4,]    0    0    0   14    0
 [5,]    0    0    0    0   15

Now let us check that the trace of the product is unchanged under cyclic permutations of the factors:

all.equal(tr(A %*% B %*% C), tr(C %*% A %*% B))

 [1] TRUE

all.equal(tr(C %*% A %*% B), tr(B %*% C %*% A))

 [1] TRUE
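
The invariance above is specifically cyclic. As an illustrative aside (my own example, using random non-diagonal matrices rather than the post's diagonal ones), a reordering that is not a cyclic shift will generally change the trace:

set.seed(1)
P <- matrix(data = rnorm(9), nrow = 3)
Q <- matrix(data = rnorm(9), nrow = 3)
R <- matrix(data = rnorm(9), nrow = 3)
all.equal(tr(P %*% Q %*% R), tr(R %*% P %*% Q))           # cyclic shift: traces match
isTRUE(all.equal(tr(P %*% Q %*% R), tr(P %*% R %*% Q)))   # swapping two factors: generally FALSE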


To leave a comment for the author, please follow the link and comment on their blog: R Programming – DataScience+.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Continue Reading…

Collapse

Read More

How does Stan work? A reading list.

Bob writes, to someone who is doing work on the Stan language:

The basic execution structure of Stan is in the JSS paper (by Bob Carpenter, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell) and in the reference manual. The details of autodiff are in the arXiv paper (by Bob Carpenter, Matt Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt). These are sort of background for what we’re trying to do.

If you haven’t read Maria Gorinova’s MS thesis and POPL paper (with Andrew Gordon and Charles Sutton), you should probably start there.

Radford Neal’s intro to HMC is nice, as is the one in David MacKay’s book. Michael Betancourt’s papers are the thing to read to understand HMC deeply—he just wrote another brain bender on geometric autodiff (all on arXiv). Starting with the one on hierarchical models would be good as it explains the necessity of reparameterizations.

Also, I recommend our JEBS paper (with Daniel Lee and Jiqiang Guo) as it presents Stan from a user’s rather than a developer’s perspective.

And, for more general background on Bayesian data analysis, we recommend Statistical Rethinking by Richard McElreath and BDA3.

Continue Reading…

Collapse

Read More

Quick Hit: Above the Fold; Hard wrapping text at ‘n’ characters

(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

Despite being on holiday I’m getting in a bit of non-work R coding since the fam has a greater ability to sleep late than I do. Apart from other things I’ve been working on a PR into {lutz}, a package by @andyteucher that turns lat/lng pairs into timezone strings.

The package is super neat and has two modes: “fast” (originally based on a {V8}-backed version of @darkskyapp’s tzlookup javascript module) and “accurate” using R’s amazing spatial ops.

I ported the javascript algorithm to C++/Rcpp and have been tweaking the bit of package helper code that fetches this:

and extracts the embedded string tree and corresponding timezones array and turns both into something C++ can use.

Originally I just made a header file with the same long lines:

but that’s icky and fairly bad form, especially given that C++ will combine adjacent string literals for you.

The stringi::stri_wrap() function can easily take care of wrapping the time zone array elements for us:
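
As a minimal sketch of the idea (my own example, with a hypothetical tz_names vector standing in for the package's time zone array):

library(stringi)

# hypothetical stand-in for the extracted time zone array
tz_names <- c("America/New_York", "Europe/Paris", "Asia/Tokyo", "Pacific/Auckland")

# collapse to one long string, then wrap it at (up to) 40 characters per line
stri_wrap(paste(tz_names, collapse = ", "), width = 40, normalize = FALSE)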

but, I also needed the ability to hard-wrap the encoded string tree at a fixed width. There are lots of ways to do that, here are three of them:

library(Rcpp)
library(stringi)
library(tidyverse)
library(microbenchmark)

sourceCpp(code = "
#include <Rcpp.h>

// [[Rcpp::export]]
std::vector< std::string > fold_cpp(const std::string& input, int width) {

  int sz = input.length() / width;

  std::vector< std::string > out;
  out.reserve(sz); // shld make this more efficient

  for (unsigned long idx=0; idx<sz; idx++) {
    out.push_back(input.substr(idx*width, width));
  }

  if ((input.length() % width) != 0) out.push_back(input.substr(width*sz));

  return(out);

}
")

fold_base <- function(input, width) {
  vapply(
    seq(1, nchar(input), width),
    function(idx) substr(input, idx, idx + width - 1),
    FUN.VALUE = character(1)
  )
}

fold_tidy <- function(input, width) {
  map_chr(
    seq(1, nchar(input), width),
    ~stri_sub(input, .x, length = width)
  )
}

(If you know of a package that has this type of function, def leave a note in the comments.)

Each one does the same thing: move n sequences of width characters into a new slot in a character vector. Let’s see what they do with this toy long string example:

(src <- paste0(c(rep("a", 30), rep("b", 30), rep("c", 4)), collapse = ""))
## [1] "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccc"

for (n in c(1, 7, 30, 40)) {

  print(fold_base(src, n))
  print(fold_tidy(src, n))
  print(fold_cpp(src, n))
  cat("\n")

}
##  [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
## [18] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "b" "b" "b" "b"
## [35] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
## [52] "b" "b" "b" "b" "b" "b" "b" "b" "b" "c" "c" "c" "c"
##  [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
## [18] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "b" "b" "b" "b"
## [35] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
## [52] "b" "b" "b" "b" "b" "b" "b" "b" "b" "c" "c" "c" "c"
##  [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
## [18] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "b" "b" "b" "b"
## [35] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
## [52] "b" "b" "b" "b" "b" "b" "b" "b" "b" "c" "c" "c" "c"
## 
##  [1] "aaaaaaa" "aaaaaaa" "aaaaaaa" "aaaaaaa" "aabbbbb" "bbbbbbb"
##  [7] "bbbbbbb" "bbbbbbb" "bbbbccc" "c"      
##  [1] "aaaaaaa" "aaaaaaa" "aaaaaaa" "aaaaaaa" "aabbbbb" "bbbbbbb"
##  [7] "bbbbbbb" "bbbbbbb" "bbbbccc" "c"      
##  [1] "aaaaaaa" "aaaaaaa" "aaaaaaa" "aaaaaaa" "aabbbbb" "bbbbbbb"
##  [7] "bbbbbbb" "bbbbbbb" "bbbbccc" "c"      
## 
## [1] "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
## [3] "cccc"                          
## [1] "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
## [3] "cccc"                          
## [1] "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
## [3] "cccc"                          
## 
## [1] "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbb"
## [2] "bbbbbbbbbbbbbbbbbbbbcccc"                
## [1] "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbb"
## [2] "bbbbbbbbbbbbbbbbbbbbcccc"                
## [1] "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbb"
## [2] "bbbbbbbbbbbbbbbbbbbbcccc"   

So, we know they all work, which means we can take a look at which one is faster. Let’s compare folding at various widths:

map_df(c(1, 3, 5, 7, 10, 20, 30, 40, 70), ~{
  microbenchmark(
    base = fold_base(src, .x),
    tidy = fold_tidy(src, .x),
    cpp = fold_cpp(src, .x)
  ) %>% 
    mutate(width = .x) %>% 
    as_tibble()
}) %>% 
  mutate(
    width = factor(width, 
                   levels = sort(unique(width)), 
                   ordered = TRUE)
  ) -> bench_df

ggplot(bench_df, aes(expr, time)) +
  ggbeeswarm::geom_quasirandom(
    aes(group = width, fill = width),
    groupOnX = TRUE, shape = 21, color = "white", size = 3, stroke = 0.125, alpha = 1/4
  ) +
  scale_y_comma(trans = "log10", position = "right") +
  coord_flip() +
  guides(
    fill = guide_legend(override.aes = list(alpha = 1))
  ) +
  labs(
    x = NULL, y = "Time (nanoseconds)",
    fill = "Split width:", 
    title = "Performance comparison between 'fold' implementations"
  ) +
  theme_ft_rc(grid="X") +
  theme(legend.position = "top")

ggplot(bench_df, aes(width, time)) +
  ggbeeswarm::geom_quasirandom(
    aes(group = expr, fill = expr),
    groupOnX = TRUE, shape = 21, color = "white", size = 3, stroke = 0.125, alpha = 1/4
  ) +
  scale_x_discrete(
    labels = c(1, 3, 5, 7, 10, 20, 30, 40, "Split/fold width: 70")
  ) +
  scale_y_comma(trans = "log10", position = "right") +
  scale_fill_ft() +
  coord_flip() +
  guides(
    fill = guide_legend(override.aes = list(alpha = 1))
  ) +
  labs(
    x = NULL, y = "Time (nanoseconds)",
    fill = NULL,
    title = "Performance comparison between 'fold' implementations"
  ) +
  theme_ft_rc(grid="X") +
  theme(legend.position = "top")

The Rcpp version is both faster and more consistent than the other two implementations (though they get faster as the number of string subsetting operations decreases); but, they’re all pretty fast. For an infrequently run process, it might be better to use the base R version purely for simplicity. Despite that fact, I used the Rcpp version to turn the string tree long line into:

FIN

If you need to “fold” like this, how do you currently implement your solution? Found a bug or a better way after looking at the code? Drop a note in the comments so you can help others find an optimal solution to their own ‘fold’ing problems.


To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Continue Reading…

Collapse

Read More

Can random forest provide insights into how yeast grows?

(This article was first published on R – What You're Doing Is Rather Desperate, and kindly contributed to R-bloggers)

I’m not saying this is a good idea, but bear with me.

A recent question on Stack Overflow [r] asked why a random forest model was not working as expected. The questioner was working with data from an experiment in which yeast was grown under conditions where (a) the growth rate could be controlled and (b) one of 6 nutrients was limited. Their dataset consisted of 6 rows – one per nutrient – and several thousand columns, with values representing the activity (expression) of yeast genes. Could the expression values be used to predict the limiting nutrient?

The random forest was not working as expected: not one of the nutrients was correctly classified. I pointed out that with only one case for each outcome, this was to be expected – as the random forest algorithm samples a proportion of the rows, no correct predictions are likely in this case. As sometimes happens, the question was promptly deleted, which was unfortunate as we could have further explored the problem.

A little web searching revealed that the dataset in question is quite well-known. It’s published in an article titled Coordination of Growth Rate, Cell Cycle, Stress Response, and Metabolic Activity in Yeast and has been transformed into a “tidy” format for use in tutorials, here and here.

As it turns out, there are 6 cases (rows) for each outcome (limiting nutrient), as experiments were performed at 6 different growth rates. Whilst random forests are good for “large p small n” problems, it may be that ~5,500 predictors x 36 observations is pushing the limit somewhat. But you know – let’s just try it anyway.

As ever, code and a report for this blog post can be found at Github.

First, we obtain the tidied Brauer dataset. But in fact we want to “untidy” it again (make wide from long) because for random forest we want observations (n) as rows and variables (p – both predicted and predictors) in columns.

library(tidyverse)
library(randomForest)
library(randomForestExplainer)

brauer2007_tidy <- read_csv("https://4va.github.io/biodatasci/data/brauer2007_tidy.csv")

I’ll show the code for running a random forest without much explanation. I’ll assume you have a basic understanding of how the process works. That is: rows are sampled from the data and used to build many decision trees where variables predict either a continuous outcome variable (regression) or a categorical outcome (classification). The trees are then averaged (for regression values) or the majority vote is taken (for classification) to generate predictions. Individual predictors have an “importance” which is essentially some measure of how much worse the model would be were they not included.
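
As an illustrative aside (my own toy example on the built-in iris data, not part of this analysis), the importance measures of a fitted randomForest object can be inspected directly:

library(randomForest)
set.seed(42)
rf_iris <- randomForest(Species ~ ., data = iris, importance = TRUE)
importance(rf_iris)   # per-predictor MeanDecreaseAccuracy and MeanDecreaseGini
varImpPlot(rf_iris)   # quick visual ranking of the predictors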

Here we go then. A couple of notes. First, setting a seed for random sampling is not something you would normally need; it is done here only for reproducibility. Second, unless you are specifically using the model to predict outcomes on unseen data, there’s no real need to split the data into test and training sets – the procedure already performs a bootstrap by virtue of the out-of-bag error estimation.

brauer2007_tidy_rf1 <- brauer2007_tidy %>% 
  mutate(systematic_name = gsub("-", "minus", systematic_name), 
         nutrient = factor(nutrient)) %>% 
  select(systematic_name, nutrient, rate, expression) %>% 
  spread(systematic_name, expression, fill = 0) %>% 
  randomForest(nutrient ~ ., data = ., localImp = TRUE, importance = TRUE)

The model seems to have performed quite well:

Call:
 randomForest(formula = nutrient ~ ., data = ., localImp = TRUE,      importance = TRUE) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 74

        OOB estimate of  error rate: 5.56%
Confusion matrix:
          Ammonia Glucose Leucine Phosphate Sulfate Uracil class.error
Ammonia         6       0       0         0       0      0   0.0000000
Glucose         0       6       0         0       0      0   0.0000000
Leucine         0       1       5         0       0      0   0.1666667
Phosphate       0       0       0         6       0      0   0.0000000
Sulfate         0       0       0         0       6      0   0.0000000
Uracil          0       1       0         0       0      5   0.1666667

Let’s look at the expression of the top 20 genes (by variable importance), with respect to growth rate and limiting nutrient.

brauer2007_tidy %>% 
  filter(systematic_name %in% important_variables(brauer2007_tidy_rf1, k = 20)) %>% 
  ggplot(aes(rate, expression)) + 
  geom_line(aes(color = nutrient)) + 
  facet_wrap(~systematic_name, ncol = 5) + 
  scale_color_brewer(palette = "Set2")

Several of those look promising. Let’s select a gene where expression seems to be affected by each of the 6 limiting nutrients and search online resources such as the Saccharomyces Genome Database to see what’s known.

systematic_name | biological process (bp) | nutrient | search_results
YHR208W | branched chain family amino acid biosynthesis* | leucine | Pathways – leucine biosynthesis
YKL216W | ‘de novo’ pyrimidine base biosynthesis | uracil | URA1 – null mutant requires uracil
YLL055W | biological process unknown | sulfate | Cysteine transporter; null mutant absent utilization of sulfur source
YLR108C | biological process unknown | phosphate |
YOR348C | proline catabolism* | ammonia | Proline permease; repressed in ammonia-grown cells
YOR374W | ethanol metabolism | glucose | Aldehyde dehydrogenase; expression is glucose-repressed

I’d say the Brauer data “makes sense” for five of those genes. Little is known about the sixth (YLR108C, affected by phosphate limitation).

In summary

Normally, a study like this would start with the genes – identify those that are differentially expressed and then think about the conditions under which differential expression was observed. Here, the process is reversed in a sense: we view the experimental condition as an outcome, rather than a parameter and ask whether it can be “predicted” by other observations.

So whilst not my first choice of method for this kind of study, and despite limited outcome data, random forest does seem to be generating some insights into which genes are affected by nutrient limitation. And at the end of the day: if a method provides insights, isn’t that what data science is for?

To leave a comment for the author, please follow the link and comment on their blog: R – What You're Doing Is Rather Desperate.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Continue Reading…

Collapse

Read More

Why do we need AWS SageMaker?

Today, there are several platforms available in the industry that help software developers, data scientists, and even non-specialists develop and deploy machine learning models in very little time.

Continue Reading…

Collapse

Read More

Four short links: 26 June 2019

Ethics and OKRs, Rewriting Binaries, Diversity of Implementation, and Uber's Metrics Systems

  1. Ethical Principles and OKRs -- Your KPIs can’t conflict with your principles if you don’t have principles. (So, start by defining your principles, then consider your principles before optimizing a KPI, monitor user experience to see if you're compromising your principles, and repeat.) (via Peter Skomoroch)
  2. RetroWrite -- Retrofitting compiler passes through binary rewriting. Paper. The ideal solution for binary security analysis would be a static rewriter that can intelligently add the required instrumentation as if it were inserted at compile time. Such instrumentation requires an analysis to statically disambiguate between references and scalars, a problem known to be undecidable in the general case. We show that recovering this information is possible in practice for the most common class of software and libraries: 64-bit, position independent code (via Mathias Payer)
  3. Re: A libc in LLVM -- very thoughtful post from a libc maintainer about the risks if Google implements an LLVM libc. Avoiding monoculture preserves the motivation for consensus-based standards processes rather than single-party control (see also: Chrome and what it's done to the web) and the motivation for people writing software to write to the standards rather than to a particular implementation.
  4. M3 and M3DB -- M3, a metrics platform, and M3DB, a distributed time series database, were developed at Uber out of necessity. After using what was available as open source and finding we were unable to use them at our scale due to issues with their reliability, cost, and operationally intensive nature, we built our own metrics platform piece by piece. We used our experience to help us build a native distributed time series database, a highly dynamic and performant aggregation service, query engine, and other supporting infrastructure.

Continue reading Four short links: 26 June 2019.

Continue Reading…

Collapse

Read More

KDnuggets™ News 19:n24, Jun 26: Understand Cloud Services; Pandas Tips & Tricks; Master Data Preparation w/ Python

Happy summer! This week on KDnuggets: Understanding Cloud Data Services; How to select rows and columns in Pandas using [ ], .loc, iloc, .at and .iat; 7 Steps to Mastering Data Preparation for Machine Learning with Python; Examining the Transformer Architecture: The OpenAI GPT-2 Controversy; Data Literacy: Using the Socratic Method; and much more!

Continue Reading…

Collapse

Read More

AI and machine learning will require retraining your entire organization

To successfully integrate AI and machine learning technologies, companies need to take a more holistic approach toward training their workforce.

In our recent surveys AI Adoption in the Enterprise and Machine Learning Adoption in the Enterprise, we found growing interest in AI technologies among companies across a variety of industries and geographic locations. Our findings align with other surveys and studies—in fact, a recent study by the World Intellectual Property Organization (WIPO) found that the surge in research in AI and machine learning (ML) has been accompanied by an even stronger growth in AI-related patent applications. Patents are one sign that companies are beginning to take these technologies very seriously.

Figure 1. A 2019 WIPO Study shows an over six-fold increase in AI patent publications from 2006 to 2017. Image source: Ben Lorica.

When we asked what held back their adoption of AI technologies, respondents cited a few reasons, including some that pertained to culture, organization, and skills:

  • [23%] Company culture does not yet recognize needs for AI
  • [18%] Lack of skilled people / difficulty hiring for the required roles
  • [17%] Difficulties in identifying appropriate business use cases

Implementing and incorporating AI and machine learning technologies will require retraining across an organization, not just technical teams. Recall that the rise of big data and data science necessitated a certain amount of retraining across an entire organization: technologists and analysts needed to familiarize themselves with new tools and architectures, but business experts and managers also needed to reorient their workflows to adjust to data-driven processes and data-intensive systems. AI and machine learning will require a similar holistic approach to training. Here are a few reasons why:

  • As noted from our survey, identifying appropriate business use cases remains an ongoing challenge. Domain experts and business owners need to develop an understanding of these technologies in order to be able to highlight areas where they are likely to make an impact within a company.
  • Members of an organization will need to understand—even at a high-level—the current state of AI and ML technologies so they know the strengths and limitations of these new tools. For instance, in the case of robotic process automation (RPA), it’s really the people closest to tasks (“bottoms up”) who can best identify areas where it is most suitable.
  • AI and machine learning depend on data (usually labeled training data for machine learning models), and in many instances, a certain amount of domain knowledge will be needed to assemble high-quality data.
  • Machine learning and AI involve end-to-end pipelines, so development/testing/integration will often cut across technical roles and technical teams.
  • AI and machine learning applications and solutions often interact with (or augment) users and domain experts, so UX/design remains critical.
  • Security, privacy, ethics, and other risk and compliance issues will increasingly require that companies set up cross-functional teams when they build AI and machine learning systems and products.

At our upcoming Artificial Intelligence conferences in San Jose and London, we have assembled a roster of two-day training sessions, tutorials, and presentations to help individuals (across job roles and functions) sharpen their skills and understanding of AI and machine learning. We return to San Jose with a two-day Business Summit designed specifically for executives, business leaders, and strategists. This Business Summit includes a popular two-day training—AI for Managers—and tutorials—Bringing AI into the enterprise and Design Thinking for AI—along with 12 executive briefings designed to provide in-depth overviews into important topics in AI. We are also debuting a new half-day tutorial that will be taught by Ira Cohen (Product management in the Machine Learning era), which, given the growing importance of AI and ML, is one that every manager should consider attending.

We will also have our usual strong slate of technical training, tutorials, and talks. Here are some two-day training sessions and tutorials that I am excited about:

AI and ML are going to impact and permeate most aspects of a company’s operations, products, and services. To succeed in implementing and incorporating AI and machine learning technologies, companies need to take a more holistic approach toward retraining their workforces. This will be an ongoing endeavor as research results continue to be translated into practical systems that companies can use. Individuals will need to continue to learn new skills as technologies continue to evolve and because many areas of AI and ML are increasingly becoming democratized.

Related training and tutorial links:

Continue reading AI and machine learning will require retraining your entire organization.

Continue Reading…

Collapse

Read More

What's new on arXiv – Complete List

Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning
Innovating HR Using an Expert System for Recruiting IT Specialists — ESRIT
Deep Learning for Spatio-Temporal Data Mining: A Survey
From Fully Supervised to Zero Shot Settings for Twitter Hashtag Recommendation
Recurrent U-Net for Resource-Constrained Segmentation
Power Gradient Descent
Table-Based Neural Units: Fully Quantizing Networks for Multiply-Free Inference
The König Graph Process
Position-aware Graph Neural Networks
Medium-Term Load Forecasting Using Support Vector Regression, Feature Selection, and Symbiotic Organism Search Optimization
ADASS: Adaptive Sample Selection for Training Acceleration
UnLimited TRAnsfers for Multi-Modal Route Planning: An Efficient Solution
Weighted, Bipartite, or Directed Stream Graphs for the Modeling of Temporal Networks
A Closer Look at the Optimization Landscapes of Generative Adversarial Networks
Reinforcement Learning for Integer Programming: Learning to Cut
On regularization for a convolutional kernel in neural networks
Communication-Efficient Accurate Statistical Estimation
Model Testing for Generalized Scalar-on-Function Linear Models
Does BLEU Score Work for Code Migration?
Non-Parametric Calibration for Classification
Relative Hausdorff Distance for Network Analysis
Pay Attention to Convolution Filters: Towards Fast and Accurate Fine-Grained Transfer Learning
Neural Variational Inference For Estimating Uncertainty in Knowledge Graph Embeddings
Real-time Attention Based Look-alike Model for Recommender System
Secure Federated Matrix Factorization
Kaskade: Graph Views for Efficient Graph Analytics
Polynomial root clustering and explicit deflation
Trip Table Estimation and Prediction for Dynamic Traffic Assignment Applications
Generalized Langevin equations for systems with local interactions
Efficient Graph Rewriting
Phase-field material point method for dynamic brittle fracture with isotropic and anisotropic surface energy
Cooper pairing of incoherent electrons: an electron-phonon version of the Sachdev-Ye-Kitaev model
Joint 3D Localization and Classification of Space Debris using a Multispectral Rotating Point Spread Function
Unsupervised Discovery of Gendered Language through Latent-Variable Modeling
PerspectroScope: A Window to the World of Diverse Perspectives
Deep 2FBSDEs for Systems with Control Multiplicative Noise
Estimating the Number of Fatal Victims of the Peruvian Internal Armed Conflict, 1980-2000: an application of modern multi-list Capture-Recapture techniques
The Prolog debugger and declarative programming
Enumerating linear systems on graphs
Deep Forward-Backward SDEs for Min-max Control
A Systematic Comparison of English Noun Compound Representations
Issues with post-hoc counterfactual explanations: a discussion
Distribution-Free Multisample Test Based on Optimal Matching with Applications to Single Cell Genomics
Generating Pareto optimal dose distributions for radiation therapy treatment planning
Path Cohomology of Locally Finite Digraphs, Hodge’s Theorem and the $p$-Lazy Random Walk
Second-best Beam-Alignment via Bayesian Multi-Armed Bandits
Stability of Graph Scattering Transforms
Stein’s method and the distribution of the product of zero mean correlated normal random variables
Mathematical and numerical analysis of a nonlocal Drude model in nanoplasmonics
Optimum LoRaWAN Configuration Under Wi-SUN Interference
Active Distribution Grids offering Ancillary Services in Islanded and Grid-connected Mode
A note on extensions of multilinear maps defined on multilinear varieties
Suppressing Model Overfitting for Image Super-Resolution Networks
Lyapunov Differential Equation Hierarchy and Polynomial Lyapunov Functions for Switched Linear Systems
The EAS approach for graphical selection consistency in vector autoregression models
Towards Inverse Reinforcement Learning for Limit Order Book Dynamics
Generalized Beta Prime Distribution: Stochastic Model of Economic Exchange and Properties of Inequality Indices
Weakly-supervised Compositional Feature Aggregation for Few-shot Recognition
Relaxed random walks at scale
Unmasking Bias in News
Toward Best Practices for Explainable B2B Machine Learning
Edge-Direct Visual Odometry
Similarity Problems in High Dimensions
A monotone data augmentation algorithm for longitudinal data analysis via multivariate skew-t, skew-normal or t distributions
Discrepancy, Coresets, and Sketches in Machine Learning
Task-Aware Deep Sampling for Feature Generation
Modified log-Sobolev inequality for a compact PJMP with degenerate jumps
Fast Trajectory Optimization via Successive Convexification for Spacecraft Rendezvous with Integer Constraints
Homological Connectivity in Čech Complexes
Statistical guarantees for local graph clustering
Semi-flat minima and saddle points by embedding neural networks to overparameterization
An ultraweak formulation of the Reissner-Mindlin plate bending model and DPG approximation
Nearly Finitary Matroids
Spectral Ratio for Positive Matrices
Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction
Analytic-geometric methods for finite Markov chains with applications to quasi-stationarity
Gambler’s ruin estimates on finite inner uniform domains
Multiple instance learning with graph neural networks
Run-Time Efficient RNN Compression for Inference on Edge Devices
Using Small Proxy Datasets to Accelerate Hyperparameter Search
Adaptive Navigation Scheme for Optimal Deep-Sea Localization Using Multimodal Perception Cues
High-resolution Markov state models for the dynamics of Trp-cage miniprotein constructed over slow folding modes identified by state-free reversible VAMPnets
Compressive Hyperspherical Energy Minimization
Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks
Coresets for Gaussian Mixture Models of Any Shape
Prophet Inequalities on the Intersection of a Matroid and a Graph
Improving Importance Weighted Auto-Encoders with Annealed Importance Sampling
SPoC: Search-based Pseudocode to Code
All-Weather Deep Outdoor Lighting Estimation
Bilateral Boundary Control Design for a Cascaded Diffusion-ODE System Coupled at an Arbitrary Interior Point
Lifestate: Event-Driven Protocols and Callback Control Flow
Transferrable Operative Difficulty Assessment in Robot-assisted Teleoperation: A Domain Adaptation Approach
Partial Or Complete, That’s The Question
CogCompTime: A Tool for Understanding Time in Natural Language Text
Joint Reasoning for Temporal and Causal Relations
A Structured Learning Approach to Temporal Relation Extraction
Semi-Supervised Exploration in Image Retrieval
A Stratified Approach to Robustness for Randomly Smoothed Classifiers
Hand Orientation Estimation in Probability Density Form
Towards Geocoding Spatial Expressions
Synthesizing Diverse Lung Nodules Wherever Massively: 3D Multi-Conditional GAN-based CT Image Augmentation for Object Detection
Good Stabilizer Codes from Quasi-Cyclic Codes over $\mathbb{F}_4$ and $\mathbb{F}_9$
Spectral Bounds for Quasi-Twisted Codes
A hierarchical Lyapunov-based cascade adaptive control scheme for lower-limb exoskeleton
Polynomially growing harmonic functions on connected groups
CDPM: Convolutional Deformable Part Models for Person Re-identification
DeepSquare: Boosting the Learning Power of Deep Convolutional Neural Networks with Elementwise Square Operators
Unsupervised Question Answering by Cloze Translation
Indoor image representation by high-level semantic features
Incremental Learning from Scratch for Task-Oriented Dialogue Systems
Structure learning of Bayesian networks involving cyclic structures
Energy Efficient Massive MIMO Array Configurations
Convergence of partial sum processes to stable processes with application for aggregation of branching processes
Adversarial Learning of Privacy-Preserving Text Representations for De-Identification of Medical Records
On Universal Codes for Integers: Wallace Tree, Elias Omega and Variations
Approximating the Orthogonality Dimension of Graphs and Hypergraphs
Adaptive Resource Management for a Virtualized Computing Platform within Edge Computing
BiSET: Bi-directional Selective Encoding with Template for Abstractive Summarization
Assuring the Evolvability of Microservices: Insights into Industry Practices and Challenges
Structure-adaptive manifold estimation
Deep Reinforcement Learning for Unmanned Aerial Vehicle-Assisted Vehicular Networks
Two-stage Stochastic Lot-sizing Problem with Chance-constrained Condition in the Second Stage
Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations
Checkpoint/restart approaches for a thread-based MPI runtime
Towards Big data processing in IoT: Path Planning and Resource Management of UAV Base Stations in Mobile-Edge Computing System
Who Will Win It? An In-game Win Probability Model for Football
Fast Task Inference with Variational Intrinsic Successor Features
Decoupling Gating from Linearity
Desingularization of matrix equations employing hypersingular integrals in boundary element methods using double nodes
Activated Random Walks on $\mathbb{Z}^d$
Application-Level Differential Checkpointing for HPC Applications with Dynamic Datasets
Concept Discovery through Information Extraction in Restaurant Domain
On the WalkerMaker-WalkerBreaker games
Torus computed tomography
Selecting stock pairs for pairs trading while incorporating lead-lag relationship
Higher-Order Ranking and Link Prediction: From Closing Triangles to Closing Higher-Order Motifs
Probing Multilingual Sentence Representations With X-Probe
Unified Semantic Parsing with Weak Supervision
Power-law Verification for Event Detection at Multi-spatial Scales from Geo-tagged Tweet Streams
Deep Smoothing of the Implied Volatility Surface
Polynomial-time Updates of Epistemic States in a Fragment of Probabilistic Epistemic Argumentation (Technical Report)
Fault-Tolerant Path-Embedding of Twisted Hypercube-Like Networks THLNs
De Finetti’s control problem with Parisian ruin for spectrally negative Lévy processes
Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification
Reinforcement-Learning-based Adaptive Optimal Control for Arbitrary Reference Tracking
Applying economic measures to lapse risk management with machine learning approaches
Broadcasts on Paths and Cycles
Migrating large codebases to C++ Modules
Optimizing city-scale traffic flows through modeling isolated observations of vehicle movements
Knowledge Gradient for Selection with Covariates: Consistency and Computation
Odd cycles in subgraphs of sparse pseudorandom graphs
On the Universal Near-Shortest Simple Paths Problem
Pose from Shape: Deep Pose Estimation for Arbitrary 3D Objects
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
Model Predictive Control, Cost Controllability, and Homogeneity
A Survey of Autonomous Driving: Common Practices and Emerging Technologies
Convergence of second-order, entropy stable methods for multi-dimensional conservation laws
LED2Net: Deep Illumination-aware Dehazing with Low-light and Detail Enhancement
Biased random k-SAT
High Accuracy Classification of White Blood Cells using TSLDA Classifier and Covariance Features
Handel: Practical Multi-Signature Aggregation for Large Byzantine Committees
Markov-modulated continuous-time Markov chains to identify site- and branch-specific evolutionary variation
Next-to$^k$ leading log expansions by chord diagrams
A second order analysis of McKean-Vlasov semigroups
Sharp thresholds for nonlinear Hamiltonian cycles in hypergraphs
A decentralized trust-aware collaborative filtering recommender system based on weighted items for social tagging systems
Recognizing Manipulation Actions from State-Transformations
Putting words in context: LSTM language models and lexical ambiguity
Exploring Bayesian approaches to eQTL mapping through probabilistic programming
Collaborative Broadcast in O(log log n) Rounds
Asymptotic approach for backward stochastic differential equation with singular terminal condition *
Evaluation of Dataflow through layers of Deep Neural Networks in Classification and Regression Problems
Learning High-Dimensional Gaussian Graphical Models under Total Positivity without Tuning Parameters
General Video Game Rule Generation
Stereoscopic Omnidirectional Image Quality Assessment Based on Predictive Coding Theory
Attention-based Multi-Input Deep Learning Architecture for Biological Activity Prediction: An Application in EGFR Inhibitors
A Passivity Enforcement Technique for Forced Oscillation Source Location
DCEF: Deep Collaborative Encoder Framework for Unsupervised Clustering
UAV Swarms as Amplify-and-Forward MIMO Relays
Empowering Quality Diversity in Dungeon Design with Interactive Constrained MAP-Elites
Small-Support Uncertainty Principles on $\mathbb{Z}/p$ over Finite Fields

Continue Reading…

Collapse

Read More

What's new on arXiv

Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains. A particular approach which has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise due to the continually changing decision-making policies of agents. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications in the training procedure, such as centralized training, to learning representations of the opponent’s policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.


Innovating HR Using an Expert System for Recruiting IT Specialists — ESRIT

One of the most rapidly evolving and dynamic business sectors is the IT domain, where there is a problem finding experienced, skilled and qualified employees. Specialists are essential for developing and implementing new ideas into products. Human resources (HR) department plays a major role in the recruitment of qualified employees by assessing their skills, using different HR metrics, and selecting the best candidates for a specific job. Most recruiters are not qualified to evaluate IT specialists. In order to decrease the gap between the HR department and IT specialists, we designed, implemented and tested an Expert System for Recruiting IT specialist – ESRIT. The expert system uses text mining, natural language processing, and classification algorithms to extract relevant information from resumes by using a knowledge base that stores the relevant key skills and phrases. The recruiter is looking for the same abilities and certificates, trying to place the best applicant into a specific position. The article presents a developing picture of the top major IT skills that will be required in 2014 and it argues for the choice of the IT abilities domain.


Deep Learning for Spatio-Temporal Data Mining: A Survey

With the fast development of various positioning techniques such as Global Position System (GPS), mobile devices and remote sensing, spatio-temporal data has become increasingly available nowadays. Mining valuable knowledge from spatio-temporal data is critically important to many real world applications including human mobility understanding, smart transportation, urban planning, public safety, health care and environmental management. As the number, volume and resolution of spatio-temporal datasets increase rapidly, traditional data mining methods, especially statistics based methods for dealing with such data are becoming overwhelmed. Recently, with the advances of deep learning techniques, deep learning models such as convolutional neural network (CNN) and recurrent neural network (RNN) have enjoyed considerable success in various machine learning tasks due to their powerful hierarchical feature learning ability in both spatial and temporal domains, and have been widely applied in various spatio-temporal data mining (STDM) tasks such as predictive learning, representation learning, anomaly detection and classification. In this paper, we provide a comprehensive survey on recent progress in applying deep learning techniques for STDM. We first categorize the types of spatio-temporal data and briefly introduce the popular deep learning models that are used in STDM. Then a framework is introduced to show a general pipeline of the utilization of deep learning models for STDM. Next we classify existing literatures based on the types of ST data, the data mining tasks, and the deep learning models, followed by the applications of deep learning for STDM in different domains including transportation, climate science, human mobility, location based social network, crime analysis, and neuroscience. Finally, we conclude the limitations of current research and point out future research directions.


From Fully Supervised to Zero Shot Settings for Twitter Hashtag Recommendation

We propose a comprehensive end-to-end pipeline for Twitter hashtags recommendation system including data collection, supervised training setting and zero shot training setting. In the supervised training setting, we have proposed and compared the performance of various deep learning architectures, namely Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Transformer Network. However, it is not feasible to collect data for all possible hashtag labels and train a classifier model on them. To overcome this limitation, we propose a Zero Shot Learning (ZSL) paradigm for predicting unseen hashtag labels by learning the relationship between the semantic space of tweets and the embedding space of hashtag labels. We evaluated various state-of-the-art ZSL methods like Convex combination of Semantic Embedding (ConSE), Embarrassingly Simple Zero-Shot Learning (ESZSL) and Deep Embedding Model for Zero-Shot Learning (DEM-ZSL) for the hashtag recommendation task. We demonstrate the effectiveness and scalability of ZSL methods for the recommendation of unseen hashtags. To the best of our knowledge, this is the first quantitative evaluation of ZSL methods to date for unseen hashtags recommendations from tweet text.


Recurrent U-Net for Resource-Constrained Segmentation

State-of-the-art segmentation methods rely on very deep networks that are not always easy to train without very large training datasets and tend to be relatively slow to run on standard GPUs. In this paper, we introduce a novel recurrent U-Net architecture that preserves the compactness of the original U-Net, while substantially increasing its performance to the point where it outperforms the state of the art on several benchmarks. We will demonstrate its effectiveness for several tasks, including hand segmentation, retina vessel segmentation, and road segmentation. We also introduce a large-scale dataset for hand segmentation.


Power Gradient Descent

The development of machine learning is promoting the search for fast and stable minimization algorithms. To this end, we suggest a change in the current gradient descent methods that should speed up the motion in flat regions and slow it down in steep directions of the function to minimize. It is based on a ‘power gradient’, in which each component of the gradient is replaced by its versus-preserving H-th power, with 0<H<1. We test three modern gradient descent methods fed by such variant and by standard gradients, finding the new version to achieve significantly better performances for the Nesterov accelerated gradient and AMSGrad. We also propose an effective new take on the ADAM algorithm, which includes power gradients with varying H.
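
A minimal sketch in R (my own illustration of the idea as stated in the abstract, not the authors' code): each gradient component keeps its sign but is raised to the H-th power, which damps large components and boosts small ones:

power_grad <- function(g, H = 0.5) sign(g) * abs(g)^H

g <- c(-4, 0.01, 9)
power_grad(g, H = 0.5)   # -2.0  0.1  3.0: steep directions damped, flat ones boosted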


Table-Based Neural Units: Fully Quantizing Networks for Multiply-Free Inference

In this work, we propose to quantize all parts of standard classification networks and replace the activation-weight–multiply step with a simple table-based lookup. This approach results in networks that are free of floating-point operations and free of multiplications, suitable for direct FPGA and ASIC implementations. It also provides us with two simple measures of per-layer and network-wide compactness as well as insight into the distribution characteristics of activation output and weight values. We run controlled studies across different quantization schemes, both fixed and adaptive and, within the set of adaptive approaches, both parametric and model-free. We implement our approach to quantization with minimal, localized changes to the training process, allowing us to benefit from advances in training continuous-valued network architectures. We apply our approach successfully to AlexNet, ResNet, and MobileNet. We show results that are within 1.6% of the reported, non-quantized performance on MobileNet using only 40 entries in our table. This performance gap narrows to zero when we allow tables with 320 entries. Our results give the best accuracies among multiply-free networks.


The König Graph Process

Say that a graph G has property \mathcal{K} if the size of its maximum matching is equal to the order of a minimal vertex cover. We study the following process. Set N:= \binom{n}{2} and let e_1, e_2, \dots e_{N} be a uniformly random ordering of the edges of K_n, with n an even integer. Let G_0 be the empty graph on n vertices. For m \geq 0, G_{m+1} is obtained from G_m by adding the edge e_{m+1} exactly if G_m \cup \{ e_{m+1}\} has property \mathcal{K}. We analyse the behaviour of this process, focusing mainly on two questions: What can be said about the structure of G_N and for which m will G_m contain a perfect matching?


Position-aware Graph Neural Networks

Learning node embeddings that capture a node’s position within the broader graph structure is crucial for many prediction tasks on graphs. However, existing Graph Neural Network (GNN) architectures have limited power in capturing the position/location of a given node with respect to all other nodes of the graph. Here we propose Position-aware Graph Neural Networks (P-GNNs), a new class of GNNs for computing position-aware node embeddings. P-GNN first samples sets of anchor nodes, computes the distance of a given target node to each anchor-set, and then learns a non-linear distance-weighted aggregation scheme over the anchor-sets. This way P-GNNs can capture positions/locations of nodes with respect to the anchor nodes. P-GNNs have several advantages: they are inductive, scalable, and can incorporate node feature information. We apply P-GNNs to multiple prediction tasks including link prediction and community detection. We show that P-GNNs consistently outperform state of the art GNNs, with up to 66% improvement in terms of the ROC AUC score.


Medium-Term Load Forecasting Using Support Vector Regression, Feature Selection, and Symbiotic Organism Search Optimization

An accurate load forecasting has always been one of the main indispensable parts in the operation and planning of power systems. Among different time horizons of forecasting, while short-term load forecasting (STLF) and long-term load forecasting (LTLF) have respectively got benefits of accurate predictors and probabilistic forecasting, medium-term load forecasting (MTLF) demands more attention due to its vital role in power system operation and planning such as optimal scheduling of generation units, robust planning program for customer service, and economic supply. In this study, a hybrid method, composed of Support Vector Regression (SVR) and Symbiotic Organism Search Optimization (SOSO) method, is proposed for MTLF. In the proposed forecasting model, SVR is the main part of the forecasting algorithm while SOSO is embedded into it to optimize the parameters of SVR. In addition, a minimum redundancy-maximum relevance feature selection algorithm is used in the preprocessing of input data. The proposed method is tested on EUNITE competition dataset to demonstrate its proper performance. Furthermore, it is compared with some previous works to show eligibility of our method.


ADASS: Adaptive Sample Selection for Training Acceleration

Stochastic gradient descent (SGD) and its variants, including some accelerated variants, have become popular for training in machine learning. However, in all existing SGD and its variants, the sample size in each iteration (epoch) of training is the same as the size of the full training set. In this paper, we propose a new method, called adaptive sample selection (ADASS), for training acceleration. During different epochs of training, ADASS only needs to visit different training subsets which are adaptively selected from the full training set according to the Lipschitz constants of the loss functions on samples. It means that in ADASS the sample size in each epoch of training can be smaller than the size of the full training set, by discarding some samples. ADASS can be seamlessly integrated with existing optimization methods, such as SGD and momentum SGD, for training acceleration. Theoretical results show that the learning accuracy of ADASS is comparable to that of counterparts with full training set. Furthermore, empirical results on both shallow models and deep models also show that ADASS can accelerate the training process of existing methods without sacrificing accuracy.


UnLimited TRAnsfers for Multi-Modal Route Planning: An Efficient Solution

We study a multi-modal route planning scenario consisting of a public transit network and a transfer graph representing a secondary transportation mode (e.g., walking or taxis). The objective is to compute all journeys that are Pareto-optimal with respect to arrival time and the number of required transfers. While various existing algorithms can efficiently compute optimal journeys in either a pure public transit network or a pure transfer graph, combining the two increases running times significantly. As a result, even walking between stops is typically limited by a maximal duration or distance, or by requiring the transfer graph to be transitively closed. To overcome these shortcomings, we propose a novel preprocessing technique called ULTRA (UnLimited TRAnsfers): Given a complete transfer graph (without any limitations, representing an arbitrary non-schedule-based mode of transportation), we compute a small number of transfer shortcuts that are provably sufficient for computing all Pareto-optimal journeys. We demonstrate the practicality of our approach by showing that these transfer shortcuts can be integrated into a variety of state-of-the-art public transit algorithms, establishing the ULTRA-Query algorithm family. Our extensive experimental evaluation shows that ULTRA is able to improve these algorithms from limited to unlimited transfers without sacrificing query speed, yielding the fastest known algorithms for multi-modal routing. This is true not just for walking, but also for other transfer modes such as cycling or driving.


Weighted, Bipartite, or Directed Stream Graphs for the Modeling of Temporal Networks

We recently introduced a formalism for the modeling of temporal networks, that we call stream graphs. It emphasizes the streaming nature of data and allows rigorous definitions of many important concepts generalizing classical graphs. This includes in particular size, density, clique, neighborhood, degree, clustering coefficient, and transitivity. In this contribution, we show that, like graphs, stream graphs may be extended to cope with bipartite structures, with node and link weights, or with link directions. We review the main bipartite, weighted or directed graph concepts proposed in the literature, we generalize them to the cases of bipartite, weighted, or directed stream graphs, and we show that obtained concepts are consistent with graph and stream graph ones. This provides a formal ground for an accurate modeling of the many temporal networks that have one or several of these features.


A Closer Look at the Optimization Landscapes of Generative Adversarial Networks

Generative adversarial networks have been very successful in generative modeling, however they remain relatively hard to optimize compared to standard deep neural networks. In this paper, we try to gain insight into the optimization of GANs by looking at the game vector field resulting from the concatenation of the gradient of both players. Based on this point of view, we propose visualization techniques that allow us to make the following empirical observations. First, the training of GANs suffers from rotational behavior around locally stable stationary points, which, as we show, corresponds to the presence of imaginary components in the eigenvalues of the Jacobian of the game. Secondly, GAN training seems to converge to a stable stationary point which is a saddle point for the generator loss, not a minimum, while still achieving excellent performance. This counter-intuitive yet persistent observation questions whether we actually need a Nash equilibrium to get good performance in GANs.


Reinforcement Learning for Integer Programming: Learning to Cut

Integer programming (IP) is a general optimization framework widely applicable to a variety of unstructured and structured problems arising in, e.g., scheduling, production planning, and graph optimization. As IP models many provably hard to solve problems, modern IP solvers rely on many heuristics. These heuristics are usually human-designed, and naturally prone to suboptimality. The goal of this work is to show that the performance of those solvers can be greatly enhanced using reinforcement learning (RL). In particular, we investigate a specific methodology for solving IPs, known as the Cutting Plane Method. This method is employed as a subroutine by all modern IP solvers. We present a deep RL formulation, network architecture, and algorithms for intelligent adaptive selection of cutting planes (aka cuts). Across a wide range of IP tasks, we show that the trained RL agent significantly outperforms human-designed heuristics, and effectively generalizes to 10X larger instances and across IP problem classes. The trained agent is also demonstrated to benefit the popular downstream application of cutting plane methods in Branch-and-Cut algorithm, which is the backbone of state-of-the-art commercial IP solvers.


On regularization for a convolutional kernel in neural networks

The convolutional neural network is a very important model in deep learning. Keeping the singular values of each layer's Jacobian bounded around 1 during training can help avoid the exploding/vanishing gradient problem and improve the generalizability of a neural network. We propose a new penalty function for a convolutional kernel that keeps the singular values of the corresponding transformation matrix bounded around 1, and we show how to carry out gradient-type methods. The penalty concerns the transformation matrix corresponding to a kernel, not the kernel directly, which differs from results in existing papers. This provides a new regularization method for the weights of convolutional layers. Other penalty functions for a kernel can be devised following this idea in future work.
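As a rough illustration of the form such a penalty takes, here is a simplified R sketch that pulls the singular values of a reshaped kernel toward 1. Note the assumption: the paper penalizes the transformation matrix induced by the full convolution, not the reshaped kernel used here, so this is only a toy stand-in.

# Simplified sketch (assumption: penalizing a reshaped kernel, not the
# convolution's transformation matrix as in the paper).
singular_value_penalty <- function(W) {
  s <- svd(W)$d            # singular values of the matrix
  sum((s - 1)^2)           # quadratic penalty pulling them toward 1
}

set.seed(1)
kernel <- matrix(rnorm(3 * 3 * 8), nrow = 9)  # 3x3 kernel, 8 output channels
singular_value_penalty(kernel)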


Communication-Efficient Accurate Statistical Estimation

When the data are stored in a distributed manner, direct application of traditional statistical inference procedures is often prohibitive due to communication cost and privacy concerns. This paper develops and investigates two Communication-Efficient Accurate Statistical Estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcasts aggregated information to node machines for new updates. The algorithms adapt to the similarity among loss functions on node machines, and converge rapidly when each node machine has a large enough sample size. Moreover, they do not require good initialization and enjoy linear convergence guarantees under general conditions. The contraction rate of optimization errors is presented explicitly, with the dependence on the local sample size unveiled. In addition, the improved statistical accuracy per iteration is derived. By regarding the proposed method as a multi-step statistical estimator, we show that statistical efficiency can be achieved in finite steps in typical statistical applications. In addition, we give the conditions under which the one-step CEASE estimator is statistically efficient. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of our algorithms.
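The compute-in-parallel, aggregate-and-broadcast loop described above can be sketched in a few lines. The R toy below shows only that communication pattern with a plain averaged least-squares gradient; it is not the CEASE estimator, and all names and numbers are illustrative.

# Toy sketch of the communication pattern (not the CEASE estimator):
# each node computes a local gradient at the current iterate, the center
# averages them and broadcasts the new iterate.
local_gradient <- function(X, y, beta) {
  t(X) %*% (X %*% beta - y) / nrow(X)
}

set.seed(42)
p <- 3; beta_true <- c(1, -2, 0.5)
nodes <- lapply(1:4, function(k) {
  X <- matrix(rnorm(200 * p), ncol = p)
  list(X = X, y = as.numeric(X %*% beta_true + rnorm(200)))
})

beta <- rep(0, p); lr <- 0.5
for (iter in 1:50) {
  grads <- lapply(nodes, function(nd) local_gradient(nd$X, nd$y, beta))
  avg_grad <- Reduce(`+`, grads) / length(grads)   # central aggregation
  beta <- beta - lr * avg_grad                     # broadcast new iterate
}
round(beta, 2)  # close to beta_true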


Model Testing for Generalized Scalar-on-Function Linear Models

Scalar-on-function linear models are commonly used to regress functional predictors on a scalar response. However, functional models are more difficult to estimate and interpret than traditional linear models, and may be unnecessarily complex for a data application. Hypothesis testing can be used to guide model selection by determining if a functional predictor is necessary. Using a mixed effects representation with penalized splines and variance component tests, we propose a framework for testing functional linear models with responses from exponential family distributions. The proposed method can accommodate dense and sparse functional data, and be used to test functional predictors for no effect and form of the effect. We show via simulation study that the proposed method achieves the nominal level and has high power, and we demonstrate its utility with two data applications.
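For readers who want to see what a generalized scalar-on-function model looks like in R, here is a hedged sketch using the refund package, assuming its pfr()/lf() interface (argument names may differ across versions). It only fits the kind of model the paper tests; the proposed variance component test itself is not shown.

# Sketch: fitting a generalized scalar-on-function linear model with 'refund'
# (interface assumed; check the package documentation for exact signatures).
library(refund)

set.seed(1)
n <- 200; grid <- seq(0, 1, length.out = 50)
X <- matrix(rnorm(n * 50), nrow = n)          # functional predictor observed on 'grid'
beta_t <- sin(2 * pi * grid)                  # true coefficient function
eta <- as.numeric(X %*% beta_t / 50)          # approximate integral of X(t) * beta(t)
y <- rbinom(n, 1, plogis(eta))                # binary (exponential family) response

fit <- pfr(y ~ lf(X, argvals = grid), family = binomial())
summary(fit)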


Does BLEU Score Work for Code Migration?

Statistical machine translation (SMT) is a fast-growing sub-field of computational linguistics. To date, the most popular automatic metric for measuring the quality of SMT is the BiLingual Evaluation Understudy (BLEU) score. Lately, SMT along with the BLEU metric has been applied to a software engineering task named code migration. (In)validating the use of the BLEU score could advance the research and development of SMT-based code migration tools; unfortunately, no prior study has validated or invalidated its use for source code. In this paper, we conduct an empirical study on the BLEU score to (in)validate its suitability for the code migration task, given concerns about its inability to reflect the semantics of source code. In our work, we use human judgment as the ground truth to measure the semantic correctness of the migrated code. Our empirical study demonstrates that BLEU does not reflect translation quality because of its weak correlation with the semantic correctness of translated code. We provide counter-examples to show that BLEU is ineffective in comparing the translation quality of SMT-based models. Due to BLEU's ineffectiveness for the code migration task, we propose an alternative metric, RUBY, which considers lexical, syntactical, and semantic representations of source code. We verified that RUBY achieves a higher correlation coefficient with the semantic correctness of migrated code: 0.775, compared with 0.583 for the BLEU score. We also confirmed the effectiveness of RUBY in reflecting changes in the translation quality of SMT-based translation models. With these advantages, RUBY can be used to evaluate SMT-based code migration models.
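The kind of validation the paper performs, correlating metric scores with human judgments of semantic correctness, is simple to illustrate. The R sketch below uses made-up numbers, not the paper's data; only the final sentence cites figures from the abstract.

# Sketch of validating an automatic metric against human judgment:
# correlate per-snippet metric scores with human semantic-correctness ratings.
metric_scores <- c(0.42, 0.65, 0.31, 0.80, 0.55, 0.20, 0.71, 0.48)  # made up
human_correct <- c(0.50, 0.70, 0.20, 0.90, 0.40, 0.30, 0.80, 0.60)  # made up

cor(metric_scores, human_correct, method = "pearson")
cor(metric_scores, human_correct, method = "spearman")
# A metric suitable for code migration should correlate strongly with human
# judgment; the paper reports 0.583 for BLEU versus 0.775 for RUBY.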


Non-Parametric Calibration for Classification

Many applications for classification methods not only require high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural network architectures, provide very good results in terms of accuracy, they tend to underestimate their predictive uncertainty. In this paper, we propose a method that corrects the confidence output of a general classifier such that it approaches the true probability of classifying correctly. This classifier calibration is, in contrast to existing approaches, based on a non-parametric representation using a latent Gaussian process and specifically designed for multi-class classification. It can be applied to any classification method that outputs confidence estimates and is not limited to neural networks. We also provide a theoretical analysis regarding the over- and underconfidence of a classifier and its relationship to calibration. In experiments we show the universally strong performance of our method across different classifiers and benchmark data sets in contrast to existing classifier calibration techniques.


Relative Hausdorff Distance for Network Analysis

Similarity measures are used extensively in machine learning and data science algorithms. The newly proposed graph Relative Hausdorff (RH) distance is a lightweight yet nuanced similarity measure for quantifying the closeness of two graphs. In this work we study the effectiveness of RH distance as a tool for detecting anomalies in time-evolving graph sequences. We apply RH to cyber data with given red team events, as well as to synthetically generated sequences of graphs with planted attacks. In our experiments, the performance of RH distance is at times comparable, and sometimes superior, to graph edit distance in detecting anomalous phenomena. Our results suggest that in appropriate contexts, RH distance has advantages over more computationally intensive similarity measures.


Pay Attention to Convolution Filters: Towards Fast and Accurate Fine-Grained Transfer Learning

We propose an efficient transfer learning method for adapting an ImageNet pre-trained Convolutional Neural Network (CNN) to a fine-grained image classification task. Conventional transfer learning methods typically face a trade-off between training time and accuracy. By adding an ‘attention module’ to each convolutional filter of the pre-trained network, we are able to rank and adjust the importance of each convolutional signal in an end-to-end pipeline. In this report, we show our method can adapt a pre-trained ResNet50 to a fine-grained transfer learning task within a few epochs and achieve accuracy above conventional transfer learning methods and close to models trained from scratch. Our model also offers interpretable results because the rank of the convolutional signals shows which convolution channels are utilized and amplified to achieve better classification results, as well as which signals should be treated as noise for the specific transfer learning task and could be pruned to reduce model size.


Neural Variational Inference For Estimating Uncertainty in Knowledge Graph Embeddings

Recent advances in Neural Variational Inference allowed for a renaissance in latent variable models in a variety of domains involving high-dimensional data. While traditional variational methods derive an analytical approximation for the intractable distribution over the latent variables, here we construct an inference network conditioned on the symbolic representation of entities and relation types in the Knowledge Graph, to provide the variational distributions. The new framework results in a highly-scalable method. Under a Bernoulli sampling framework, we provide an alternative justification for commonly used techniques in large-scale stochastic variational inference, which drastically reduce training time at a cost of an additional approximation to the variational lower bound. We introduce two models from this highly scalable probabilistic framework, namely the Latent Information and Latent Fact models, for reasoning over knowledge graph-based representations. Our Latent Information and Latent Fact models improve upon baseline performance under certain conditions. We use the learnt embedding variance to estimate predictive uncertainty during link prediction, and discuss the quality of these learnt uncertainty estimates. Our source code and datasets are publicly available online at https://…/Neural-Variational-Knowledge-Graphs.


Real-time Attention Based Look-alike Model for Recommender System

Recently, deep learning models have played an increasingly important role in content recommender systems. However, although the performance of recommendations has greatly improved, the ‘Matthew effect’ has become increasingly evident. While head content gets more and more popular, much competitive long-tail content struggles to achieve timely exposure because it lacks behavior features. This issue has badly impacted the quality and diversity of recommendations. To solve this problem, the look-alike algorithm is a good choice for extending the audience of high-quality long-tail content. But the traditional look-alike models widely used in online advertising are not suitable for recommender systems because of the strict requirements on both real-time performance and effectiveness. This paper introduces a real-time attention-based look-alike model (RALM) for recommender systems, which tackles the conflict between real-time performance and effectiveness. RALM realizes real-time look-alike audience extension through seeds-to-user similarity prediction and improves effectiveness by optimizing user representation learning and look-alike learning. For user representation learning, we propose a novel neural network structure named the attention merge layer to replace the concatenation layer, which significantly improves the expressive ability of multi-field feature learning. On the other hand, considering the varied members of the seed set, we design a global attention unit and a local attention unit to learn robust and adaptive seed representations with respect to a certain target user. Finally, we introduce a seed clustering mechanism that not only reduces the time complexity of attention unit prediction but also minimizes the loss of seed information. According to our experiments, RALM shows superior effectiveness and performance compared with popular look-alike models.


Secure Federated Matrix Factorization

To protect user privacy and meet legal regulations, federated (machine) learning has attracted broad interest in recent years. The key principle of federated learning is training a machine learning model without needing to know each user's raw private data. In this paper, we propose a secure matrix factorization framework under the federated learning setting, called FedMF. First, we design a user-level distributed matrix factorization framework in which the model can be learned when each user uploads only the gradient information (instead of the raw preference data) to the server. While the gradient information seems secure, we prove that it can still leak users' raw data. To this end, we enhance the distributed matrix factorization framework with homomorphic encryption. We implement a prototype of FedMF and test it with a real movie rating dataset. The results verify the feasibility of FedMF. We also discuss the challenges of applying FedMF in practice, for future research.
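The user-level pattern described above is easy to sketch: each user keeps their ratings locally, updates their own user factor, and uploads only gradients for the shared item matrix. The R toy below omits the homomorphic encryption entirely and uses made-up data, so it is an illustration of the data flow, not FedMF itself.

# Toy sketch of user-level federated matrix factorization (no encryption):
# raw ratings never leave the "device"; only item-matrix gradients are uploaded.
set.seed(7)
n_users <- 5; n_items <- 8; k <- 3; lr <- 0.1
ratings <- matrix(sample(c(NA, 1:5), n_users * n_items, replace = TRUE,
                         prob = c(0.6, rep(0.08, 5))), nrow = n_users)
U <- matrix(rnorm(n_users * k, sd = 0.1), nrow = n_users)   # local, one row per user
V <- matrix(rnorm(n_items * k, sd = 0.1), nrow = n_items)   # shared, held by server

for (epoch in 1:300) {
  item_grad <- matrix(0, n_items, k)
  for (u in 1:n_users) {                      # computed on each user's device
    rated <- which(!is.na(ratings[u, ]))
    err <- as.numeric(U[u, ] %*% t(V[rated, , drop = FALSE])) - ratings[u, rated]
    U[u, ] <- U[u, ] - lr * (err %*% V[rated, , drop = FALSE])        # local update
    item_grad[rated, ] <- item_grad[rated, ] + err * U[rep(u, length(rated)), ]
  }
  V <- V - lr * item_grad                     # server applies the uploaded gradients
}

pred <- U %*% t(V)
sqrt(mean((pred - ratings)^2, na.rm = TRUE))  # fit on the observed entries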


Kaskade: Graph Views for Efficient Graph Analytics

Graphs are an increasingly popular way to model real-world entities and relationships between them, ranging from social networks to data lineage graphs and biological datasets. Queries over these large graphs often involve expensive subgraph traversals and complex analytical computations. These real-world graphs are often substantially more structured than a generic vertex-and-edge model would suggest, but this insight has remained mostly unexplored by existing graph engines for graph query optimization purposes. Therefore, in this work, we focus on leveraging structural properties of graphs and queries to automatically derive materialized graph views that can dramatically speed up query evaluation. We present KASKADE, the first graph query optimization framework to exploit materialized graph views for query optimization purposes. KASKADE employs a novel constraint-based view enumeration technique that mines constraints from query workloads and graph schemas, and injects them during view enumeration to significantly reduce the search space of views to be considered. Moreover, it introduces a graph view size estimator to pick the most beneficial views to materialize given a query set and to select the best query evaluation plan given a set of materialized views. We evaluate its performance over real-world graphs, including the provenance graph that we maintain at Microsoft to enable auditing, service analytics, and advanced system optimizations. Our results show that KASKADE substantially reduces the effective graph size and yields significant performance speedups (up to 50X), in some cases making otherwise intractable queries possible.

Continue Reading…

Collapse

Read More

Document worth reading: “Artificial Intelligence and its Role in Near Future”

AI technology has a long history and is actively and constantly changing and growing. It focuses on intelligent agents: devices that perceive their environment and take actions to maximize their chances of achieving a goal. In this paper, we explain the basics of modern AI and various representative applications of AI. In the context of the modern digitalized world, AI is the property of machines, computer programs, and systems to perform the intellectual and creative functions of a person, independently find ways to solve problems, draw conclusions, and make decisions. Most artificial intelligence systems have the ability to learn, which allows them to improve their performance over time. Recent research on AI tools, including machine learning, deep learning, and predictive analytics, is aimed at increasing the ability to plan, learn, reason, think, and take action. On this basis, the proposed research explores how human intelligence differs from artificial intelligence. Moreover, we critically analyze what the AI of today is capable of doing, why it still cannot reach human intelligence, and what open challenges exist for AI to reach and outperform the human level of intelligence. Furthermore, it explores future predictions for artificial intelligence, based on which potential solutions will be recommended for the coming decades. Artificial Intelligence and its Role in Near Future

Continue Reading…

Collapse

Read More

Magister Dixit

“You should embrace the Bayesian approach.” Kamil Bartocha ( 26. Apr 2015 )

Continue Reading…

Collapse

Read More

Magister Dixit

“The uninspected (inevitably) deteriorates.” Dwight David Eisenhower

Continue Reading…

Collapse

Read More

NAS-Bench-101: Towards Reproducible Neural Architecture Search - implementation -

Recent advances in neural architecture search (NAS) demand tremendous computational resources, which makes it difficult to reproduce experiments and imposes a barrier-to-entry to researchers without access to large-scale computation. We aim to ameliorate these problems by introducing NAS-Bench-101, the first public architecture dataset for NAS research. To build NAS-Bench-101, we carefully constructed a compact, yet expressive, search space, exploiting graph isomorphisms to identify 423k unique convolutional architectures. We trained and evaluated all of these architectures multiple times on CIFAR-10 and compiled the results into a large dataset of over 5 million trained models. This allows researchers to evaluate the quality of a diverse range of models in milliseconds by querying the pre-computed dataset. We demonstrate its utility by analyzing the dataset as a whole and by benchmarking a range of architecture optimization algorithms.

Data and code for NAS-Bench-101 are here: https://github.com/google-research/nasbench



Continue Reading…

Collapse

Read More

Document worth reading: “Characterizing HCI Research in China: Streams, Methodologies and Future Directions”

Human-computer Interaction (HCI) is an interdisciplinary research field involving multiple disciplines, such as computer science, psychology, social science, and design. It studies the interaction between users and computers in order to better design technologies and solve real-life problems. This position paper characterizes HCI research in China by comparing it with international HCI research traditions. We discuss the current streams and methodologies of Chinese HCI research. We then propose future HCI research directions, such as including emergent users who have less access to technology and addressing cultural dimensions, in order to provide better technical solutions and support. Characterizing HCI Research in China: Streams, Methodologies and Future Directions

Continue Reading…

Collapse

Read More

R Packages worth a look

Easy Computation of Marketing Metrics with Different Analysis Axis (mmetrics)
Provides a mechanism for easy computation of marketing metrics. By default in this package, metrics for digital marketing (e.g. CTR (Click Through Rate …

D3 Dynamic Cluster Visualizations (klustR)
Used to create dynamic, interactive ‘D3.js’ based parallel coordinates and principal component plots in ‘R’. The plots make visualizing k-means or othe …

Store and Retrieve Data.frames in a Git Repository (git2rdata)
Make versioning of data.frames easy and efficient using git repositories (see the usage sketch after this list).

Uplift Model Evaluation with Plots and Metrics (uplifteval)
Provides a variety of plots and metrics to evaluate uplift models including the ‘R uplift’ package’s Qini metric and Qini plot, a port of the ‘python p …
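As a quick illustration of one of the packages above, here is a hedged usage sketch for git2rdata. The write_vc()/read_vc() interface and its arguments are assumptions based on the package's documented workflow; check the package reference for exact signatures before relying on them.

# Hedged sketch for git2rdata (interface assumed; verify against the docs).
library(git2rdata)

df <- data.frame(id = 1:3, species = c("A", "B", "C"), count = c(10, 4, 7))
repo <- tempfile("git2rdata-demo")
dir.create(repo)

write_vc(df, file = "observations", root = repo, sorting = "id")  # data + metadata
df2 <- read_vc(file = "observations", root = repo)
identical(df$count, df2$count)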

Continue Reading…

Collapse

Read More

Whats new on arXiv – Complete List

Sharing of vulnerability information among companies — a survey of Swedish companies
Evolution of ROOT package management
Microservices Migration in Industry: Intentions, Strategies, and Challenges
Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification
A Document-grounded Matching Network for Response Selection in Retrieval-based Chatbots
Incremental Classifier Learning Based on PEDCC-Loss and Cosine Distance
Coupled Variational Recurrent Collaborative Filtering
Online Learning and Planning in Partially Observable Domains without Prior Knowledge
Improving Reproducible Deep Learning Workflows with DeepDIVA
Self-Supervised Learning for Contextualized Extractive Summarization
Modeling the Past and Future Contexts for Session-based Recommendation
Unsupervised Minimax: Adversarial Curiosity, Generative Adversarial Networks, and Predictability Minimization
The snippets taxonomy in web search engines
Modeling Sentiment Dependencies with Graph Convolutional Networks for Aspect-level Sentiment Classification
Simultaneously Learning Architectures and Features of Deep Neural Networks
Learning robust visual representations using data augmentation invariance
Anomaly Detection in High Performance Computers: A Vicinity Perspective
On Stabilizing Generative Adversarial Training with Noise
Faster Algorithms for High-Dimensional Robust Covariance Estimation
Learning Selection Masks for Deep Neural Networks
Fast and Accurate Least-Mean-Squares Solvers
Graph Convolutional Transformer: Learning the Graphical Structure of Electronic Health Records
Large Scale Structure of Neural Network Loss Landscapes
What Kind of Language Is Hard to Language-Model?
eSLAM: An Energy-Efficient Accelerator for Real-Time ORB-SLAM on FPGA Platform
Characterising hyperbolic hyperplanes of a non-singular quadric in $PG(4,q)$
Customizing Pareto Simulated Annealing for Multi-objective Optimization of Control Cabinet Layout
Advertising in an oligopoly with differentiated goods under general demand and cost functions: A differential game approach
A Combination of Temporal Sequence Learning and Data Description for Anomaly-based NIDS
274-GHz CMOS Signal Generator with an On-Chip Patch Antenna in a QFN Package
The Performance Of Convolutional Coding Based Cooperative Communication: Relay Position And Power Allocation Analysis
Preferred Design of Hierarchical Distribution Matching
A Unified Definition and Computation of Laplacian Spectral Distances
Convergence analysis of a Crank-Nicolson Galerkin method for an inverse source problem for parabolic systems with boundary observations
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Low Rank Approximation Directed by Leverage Scores and Computed at Sub-linear Cost
Inferring 3D Shapes from Image Collections using Adversarial Networks
DeepcomplexMRI: Exploiting deep residual network for fast parallel MR imaging with complex convolution
Hybrid Function Sparse Representation towards Image Super Resolution
Representation Learning-Assisted Click-Through Rate Prediction
Evaluation of Seed Set Selection Approaches and Active Learning Strategies in Predictive Coding
Beam Learning — Using Machine Learning for Finding Beam Directions
Maximum Mean Discrepancy Gradient Flow
Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning
Recognizing License Plates in Real-Time
PAN: Projective Adversarial Network for Medical Image Segmentation
Band Attention Convolutional Networks For Hyperspectral Image Classification
Window Based BFT Blockchain Consensus
DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain
Indecomposable $0$-Hecke modules for extended Schur functions
Different Approaches for Human Activity Recognition: A Survey
Approximate Gradient Descent Convergence Dynamics for Adaptive Control on Heterogeneous Networks
Schwinger-Dyson and loop equations for a product of square Ginibre random matrices
A Method to construct all the Paving Matroids over a Finite Set
Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks
Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks
Numerical computations of split Bregman method for fourth order total variation flow
Detection and estimation of parameters in high dimensional multiple change point regression models via $\ell_1/\ell_0$ regularization and discrete optimization
Probabilistic Forecasting with Temporal Convolutional Neural Network
An approximate Bayesian approach to regression estimation with many auxiliary variables
Symmetric multisets of permutations
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Classification of EEG Signals using Genetic Programming for Feature Construction
iProStruct2D: Identifying protein structural classes by deep learning via 2D representations
Few-Shot Point Cloud Region Annotation with Human in the Loop
Quantum Random Numbers generated by the Cloud Superconducting Quantum Computer
Evolutionary Trigger Set Generation for DNN Black-Box Watermarking
Learning a Matching Model with Co-teaching for Multi-turn Response Selection in Retrieval-based Dialogue Systems
Vulnerabilities of Power System Operations to Load Forecasting Data Injection Attacks
Deep learning analysis of cardiac CT angiography for detection of coronary arteries with functionally significant stenosis
NAS-FCOS: Fast Neural Architecture Search for Object Detection
Macro-action Multi-timescale Dynamic Programming for Energy Management with Phase Change Materials
Behavioral Switching Loss Modeling of Inverter Modules
Upper envelopes of families of Feller semigroups and viscosity solutions to a class of nonlinear Cauchy problems
On the Vector Space in Photoplethysmography Imaging
On the Universality of Noiseless Linear Estimation with Respect to the Measurement Matrix
Metrics for Learning in Topological Persistence
New loop expansion for the Random Magnetic Field Ising Ferromagnets at zero temperature
Survey of Artificial Intelligence for Card Games and Its Application to the Swiss Game Jass
Orthogonal Cocktail BPSK: Exceeding Shannon Capacity of QPSK Input
A Novel Cost Function for Despeckling using Convolutional Neural Networks
Single Image Blind Deblurring Using Multi-Scale Latent Structure Prior
Elephant random walks with delays
Bag of Color Features For Color Constancy
Reinforcement Learning of Minimalist Numeral Grammars
Reinforcement Learning for Channel Coding: Learned Bit-Flipping Decoding
Almost Optimal Semi-streaming Maximization for k-Extendible Systems
Quantifying Intrinsic Uncertainty in Classification via Deep Dirichlet Mixture Networks
Compressed Sensing MRI via a Multi-scale Dilated Residual Convolution Network
Continual Reinforcement Learning deployed in Real-life using Policy Distillation and Sim2Real Transfer
Classification of Radio Signals and HF Transmission Modes with Deep Learning
TW-SMNet: Deep Multitask Learning of Tele-Wide Stereo Matching
Cross-Modal Relationship Inference for Grounding Referring Expressions
Rearrangement operations on unrooted phylogenetic networks
Rate-Splitting Unifying SDMA, OMA, NOMA, and Multicasting in MISO Broadcast Channel: A Simple Two-User Rate Analysis
Causal Discovery with Reinforcement Learning
Efficient structure learning with automatic sparsity selection for causal graph processes
EXmatcher: Combining Features Based on Reference Strings and Segments to Enhance Citation Matching
Optimizing Pipelined Computation and Communication for Latency-Constrained Edge Learning
Two-dimensional partial cubes
Competing (Semi)-Selfish Miners in Bitcoin
Approximate Variational Inference Based on a Finite Sample of Gaussian Latent Variables
BasisConv: A method for compressed representation and learning in CNNs
Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance
Off-equilibrium computation of the dynamic critical exponent of the three-dimensional Heisenberg model
A Linear Algorithm for Minimum Dominator Colorings of Orientations of Paths
LocLets: Localized Graph Wavelets for Processing Frequency Sparse Signals on Graphs
Three product formulas for ratios of tiling counts of hexagons with collinear holes
Identities involving Schur functions and their applications to a shuffling theorem
WikiDataSets : Standardized sub-graphs from WikiData
Translation hyperovals and $\mathbb{F}_2$-linear sets of pseudoregulus type
Statistical Species Identification
A Graph-theoretic Method to Define any Boolean Operation on Partitions
A refined primal-dual analysis of the implicit bias
Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise
Principled Training of Neural Networks with Direct Feedback Alignment
Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology
Challenges in Time-Stamp Aware Anomaly Detection in Traffic Videos
k-Nearest Neighbor Optimization via Randomized Hyperstructure Convex Hull
Joint Subspace Recovery and Enhanced Locality Driven Robust Flexible Discriminative Dictionary Learning
A proof of the mean-field limit for $λ$-convex potentials by $Γ$-Convergence
Mimic and Fool: A Task Agnostic Adversarial Attack
Monte Carlo and Quasi-Monte Carlo Density Estimation via Conditioning
Dynamical Anatomy of NARMA10 Benchmark Task
Dual-Band Fading Multiple Access Relay Channels
Adaptive Neural Signal Detection for Massive MIMO
New dynamic and verifiable multi-secret sharing schemes based on LFSR public key cryptosystem
Regional economic convergence and spatial quantile regression
Retrieve, Read, Rerank: Towards End-to-End Multi-Document Reading Comprehension
Achieving competitive advantage in academia through early career coauthorship with top scientists
An explicit characterization of arc-transitive circulants
ROOT I/O compression algorithms and their performance impact within Run 3
Bias-Aware Inference in Fuzzy Regression Discontinuity Designs
Scale Invariant Fully Convolutional Network: Detecting Hands Efficiently
Hierarchical multiscale finite element method for multi-continuum media
Distance Matrix of a Class of Completely Positive Graphs: Determinant and Inverse
Creation of User Friendly Datasets: Insights from a Case Study concerning Explanations of Loan Denials
Learning Symmetries of Classical Integrable Systems
A Proximal Point Dual Newton Algorithm for Solving Group Graphical Lasso Problems
Analysis of Optimization Algorithms via Sum-of-Squares
`Project & Excite’ Modules for Segmentation of Volumetric Medical Scans
Gated CRF Loss for Weakly Supervised Semantic Image Segmentation
Journal Name Extraction from Japanese Scientific News Articles
Deep learning control of artificial avatars in group coordination tasks
Mesh adaptivity for quasi-static phase-field fractures based on a residual-type a posteriori error estimator
Residual estimates for post-processors in elliptic problems
Stable Rank Normalization for Improved Generalization in Neural Networks and GANs
Two-step Constructive Approaches for Dungeon Generation
Control contribution identifies top driver nodes in complex networks
Extracting Interpretable Concept-Based Decision Trees from CNNs
Characterization and valuation of uncertainty of calibrated parameters in stochastic decision models
Matricial characterization of tournaments with maximum number of diamonds
Areas of triangles and SL_2 actions in finite rings
A Taxonomy of Channel Pruning Signals in CNNs
StRE: Self Attentive Edit Quality Prediction in Wikipedia
Data-Driven Model Predictive Control with Stability and Robustness Guarantees
Efficient Kernel-based Subsequence Search for User Identification from Walking Activity
A Hybrid Approach Between Adversarial Generative Networks and Actor-Critic Policy Gradient for Low Rate High-Resolution Image Compression
Stability and Metastability of Traffic Dynamics in Uplink Random Access Networks
Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network
Generating Summaries with Topic Templates and Structured Convolutional Decoders
An Improved Analysis of Training Over-parameterized Deep Neural Networks
Hybrid Nonlinear Observers for Inertial Navigation Using Landmark Measurements
On Single Source Robustness in Deep Fusion Models
Rethinking Person Re-Identification with Confidence
Variance-reduced $Q$-learning is minimax optimal
Causal Inference in Higher Education: Building Better Curriculums
HEAD-QA: A Healthcare Dataset for Complex Reasoning
Membership-based Manoeuvre Negotiation in Autonomous and Safety-critical Vehicular Systems
Generative adversarial network for segmentation of motion affected neonatal brain MRI
Using Structured Representation and Data: A Hybrid Model for Negation and Sentiment in Customer Service Conversations
Communication and Memory Efficient Testing of Discrete Distributions
ProPublica’s COMPAS Data Revisited
Automatic brain tissue segmentation in fetal MRI using convolutional neural networks
3-D Surface Segmentation Meets Conditional Random Fields
Asymptotic analysis of exit time for dynamical systems with a single well potential
The $h^*$-polynomials of locally anti-blocking lattice polytopes and their $γ$-positivity
Data-Free Quantization through Weight Equalization and Bias Correction
Clouds of Oriented Gradients for 3D Detection of Objects, Surfaces, and Indoor Scene Layouts
Shapes and Context: In-the-Wild Image Synthesis & Manipulation

Continue Reading…

Collapse

Read More

Whats new on arXiv

Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification

CNNs, RNNs, GCNs, and CapsNets have shown significant insights in representation learning and are widely used in various text mining tasks such as large-scale multi-label text classification. However, most existing deep models for multi-label text classification consider either the non-consecutive and long-distance semantics or the sequential semantics, and how to consider both coherently is less studied. In addition, most existing methods treat output labels as independent and ignore the hierarchical relations among them, leading to the loss of useful semantic information. In this paper, we propose a novel hierarchical taxonomy-aware and attentional graph capsule recurrent CNN framework for large-scale multi-label text classification. Specifically, we first propose to model each document as a word-order-preserving graph-of-words and normalize it into a corresponding words-matrix representation that preserves the non-consecutive, long-distance, and local sequential semantics. The words-matrix is then input to the proposed attentional graph capsule recurrent CNNs to learn the semantic features more effectively. To leverage the hierarchical relations among the class labels, we propose a hierarchical taxonomy embedding method to learn their representations, and define a novel weighted margin loss by incorporating the label representation similarity. Extensive evaluations on three datasets show that our model significantly improves the performance of large-scale multi-label text classification compared with state-of-the-art approaches.


A Document-grounded Matching Network for Response Selection in Retrieval-based Chatbots

We present a document-grounded matching network (DGMN) for response selection that can power a knowledge-aware retrieval-based chatbot system. The challenges of building such a model lie in how to ground conversation contexts with background documents and how to recognize important information in the documents for matching. To overcome the challenges, DGMN fuses information in a document and a context into representations of each other, and dynamically determines whether grounding is necessary and the importance of different parts of the document and the context through hierarchical interaction with a response at the matching step. Empirical studies on two public data sets indicate that DGMN can significantly improve upon state-of-the-art methods and at the same time enjoys good interpretability.


Incremental Classifier Learning Based on PEDCC-Loss and Cosine Distance

The main purpose of incremental learning is to learn new knowledge while not forgetting the knowledge that has been learned before. At present, the main challenge in this area is catastrophic forgetting, namely that the network loses its performance on old tasks after training on new tasks. In this paper, we introduce an ensemble method for incremental classifiers to alleviate this problem, which is based on the cosine distance between the output feature and a pre-defined center, and which lets each task be preserved in a different network. During training, we use PEDCC-Loss to train the CNN. At test time, the prediction is determined by the cosine distance between the network's latent features and the pre-defined centers. The experimental results on EMNIST and CIFAR100 show that our method outperforms the recent LwF method, which uses knowledge distillation, and the iCaRL method, which keeps some old samples while training on new tasks. The method achieves the goal of not forgetting old knowledge while training on new classes, and addresses the problem of catastrophic forgetting better.
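The test-time rule described in the abstract, assigning the class whose pre-defined center is closest in cosine distance to the latent feature, can be sketched in a few lines of R. The centers and feature below are random stand-ins, not PEDCC centers or actual CNN outputs.

# Toy sketch of the cosine-distance prediction rule (stand-in data).
cosine_distance <- function(a, b) 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

set.seed(3)
k <- 4; d <- 16
centers <- matrix(rnorm(k * d), nrow = k)       # one pre-defined center per class
feature <- centers[2, ] + rnorm(d, sd = 0.3)    # latent feature near class 2

dists <- apply(centers, 1, cosine_distance, b = feature)
which.min(dists)  # predicted class; expected: 2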


Coupled Variational Recurrent Collaborative Filtering

We focus on the problem of streaming recommender systems and explore novel collaborative filtering algorithms to handle data dynamicity and complexity in a streaming manner. Although deep neural networks have demonstrated their effectiveness on recommendation tasks, there has been little exploration of integrating probabilistic models and deep architectures under streaming recommendation settings. Conjoining the complementary advantages of probabilistic models and deep neural networks could enhance both model effectiveness and the understanding of inference uncertainties. To bridge the gap, in this paper, we propose a Coupled Variational Recurrent Collaborative Filtering (CVRCF) framework based on the idea of Deep Bayesian Learning to handle the streaming recommendation problem. The framework jointly combines stochastic processes and deep factorization models under a Bayesian paradigm to model the generation and evolution of users’ preferences and items’ popularities. To ensure efficient optimization and streaming updates, we further propose a sequential variational inference algorithm based on a cross variational recurrent neural network structure. Experimental results on three benchmark datasets demonstrate that the proposed framework performs favorably against state-of-the-art methods in terms of both temporal dependency modeling and predictive accuracy. The learned latent variables also provide visualized interpretations for the evolution of temporal dynamics.


Online Learning and Planning in Partially Observable Domains without Prior Knowledge

How an agent can act optimally in stochastic, partially observable domains is a challenging problem; the standard approach is to first learn the domain model and then, based on the learned model, find a (near-)optimal policy. However, learning the model offline often requires storing the entire training data and cannot utilize the data generated in the planning phase. Furthermore, current research usually assumes the learned model is accurate or presupposes knowledge of the nature of the unobservable part of the world. In this paper, for systems with discrete settings, using the benefits of Predictive State Representations (PSRs), a model-based planning approach is proposed in which the learning and planning phases can both be executed online and no prior knowledge of the underlying system is required. Experimental results show that, compared to state-of-the-art approaches, our algorithm achieves a high level of performance with no prior knowledge provided, along with the theoretical advantages of PSRs. Source code is available at https://…/PSR-MCTS-Online.


Improving Reproducible Deep Learning Workflows with DeepDIVA

The field of deep learning is experiencing a trend towards producing reproducible research. Nevertheless, it is still often a frustrating experience to reproduce scientific results. This is especially true in the machine learning community, where it is considered acceptable to have black boxes in your experiments. We present DeepDIVA, a framework designed to facilitate easy experimentation and its reproduction. This framework allows researchers to share their experiments with others, while providing functionality for easy experimentation, such as boilerplate code, experiment management, hyper-parameter optimization, verification of data integrity, and visualization of data and results. Additionally, the code of DeepDIVA is well documented and supported by several tutorials that allow a new user to quickly become familiar with the framework.


Self-Supervised Learning for Contextualized Extractive Summarization

Existing models for extractive summarization are usually trained from scratch with a cross-entropy loss, which does not explicitly capture the global context at the document level. In this paper, we aim to improve this task by introducing three auxiliary pre-training tasks that learn to capture the document-level context in a self-supervised fashion. Experiments on the widely used CNN/DM dataset validate the effectiveness of the proposed auxiliary tasks. Furthermore, we show that after pre-training, a clean model with simple building blocks is able to outperform previous carefully designed state-of-the-art models.


Modeling the Past and Future Contexts for Session-based Recommendation

Long-session-based recommender systems have attracted much attention recently; each user may create hundreds of click behaviors in a short time. To learn long-session item dependencies, previous sequential recommendation models resort either to data augmentation or to a left-to-right autoregressive training approach. While effective, an obvious drawback is that future user behaviors are always missing during training. In this paper, we claim that users' future action signals can be exploited to boost recommendation quality. To model both past and future contexts, we investigate three augmentation techniques from both data and model perspectives. Moreover, we carefully design two general neural network architectures: a pre-trained two-way neural network model and a deep contextualized model trained on a text gap-filling task. Experiments on four real-world datasets show that our proposed two-way neural network models achieve competitive or even much better results. Empirical evidence confirms that modeling both past and future contexts is a promising way to offer better recommendation accuracy.


Unsupervised Minimax: Adversarial Curiosity, Generative Adversarial Networks, and Predictability Minimization

Generative Adversarial Networks (GANs) learn to model data distributions through two unsupervised neural networks, each minimizing the objective function maximized by the other. We relate this game theoretic strategy to earlier neural networks playing unsupervised minimax games. (i) GANs can be formulated as a special case of Adversarial Curiosity (1990) based on a minimax duel between two networks, one generating data through its probabilistic actions, the other predicting consequences thereof. (ii) We correct a previously published claim that Predictability Minimization (PM, 1990s) is not based on a minimax game. PM models data distributions through a neural encoder that maximizes the objective function minimized by a neural predictor of the code components.


The snippets taxonomy in web search engines

In this paper, we analyze 50,000 keyword results collected from the localized Polish Google search engine. We propose a taxonomy for snippets displayed in search results: regular, rich, news, featured, and entity-type snippets. We observe some correlations between overlapping snippets for the same keywords. Results show that commercial keywords do not tend to produce results with rich or entity-type snippets, whereas keywords that do produce such snippets are not commercial in nature. We also find that a significant number of snippets are scholarly articles and rich card carousels. We close with our conclusions and the limitations of this research.


Modeling Sentiment Dependencies with Graph Convolutional Networks for Aspect-level Sentiment Classification

Aspect-level sentiment classification aims to distinguish the sentiment polarities over one or more aspect terms in a sentence. Existing approaches mostly model different aspects in one sentence independently, which ignores the sentiment dependencies between different aspects. However, we find that such dependency information between different aspects brings additional valuable information. In this paper, we propose a novel aspect-level sentiment classification model based on graph convolutional networks (GCN) which can effectively capture the sentiment dependencies between multiple aspects in one sentence. Our model first introduces a bidirectional attention mechanism with position encoding to model aspect-specific representations between each aspect and its context words, then employs a GCN over the attention mechanism to capture the sentiment dependencies between different aspects in one sentence. We evaluate the proposed approach on the SemEval 2014 datasets. Experiments show that our model outperforms the state-of-the-art methods. We also conduct experiments to evaluate the effectiveness of the GCN module, which indicates that the dependencies between different aspects are highly helpful in aspect-level sentiment classification.


Simultaneously Learning Architectures and Features of Deep Neural Networks

This paper presents a novel method that simultaneously learns the number of filters and the network features repeatedly over multiple epochs. We propose a novel pruning loss that explicitly enforces the optimizer to focus on promising candidate filters while suppressing the contributions of less relevant ones. In addition, we propose to enforce diversity between filters, and this diversity-based regularization term improves the trade-off between model size and accuracy. It turns out that the interplay between architecture and feature optimization improves the final compressed models, and the proposed method compares favorably to existing methods in terms of both model sizes and accuracies for a wide range of applications, including image classification, image compression, and audio classification.


Learning robust visual representations using data augmentation invariance

Deep convolutional neural networks trained for image object categorization have shown remarkable similarities with representations found across the primate ventral visual stream. Yet, artificial and biological networks still exhibit important differences. Here we investigate one such property: increasing invariance to identity-preserving image transformations found along the ventral stream. Despite theoretical evidence that invariance should emerge naturally from the optimization process, we present empirical evidence that the activations of convolutional neural networks trained for object categorization are not robust to identity-preserving image transformations commonly used in data augmentation. As a solution, we propose data augmentation invariance, an unsupervised learning objective which improves the robustness of the learned representations by promoting the similarity between the activations of augmented image samples. Our results show that this approach is a simple, yet effective and efficient (10 % increase in training time) way of increasing the invariance of the models while obtaining similar categorization performance.
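The core idea of an invariance penalty, encouraging a model to produce similar activations for two augmented views of the same image, can be shown in a minimal R sketch. The vectors below are stand-ins for layer activations and this is not the paper's exact layer-wise objective.

# Minimal invariance-style penalty on stand-in activations.
invariance_penalty <- function(act_view1, act_view2) {
  mean((act_view1 - act_view2)^2)   # squared distance between the two views
}

set.seed(11)
act_original  <- rnorm(128)
act_augmented <- act_original + rnorm(128, sd = 0.2)  # e.g. after a crop or flip
invariance_penalty(act_original, act_augmented)
# In training, a term like this would be added to the categorization loss so
# that augmented samples map to nearby representations.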


Anomaly Detection in High Performance Computers: A Vicinity Perspective

In response to the demand for higher computational power, the number of computing nodes in high performance computers (HPC) increases rapidly. Exascale HPC systems are expected to arrive by 2020. With drastic increase in the number of HPC system components, it is expected to observe a sudden increase in the number of failures which, consequently, poses a threat to the continuous operation of the HPC systems. Detecting failures as early as possible and, ideally, predicting them, is a necessary step to avoid interruptions in HPC systems operation. Anomaly detection is a well-known general purpose approach for failure detection, in computing systems. The majority of existing methods are designed for specific architectures, require adjustments on the computing systems hardware and software, need excessive information, or pose a threat to users’ and systems’ privacy. This work proposes a node failure detection mechanism based on a vicinity-based statistical anomaly detection approach using passively collected and anonymized system log entries. Application of the proposed approach on system logs collected over 8 months indicates an anomaly detection precision between 62% to 81%.


On Stabilizing Generative Adversarial Training with Noise

We present a novel method and analysis to train generative adversarial networks (GAN) in a stable manner. As shown in recent analysis, training is often undermined by the probability distribution of the data being zero on neighborhoods of the data space. We notice that the distributions of real and generated data should match even when they undergo the same filtering. Therefore, to address the limited support problem we propose to train GANs by using different filtered versions of the real and generated data distributions. In this way, filtering does not prevent the exact matching of the data distribution, while helping training by extending the support of both distributions. As filtering we consider adding samples from an arbitrary distribution to the data, which corresponds to a convolution of the data distribution with the arbitrary one. We also propose to learn the generation of these samples so as to challenge the discriminator in the adversarial training. We show that our approach results in a stable and well-behaved training of even the original minimax GAN formulation. Moreover, our technique can be incorporated in most modern GAN formulations and leads to a consistent improvement on several common datasets.


Faster Algorithms for High-Dimensional Robust Covariance Estimation

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given N = \tilde{\Omega}(d^2/\epsilon^2) samples from a d-dimensional Gaussian distribution, an \epsilon-fraction of which may be arbitrarily corrupted, our algorithm runs in time \tilde{O}(d^{3.26})/\mathrm{poly}(\epsilon) and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes \tilde{\Omega}(d^{2 \omega}) when \epsilon = \Omega(1), where \omega is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.


Learning Selection Masks for Deep Neural Networks

Data often have to be moved between servers and clients during the inference phase. For instance, modern virtual assistants collect data on mobile devices and send the data to remote servers for analysis. A related scenario is that clients have to access and download large amounts of data stored on servers in order to apply machine learning models. Depending on the available bandwidth, this data transfer can be a serious bottleneck, which can significantly limit the application of machine learning models. In this work, we propose a simple yet effective framework that allows selecting the parts of the input data needed for the subsequent application of a given neural network. Both the masks and the neural network are trained simultaneously such that good model performance is achieved while, at the same time, only a minimal amount of data is selected by the masks. During the inference phase, only the parts selected by the masks have to be transferred between the server and the client. Our experimental evaluation indicates that, for certain learning tasks, it is possible to significantly reduce the amount of data that needs to be transferred without greatly affecting model performance.


Fast and Accurate Least-Mean-Squares Solvers

Least-mean-squares (LMS) solvers such as Linear/Ridge/Lasso Regression, SVD, and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks of a variety of other methods, such as decision trees and matrix factorizations. We suggest an algorithm that gets a finite set of n d-dimensional real vectors and returns a weighted subset of d+1 vectors whose sum is exactly the same. The proof of Caratheodory's Theorem (1907) computes such a subset in O(n^2 d^2) time and is thus not used in practice. Our algorithm computes this subset in O(nd) time, using O(log n) calls to Caratheodory's construction on small but 'smart' subsets. This is based on a novel paradigm of fusion between different data summarization techniques, known as sketches and coresets. As an example application, we show how it can be used to boost the performance of existing LMS solvers, such as those in the scikit-learn library, by up to x100. Generalization to streaming and distributed (big) data is trivial. Extensive experimental results and complete open source code are also provided.


Graph Convolutional Transformer: Learning the Graphical Structure of Electronic Health Records

Effective modeling of electronic health records (EHR) is rapidly becoming an important topic in both academia and industry. A recent study showed that utilizing the graphical structure underlying EHR data (e.g. relationship between diagnoses and treatments) improves the performance of prediction tasks such as heart failure diagnosis prediction. However, EHR data do not always contain complete structure information. Moreover, when it comes to claims data, structure information is completely unavailable to begin with. Under such circumstances, can we still do better than just treating EHR data as a flat-structured bag-of-features? In this paper, we study the possibility of utilizing the implicit structure of EHR by using the Transformer for prediction tasks on EHR data. Specifically, we argue that the Transformer is a suitable model to learn the hidden EHR structure, and propose the Graph Convolutional Transformer, which uses data statistics to guide the structure learning process. Our model empirically demonstrated superior prediction performance to previous approaches on both synthetic data and publicly available EHR data on encounter-based prediction tasks such as graph reconstruction and readmission prediction, indicating that it can serve as an effective general-purpose representation learning algorithm for EHR data.


Large Scale Structure of Neural Network Loss Landscapes

There are many surprising and perhaps counter-intuitive properties of optimization of deep neural networks. We propose and experimentally verify a unified phenomenological model of the loss landscape that incorporates many of them. High dimensionality plays a key role in our model. Our core idea is to model the loss landscape as a set of high-dimensional 'wedges' that together form a large-scale, inter-connected structure towards which optimization is drawn. We first show that hyperparameter choices such as learning rate, network width, and L_2 regularization affect the path the optimizer takes through the landscape in similar ways, influencing the large-scale curvature of the regions the optimizer explores. We then predict and demonstrate new counter-intuitive properties of the loss landscape, showing the existence of low-loss subspaces connecting a set (not only a pair) of solutions, and verify it experimentally. Finally, we analyze recently popular ensembling techniques for deep networks in the light of our model.


What Kind of Language Is Hard to Language-Model?

How language-agnostic are current state-of-the-art NLP tools? Are there some types of language that are easier to model with current methods? In prior work (Cotterell et al., 2018) we attempted to address this question for language modeling, and observed that recurrent neural network language models do not perform equally well over all the high-resource European languages found in the Europarl corpus. We speculated that inflectional morphology may be the primary culprit for the discrepancy. In this paper, we extend these earlier experiments to cover 69 languages from 13 language families using a multilingual Bible corpus. Methodologically, we introduce a new paired-sample multiplicative mixed-effects model to obtain language difficulty coefficients from at-least-pairwise parallel corpora. In other words, the model is aware of inter-sentence variation and can handle missing data. Exploiting this model, we show that ‘translationese’ is not any easier to model than natively written language in a fair comparison. Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.
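For readers curious how per-language difficulty coefficients can be extracted from aligned sentences, here is a hedged R sketch with lme4. The formula is a plausible stand-in (a random intercept per parallel sentence plus a fixed effect per language), not the paper's exact paired-sample multiplicative mixed-effects specification, and the data frame columns are hypothetical.

# Hedged sketch: per-language difficulty from aligned-sentence surprisals.
library(lme4)

# surprisal: per-sentence language-model surprisal; sentence_id links the
# translations of the same underlying sentence across languages (hypothetical).
# df <- read.csv("bible_surprisals.csv")
df <- data.frame(
  surprisal   = rexp(300, rate = 0.1),
  language    = rep(c("eng", "deu", "fin"), each = 100),
  sentence_id = factor(rep(1:100, times = 3))
)

fit <- lmer(log(surprisal) ~ 0 + language + (1 | sentence_id), data = df)
fixef(fit)  # per-language coefficients: higher = harder to language-model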

Continue Reading…

Collapse

Read More

A Model and Simulation of Emotion Dynamics

(This article was first published on R on Will Hipson, and kindly contributed to R-bloggers)

Emotion dynamics is the study of how emotions change over time. Sometimes our feelings are quite stable, but other times capricious. Measuring and predicting these patterns for different people is somewhat of a Holy Grail for emotion researchers. In particular, some researchers are aspiring to discover mathematical laws that capture the complexity of our inner emotional experiences – much like physicists divining the laws that govern objects in the natural environment. These discoveries would revolutionize our understanding of our everyday feelings and when our emotions can go awry.

This series of blog posts, which I kicked off earlier this month with a simulation of emotions during basketball games, is inspired by researchers like Peter Kuppens and Tom Hollenstein (to name a few) who have collected and analyzed reams of intensive self-reports on people’s feelings from one moment to the next. My approach is to reverse engineer these insights and generate models that simulate emotions evolving over time – like this:

Affective State Space

We start with the affective state space – the theoretical landscape on which our conscious feelings roam free. This space is represented as two-dimensional, although we acknowledge that this fails to capture all aspects of conscious feeling. The first dimension, represented along the x-axis, is valence and this refers to how unpleasant vs. pleasant we feel. The second dimension, represented along the y-axis, is arousal. Somewhat less intuitive, arousal refers to how deactivated/sluggish/sleepy vs. activated/energized/awake we feel. At any time, our emotional state can be defined in terms of valence and arousal. So if you’re feeling stressed you would be low in valence and high in arousal. Let’s say you’re serene and calm, then you would be high in valence and low in arousal. Most of the time, we feel moderately high valence and moderate arousal (i.e., content), but if you’re the type of person who is chronically stressed, this would be different.

This is all well and good when we think about how we’re feeling right now, but it’s also worth considering how our emotions are changing. On a regular day, our emotions undergo minor fluctuations – sometimes in response to minor hassles or victories, and sometimes for no discernible reason. In this small paragraph, I’ve laid out a number of parameters, all of which vary between different people:

  • Attractor: Our typical emotional state. At any given moment, our feelings are pulled toward this state. Some people tend to be happier, whereas others are less happy.
  • Stability: How emotionally stable one is. Some people are more emotionally stable than others. Even in the face of adversity, an emotionally stable person keeps their cool.
  • Dispersion: The range of our emotional landscape. Some people experience intense highs and lows, whereas others persist somewhere in the middle.

We’ll keep all of this in mind for the simulation. We’ll start with a fairly simple simulation with 100 hypothetical people. We’ll need the following packages.

library(psych)
library(tidyverse)
library(sn)

And then we’ll create a function that performs the simulation. Note that each person i has their own attractor, recovery rate, stability, and dispersion. For now we’ll just model random fluctuations in emotions, a sort of Brownian motion. You can imagine our little simulatons (fun name for the hypothetical people in the simulation) sitting around on an average day doing nothing in particular.

simulate_affect <- function(n = 2, time = 250, negative_event_time = NULL) {
  dt <- data.frame(matrix(nrow = time, ncol = 1))
  colnames(dt) <- "time"
  dt$time <- 1:time
  
  valence <- data.frame(matrix(nrow = time, ncol = 0))
  arousal <- data.frame(matrix(nrow = time, ncol = 0))
  
  for(i in 1:n) {
    attractor_v <- rnorm(1, mean = 3.35, sd = .75)
    instability_v <- sample(3:12, 1, replace = TRUE, prob = c(.18, .22, .18, .15, .8, .6, .5, .4, .2, .1))
    dispersion_v <- abs(rsn(1, xi = .15, omega = .02, alpha = -6) * instability_v) #rsn simulates a skewed distribution.
    if(!is.null(negative_event_time)) {
      recovery_rate <- sample(1:50, 1, replace = TRUE) + negative_event_time
      negative_event <- (dt$time %in% negative_event_time:recovery_rate) * seq.int(50, 1, -1)
    }
    else {
      negative_event <- 0
    }
    valence[[i]] <- ksmooth(x = dt$time,
                            y = (negative_event * -.10) + arima.sim(list(order = c(1, 0, 0),
                                               ar = .50),
                                          n = time),
                            bandwidth = time/instability_v, kernel = "normal")$y * dispersion_v + attractor_v 

#instability is modelled in the bandwidth term of ksmooth, such that higher instability results in higher bandwidth (greater fluctuation). 
#dispersion scales the white noise (arima) parameter, such that there are higher peaks and troughs at higher dispersion.
    
    attractor_a <- rnorm(1, mean = .50, sd = .75) + sqrt(instability_v) #arousal attractor is dependent on instability. This is because high instability is associated with higher arousal states.
    instability_a <- instability_v + sample(-1:1, 1, replace = TRUE)
    dispersion_a <- abs(rsn(1, xi = .15, omega = .02, alpha = -6) * instability_a)
    arousal[[i]] <- ksmooth(x = dt$time,
                            y = (negative_event * .075) + arima.sim(list(order = c(1, 0, 0),
                                               ar = .50),
                                          n = time),
                            bandwidth = time/instability_a, kernel = "normal")$y * dispersion_a + attractor_a
  }
  
  valence[valence > 6] <- 6
  valence[valence < 0] <- 0
  arousal[arousal > 6] <- 6
  arousal[arousal < 0] <- 0
  
  colnames(valence) <- paste0("valence_", 1:n)
  colnames(arousal) <- paste0("arousal_", 1:n)
  
  dt <- cbind(dt, valence, arousal)
  
  return(dt)
}

set.seed(190625)

emotions <- simulate_affect(n = 100, time = 300)

emotions %>%
  select(valence_1, arousal_1) %>%
  head()
##   valence_1 arousal_1
## 1  1.328024  5.380643
## 2  1.365657  5.385633
## 3  1.401849  5.390470
## 4  1.436284  5.395051
## 5  1.468765  5.399162
## 6  1.499062  5.402752

So we see the first six rows for participant 1’s valence and arousal. But if we want to plot these across multiple simulatons, we need to wrangle the data into long form. We’ll also compute some measures of within-person deviation. The Root Mean Square Successive Difference (RMSSD) takes into account gradual shifts in the mean. Those who are more emotionally unstable will have a higher RMSSD. For two dimensions (valence and arousal) we’ll just compute the mean RMSSD.
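
For concreteness, here is a minimal hand-rolled version of the statistic (the pipeline below uses rmssd() from the psych package; the toy vectors here are mine):

rmssd_manual <- function(x) sqrt(mean(diff(x)^2, na.rm = TRUE))

rmssd_manual(c(3.0, 3.2, 2.9, 3.1)) # small successive changes -> low RMSSD
rmssd_manual(c(1.0, 5.0, 0.5, 5.5)) # large swings -> high RMSSD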

emotions_long <- emotions %>%
  gather(key, value, -time) %>%
  separate(key, into = c("dimension", "person"), sep = "_") %>%
  spread(dimension, value) %>%
  group_by(person) %>%
  mutate(rmssd_v = rmssd(valence),
         rmssd_a = rmssd(arousal),
         rmssd_total = mean(rmssd_v + rmssd_a)) %>%
  ungroup()

Let’s see what this looks like for valence and arousal individually.

emotions_long %>%
  ggplot(aes(x = time, y = valence, group = person, color = rmssd_v)) +
  geom_line(size = .75, alpha = .75) +
  scale_color_gradient2(low = "black", mid = "grey", high = "red", midpoint = median(emotions_long$rmssd_v)) +
  labs(x = "Time",
       y = "Valence",
       color = "Instability",
       title = "Simulated Valence Scores over Time for 100 People") +
  theme_minimal(base_size = 16)

emotions_long %>%
  ggplot(aes(x = time, y = arousal, group = person, color = rmssd_a)) +
  geom_line(size = .75, alpha = .75) +
  scale_color_gradient2(low = "black", mid = "grey", high = "red", midpoint = median(emotions_long$rmssd_a)) +
  labs(x = "Time",
       y = "Arousal",
       color = "Instability",
       title = "Simulated Arousal Scores over Time for 100 People") +
  theme_minimal(base_size = 16)

We see that some lines are fairly flat and others fluctuate more widely. More importantly, most people are somewhere in the middle.

We can get a sense of one simulated person’s affective state space as well. The goal here is to mimic the kinds of models shown in Kuppens, Oravecz, and Tuerlinckx (2010):

emotions_long %>%
  filter(person %in% sample(1:100, 6, replace = FALSE)) %>%
  ggplot(aes(x = valence, y = arousal, group = person)) +
  geom_path(size = .75) + 
  scale_x_continuous(limits = c(0, 6)) +
  scale_y_continuous(limits = c(0, 6)) +
  labs(x = "Valence",
       y = "Arousal",
       title = "Affective State Space for Six Randomly Simulated People") +
  facet_wrap(~person) +
  theme_minimal(base_size = 18) +
  theme(plot.title = element_text(size = 18, hjust = .5))

Animating the Affective State Space

To really appreciate what’s going on, we need to animate this over time. I’ll add some labels to the affective state space so that it’s easier to interpret what one might be feeling at that time. I’ll also add color to show which individuals are more unstable according to RMSSD.

library(gganimate)

p <- emotions_long %>%
  ggplot(aes(x = valence, y = arousal, color = rmssd_total)) +
  annotate("text", x = c(1.5, 4.5, 1.5, 4.5), y = c(1.5, 1.5, 4.5, 4.5), label = c("Gloomy", "Calm", "Anxious", "Happy"),
           size = 10, alpha = .50) + 
  annotate("rect", xmin = 0, xmax = 3, ymin = 0, ymax = 3, alpha = 0.25, color = "black", fill = "white") +
  annotate("rect", xmin = 3, xmax = 6, ymin = 0, ymax = 3, alpha = 0.25, color = "black", fill = "white") +
  annotate("rect", xmin = 0, xmax = 3, ymin = 3, ymax = 6, alpha = 0.25, color = "black", fill = "white") +
  annotate("rect", xmin = 3, xmax = 6, ymin = 3, ymax = 6, alpha = 0.25, color = "black", fill = "white") +
  geom_point(size = 3.5) +
  scale_color_gradient2(low = "black", mid = "grey", high = "red", midpoint = median(emotions_long$rmssd_total)) +
  scale_x_continuous(limits = c(0, 6)) +
  scale_y_continuous(limits = c(0, 6)) +
  labs(x = "Valence",
       y = "Arousal",
       color = "Instability",
       title = 'Time: {round(frame_time)}') +
  transition_time(time) +
  theme_minimal(base_size = 18)

ani_p <- animate(p, nframes = 320, end_pause = 20, fps = 16, width = 550, height = 500)

ani_p

There’s a Storm Coming…

Our simulation does a pretty good job at emulating the natural ebb and flow of emotions, but we know that emotions can be far more volatile. Let’s subject our simulation to a negative event. Perhaps all 100 simulatons co-authored a paper that just got rejected. In the function simulate_affect, there’s an optional argument negative_event_time that causes a negative event to occur at the specified time. For this, we need to consider one more emotion dynamics parameter:

  • Recovery rate: How quickly one recovers from an emotional event. If something bad happens, how long does it take to return to the attractor? You can see how I’ve modelled this parameter in the function above.

So we’ll run the simulation with a negative event arising at t = 150. The negative event will cause a downward spike in valence and an upward spike in arousal.

emotions_event <- simulate_affect(n = 100, time = 300, negative_event_time = 150)

emotions_event_long <- emotions_event %>%
  gather(key, value, -time) %>%
  separate(key, into = c("dimension", "person"), sep = "_") %>%
  spread(dimension, value) %>%
  group_by(person) %>%
  mutate(rmssd_v = rmssd(valence),
         rmssd_a = rmssd(arousal),
         rmssd_total = mean(rmssd_v + rmssd_a)) %>%
  ungroup()

emotions_event_long %>%
  ggplot(aes(x = time, y = valence, group = person, color = rmssd_v)) +
  geom_line(size = .75, alpha = .75) +
  scale_color_gradient2(low = "black", mid = "grey", high = "red", midpoint = median(emotions_event_long$rmssd_v)) +
  labs(x = "Time",
       y = "Valence",
       color = "Instability",
       title = "Simulated Valence Scores over Time for 100 People") +
  theme_minimal(base_size = 16)

emotions_event_long %>%
  ggplot(aes(x = time, y = arousal, group = person, color = rmssd_a)) +
  geom_line(size = .75, alpha = .75) +
  scale_color_gradient2(low = "black", mid = "grey", high = "red", midpoint = median(emotions_event_long$rmssd_a)) +
  labs(x = "Time",
       y = "Arousal",
       color = "Instability",
       title = "Simulated Arousal Scores over Time for 100 People") +
  theme_minimal(base_size = 16)

It’s pretty clear that something bad happened. Of course, some of our simulatons are unflappable, but most experienced a drop in valence and spike in arousal that we might identify as anxiety. Again, let’s visualize this evolving over time. Pay close attention to when the timer hits 150.

p2 <- emotions_event_long %>%
  ggplot(aes(x = valence, y = arousal, color = rmssd_total)) +
  annotate("text", x = c(1.5, 4.5, 1.5, 4.5), y = c(1.5, 1.5, 4.5, 4.5), label = c("Gloomy", "Calm", "Anxious", "Happy"),
           size = 10, alpha = .50) + 
  annotate("rect", xmin = 0, xmax = 3, ymin = 0, ymax = 3, alpha = 0.25, color = "black", fill = "white") +
  annotate("rect", xmin = 3, xmax = 6, ymin = 0, ymax = 3, alpha = 0.25, color = "black", fill = "white") +
  annotate("rect", xmin = 0, xmax = 3, ymin = 3, ymax = 6, alpha = 0.25, color = "black", fill = "white") +
  annotate("rect", xmin = 3, xmax = 6, ymin = 3, ymax = 6, alpha = 0.25, color = "black", fill = "white") +
  geom_point(size = 3.5) +
  scale_color_gradient2(low = "black", mid = "grey", high = "red", midpoint = median(emotions_event_long$rmssd_total)) +
  scale_x_continuous(limits = c(0, 6)) +
  scale_y_continuous(limits = c(0, 6)) +
  labs(x = "Valence",
       y = "Arousal",
       color = "Instability",
       title = 'Time: {round(frame_time)}') +
  transition_time(time) +
  theme_minimal(base_size = 18)

ani_p2 <- animate(p2, nframes = 320, end_pause = 20, fps = 16, width = 550, height = 500)

ani_p2

The overall picture is that some are more emotionally resilient than others. As of now, all the simulatons return to their baseline attractor, but we would realistically expect some to stay stressed or gloomy following bad news. In the coming months I’ll be looking into how to incorporate emotion regulation into the simulation. For example, maybe some of the simulatons use better coping strategies than others? I’m also interested in incorporating appraisal mechanisms that allow for different reactions depending on the type of emotional stimulus.

To leave a comment for the author, please follow the link and comment on their blog: R on Will Hipson.


Continue Reading…

Collapse

Read More

Word2Vec Text Mining & Parallelization in R. Join MünsteR for our next meetup!

(This article was first published on Shirin's playgRound, and kindly contributed to R-bloggers)

In our next MünsteR R-user group meetup on Tuesday, July 9th, 2019, we will have two exciting talks about Word2Vec Text Mining & Parallelization in R!

You can RSVP here: https://www.meetup.com/de-DE/Munster-R-Users-Group/events/262236134/

Thorben Hellweg will talk about Parallelization in R. More information tba!

Maren Reuter from viadee AG will give an introduction into the functionality and use of the Word2Vec algorithm in R.

Text data in its raw form cannot be used as input for machine learning algorithms. Therefore, an information extraction method is required to process plain text into an appropriate representation. By exploiting the semantic and syntactic structure of the text data, the importance of a word can be defined and represented as a vector in a vector space. That is, the vector can be seen as a numerical 'importance' value. There are two predominant approaches to representing words as vectors: either by using word frequencies (n-grams), or by using a prediction model to estimate the relatedness of words. The Word2Vec algorithm by Mikolov et al. belongs to the latter. This talk will show the functionality of the algorithm and how it can be used in practice.
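
As a rough illustration of what such word vectors are used for (this sketch is mine, not from the talk, and the embeddings are random placeholders), relatedness between words is typically measured with cosine similarity:

# hypothetical 10-dimensional embeddings, one row per word
embeddings <- matrix(rnorm(5 * 10), nrow = 5,
                     dimnames = list(c("king", "queen", "man", "woman", "apple"), NULL))

cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# rank all words by their similarity to "king"
sort(apply(embeddings, 1, cosine_sim, b = embeddings["king", ]), decreasing = TRUE)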

About Maren:

Maren Reuter is an IT consultant at viadee AG and part of the company’s Artificial Intelligence research group. She got her Master’s degree in Information Systems at the University of Münster with a focus on Data Analytics. In her Master’s thesis, she applied text mining techniques to predict maintenance tasks in agile software projects. For this purpose, she used the Word2Vec algorithm to build a word vector representation model.

To leave a comment for the author, please follow the link and comment on their blog: Shirin's playgRound.


Continue Reading…

Collapse

Read More

Corporate America is growing more dissatisfied with Donald Trump

Earnings calls suggest that company executives are fed up with Mr Trump’s tariffs and trade wars

Continue Reading…

Collapse

Read More

June 25, 2019

Geelong and the curse of the bye

(This article was first published on R – What You're Doing Is Rather Desperate, and kindly contributed to R-bloggers)

This week we return to Australian Rules Football, the R package fitzRoy and some statistics to ask – why can’t Geelong win after a bye?

(with apologies to long-time readers who used to come for the science)


Code and a report for this blog post are available at Github.

First, some background. In 2011 the AFL expanded from 16 to 17 teams with the addition of the Gold Coast Suns. In the same year, a bye round (a week where some teams don’t play) was reintroduced to the competition. For the purposes of this discussion, we are interested only in bye rounds since 2011, and during the regular home/away season.

You will often hear footy fans claim – sometimes with very little evidence – that “we don’t go well after the bye.” For one team, this is certainly true. That team is Geelong, who have not won a game in the round following a bye since Round 7 in 2011.

Is this unusual? If so, does the available game data suggest any reason?

We start as ever with the excellent fitzRoy package and use get_match_results() to – well, get the match results.
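
In code, the setup amounts to something like this (a minimal sketch; I assume the tidyverse is also loaded for the wrangling that follows):

library(fitzRoy)
library(tidyverse)

# all AFL match results, one row per game
results <- get_match_results()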

Next, we can use some tidyverse magic to obtain all games in the round immediately before, and after, a bye. This looks long and complicated, so here’s a version with annotations in the comments to explain what’s going on:

results_bye <- results %>% 
  # choose the desired columns
  select(Season, Round, Date, Venue, Home.Team, Away.Team, Margin) %>% 
  # create one column for teams, another to indicate whether home or away
  gather(Status, Team, -Season, -Round, -Margin, -Date, -Venue) %>% 
  # filter for 2011 onwards and only home/away games
  filter(Season > 2010, grepl("^R", Round)) %>% 
  # create a column with the number of each round
  separate(Round, into = c("prefix", "suffix"), sep = 1) %>% 
  mutate(suffix = as.numeric(suffix)) %>% 
  # for each team's games in a season find games
  # the week before and after a bye
  arrange(Season, Team, suffix) %>%
  group_by(Season, Team) %>% 
  mutate(bye = case_when(
    suffix - lead(suffix) == -2 ~ "before",
    suffix - lag(suffix) == 2 ~ "after",
    TRUE ~ as.character(suffix)
  ),
  # margins are with respect to home team so negate them if away
  Margin = ifelse(Status == "Away.Team", -Margin, Margin)) %>% 
  ungroup() %>% 
  # filter for the pre- and post-bye games
  filter(bye %in% c("before", "after")) %>% 
  # calculate result
  mutate(Result = case_when(
    Margin > 0 ~ "W",
    Margin < 0 ~ "L",
    TRUE ~ "D"
  )) %>% 
  # recreate the Round column
  unite(Round, prefix, suffix, sep = "")

Let’s confirm that Geelong have not won after a bye in a long time:

results_bye %>% 
  filter(Team == "Geelong", bye == "after")
Season Round Date Venue Margin Status Team bye Result
2011 R7 2011-05-07 Kardinia Park 66 Home.Team Geelong after W
2011 R23 2011-08-27 Kardinia Park -13 Home.Team Geelong after L
2012 R13 2012-06-22 S.C.G. -6 Away.Team Geelong after L
2013 R13 2013-06-23 Gabba -5 Away.Team Geelong after L
2014 R9 2014-05-17 Subiaco -32 Away.Team Geelong after L
2016 R16 2016-07-08 Kardinia Park -38 Home.Team Geelong after L
2017 R13 2017-06-15 Subiaco -13 Away.Team Geelong after L
2018 R15 2018-06-29 Docklands -2 Away.Team Geelong after L
2019 R14 2019-06-22 Adelaide Oval -11 Away.Team Geelong after L

How does that compare with other teams?

We see all combinations: teams that seem to win more after a bye, as well as teams that win less and teams for which a bye makes no difference. However, Geelong certainly has the worst post-bye win/loss record.

We can ask: is the win/loss count in pre-bye games significantly different to those post-bye? One approach to this is to construct 2×2 contingency tables and perform Fisher’s exact test.
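
Before wiring this into a pipeline, here is what a single such test looks like on toy counts (the numbers are made up for illustration, not Geelong's actual record):

# rows: before/after the bye; columns: losses/wins
tab <- matrix(c(2, 7,
                8, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(bye = c("before", "after"), Result = c("L", "W")))
fisher.test(tab)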

With some more tidyverse magic we can nest the data for each team, generate the tests and summarise the results. This approach is explained very nicely in “Running a model on separate groups” over at Simon Jackson’s blog.

Only Geelong has p < 0.05, suggesting that there is something interesting about the win/loss count after the bye. We’ll just show the first 5 teams here.

library(broom)   # provides tidy(); presumably loaded earlier in the original post
library(pander)  # provides pander(); likewise assumed

results_bye %>% 
  count(Team, bye, Result) %>% 
  nest(-Team) %>% 
  mutate(data = map(data, . %>% spread(Result, n) %>% select(2:3)), 
         fisher = map(data, fisher.test), 
         summary = map(fisher, tidy)) %>% 
  select(Team, summary) %>% 
  unnest() %>% 
  select(-method, -alternative) %>% 
  arrange(p.value) %>% 
  pander(split.table = Inf)
Team estimate p.value conf.low conf.high
Geelong 21.4 0.01522 1.533 1396
Sydney 5.43 0.1698 0.6027 79.83
North Melbourne 0.1736 0.2941 0.002835 2.438
Richmond 3.68 0.3469 0.4059 43.34
Collingwood 3.719 0.3498 0.4048 53.81

We can extend the previous visualisation by further breaking down games into home and away:

Now we see that of Geelong’s 8 post-bye losses, 6 were away games. Port Adelaide have a similar record. Then again, Brisbane have not won an away game before the bye, but you don’t hear anyone talking about Brisbane “not going well before the bye”.

When we look at those 6 away post-bye losses, one was in Melbourne – which in terms of travel distance is not very far from Geelong. The other five were “genuine” away games in Sydney, Brisbane, Adelaide and Perth (2).

Season Round Date Venue Margin Status Team bye Result
2012 R13 2012-06-22 S.C.G. -6 Away.Team Geelong after L
2013 R13 2013-06-23 Gabba -5 Away.Team Geelong after L
2014 R9 2014-05-17 Subiaco -32 Away.Team Geelong after L
2017 R13 2017-06-15 Subiaco -13 Away.Team Geelong after L
2018 R15 2018-06-29 Docklands -2 Away.Team Geelong after L
2019 R14 2019-06-22 Adelaide Oval -11 Away.Team Geelong after L

In addition, three of the losses were against a side also coming off the bye, but playing at home.

Season Round Date Venue Margin Status Team bye Result
2012 R13 2012-06-22 S.C.G. -6 Away.Team Geelong after L
2014 R9 2014-05-17 Subiaco -32 Away.Team Geelong after L
2017 R13 2017-06-15 Subiaco -13 Away.Team Geelong after L

What about away games before the bye? One loss in Melbourne, four wins in Melbourne and one win in Sydney, versus the GWS Giants who at that time were a new and struggling team.

Season Round Date Venue Margin Status Team bye Result
2011 R5 2011-04-26 M.C.G. 19 Away.Team Geelong before W
2011 R21 2011-08-14 Football Park 11 Away.Team Geelong before W
2012 R11 2012-06-08 Docklands 12 Away.Team Geelong before W
2013 R11 2013-06-08 Sydney Showground 59 Away.Team Geelong before W
2016 R14 2016-06-25 Docklands -3 Away.Team Geelong before L
2019 R12 2019-06-07 M.C.G. 67 Away.Team Geelong before W

Our last question: for games after a bye, what was the expected result? By expected we mean “according to the bookmakers”. We can join the match results with historical betting data, assign the expected result (win or loss) to Geelong according to their odds, then compare expected versus actual results. This reveals that six of the eight post-bye losses were unexpected – not surprising as Geelong has been a strong team in the period from 2011 to now.
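
The join itself looks roughly like this (a sketch only; the odds table and its column names are hypothetical stand-ins for whichever historical betting source is used):

expected_vs_actual <- results_bye %>%
  inner_join(odds, by = c("Season", "Round", "Team")) %>%            # hypothetical 'odds' table
  mutate(Expected = ifelse(TeamOdds < OppositionOdds, "W", "L")) %>%  # the favourite is the expected winner
  count(bye, Result, Expected)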

bye Result Expected n
after L L 2
after L W 6
after W W 1
before L L 1
before L W 1
before W L 1
before W W 6

In summary
Historically, Geelong do seem more prone to losing after a bye round than other teams, and those losses have been unexpected in terms of betting odds.

However, a large proportion of their post-bye losses have been interstate away games, versus strong opponents. Away games before the bye have been either in Melbourne, or versus weaker opponents.

Scheduling may therefore have played a role in Geelong’s post-bye win/loss record.

To leave a comment for the author, please follow the link and comment on their blog: R – What You're Doing Is Rather Desperate.


Continue Reading…

Collapse

Read More

Let’s get it right

Paper: A Case for Backward Compatibility for Human-AI Teams

AI systems are being deployed to support human decision making in high-stakes domains. In many cases, the human and AI form a team, in which the human makes decisions after reviewing the AI’s inferences. A successful partnership requires that the human develops insights into the performance of the AI system, including its failures. We study the influence of updates to an AI system in this setting. While updates can increase the AI’s predictive performance, they may also lead to changes that are at odds with the user’s prior experiences and confidence in the AI’s inferences, therefore hurting the overall team performance. We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams. Empirical results on three high-stakes domains show that current machine learning algorithms do not produce compatible updates. We propose a re-training objective to improve the compatibility of an update by penalizing new errors. The objective offers full leverage of the performance/compatibility tradeoff, enabling more compatible yet accurate updates.


Article: Speech2Face: Learning the Face Behind a Voice

How much can we infer about a person’s looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural videos of people speaking from Internet/Youtube. During training, our model learns audiovisual, voice-face correlations that allow it to produce images that capture various physical attributes of the speakers such as age, gender and ethnicity. This is done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly. Our reconstructions, obtained directly from audio, reveal the correlations between faces and voices. We evaluate and numerically quantify how–and in what manner–our Speech2Face reconstructions from audio resemble the true face images of the speakers.


Article: A.I. Ethics Boards Should Be Based on Human Rights

Tech companies should ensure their ethics boards are guided by universal human rights and resist bad faith arguments about diversity and free speech. Who should be on the ethics board of a tech company that’s in the business of artificial intelligence (A.I.)? Given the attention to the devastating failure of Google’s proposed Advanced Technology External Advisory Council (ATEAC) earlier this year, which was announced and then canceled within a week, it’s crucial to get to the bottom of this question. Google, for one, admitted it’s ‘going back to the drawing board.’


Article: Diversity in IT: The Business and Moral Reasons

Everyone talks about diversity, but progress is slow in the real world of business and IT. Sometimes humans talk a good game, but don’t follow through on their promises. Or, it might just be that many of us are well-meaning but unable to achieve a goal because of practical, real-world limitations. A couple of days per week I receive at least one public relations pitch for an article idea about how to improve diversity in an IT team. Go to any tech conference and it’s a good bet that someone will be preaching the need for inclusive hiring and team building, and those sessions are well attended. It’s been clearly established that diversity/inclusiveness is critical to ensuring that artificial intelligence and machine learning applications aren’t plagued by gender, racial, geographic, or other bias. GIGO — garbage in, garbage out — is as relevant today as it was 30 years ago.


Article: The Hitchhiker’s Guide to AI Ethics Part 3: What AI Does & Its Impact

A 3-part series exploring ethics issues in Artificial Intelligence. In Part 1, I explored the what and why of ethics of AI. Part 2 looked at the ethics of what AI is. In Part 3, I wrap up the series with an exploration of the ethics of what AI does and what AI impacts. Across the topics, be it safety or civil rights, be it effects on human behavior or risk of malicious use, a common theme emerges – The need to revisit the role of technologists in dealing with the effects of what they build; going beyond broad tenets like ‘avoid harm’ and ‘respect privacy’, to establishing cause-and-effect, and identifying the vantage points we uniquely hold OR don’t. I had a sense early on that Part 3 would be tough to do justice to as a sub-10 minute post. But 3-parts was just right and over 10 mins too long; so bear with me while I try to prove intuition wrong and fail! Let us explore.


Article: Fahrenheit 2050: the Future Shock of AI & Machine Learning

I’m categorically ambivalent on the subject of technological advances/intrusion into our everyday lives. That includes AI and higher machine learning. On the one hand, AI is used and will be used for brilliant life-changing developments in endeavors like medicine, generative design, scientific research, and a lot of what is to come of which we really have no idea.


Article: AI’s (Auto)Complete Control Over Humanity

It’s certainly easy to tap that suggestion bubble, but is it the right thing to do – for humanity?


Article: The origins of bias and how AI may be the answer to ending its reign

Have you ever had a moment when you realized something you thought was so obviously true, suddenly became so obviously false, and in that instant your whole understanding of the world changed? I think of the day I realized my parents were actually fallible human beings. Up until then, part of my subconscious was convinced they were unquestionable demigods. It wasn’t until my 30’s that I started to really question the validity of my opinions and recognize that I was biased towards their viewpoints – both on the world and about me. I gave more weight to their ideas on politics, religion, morals, and even my own capabilities and characteristics than I did to the opinions or facts presented by my experiences. That was the moment I began making it a point to think harder. To examine why I believed something and attempt to form less parentally biased opinions of my own.

Continue Reading…

Collapse

Read More

Inspiration from a waterfall of pie charts: illustrating hierarchies

Reader Antonio R. forwarded a tweet about the following "waterfall of pie charts" to me:

[Image: the original waterfall of pie charts showing how Earth's water is distributed]

Maarten Lamberts loved these charts (source: here).

I am immediately attracted to the visual thinking behind this chart. The data are presented in a hierarchy with three levels. The levels are nested in the sense that the pieces in each pie chart add up to 100%. From the first level to the second, the category of freshwater is sub-divided into three parts. From the second level to the third, the "others" subgroup under freshwater is sub-divided into five further categories.

The designer faces a twofold challenge: presenting the proportions at each level, and integrating the three levels into one graphic. The second challenge is harder to master.

The solution here is quite ingenious. A waterfall/waterdrop metaphor is used to link each layer to the one below. It visually conveys the hierarchical structure.

***

There remains one small problem: a confusion between part and whole. The link between levels should be that one part of the upper level becomes the whole of the lower level. Because of the color scheme, it appears that the part above does not account for the entirety of the pie below. For example, water in lakes is plotted on both the second and third layers, while water in soil suddenly enters the diagram at the third level even though it should be part of the "drop" from the second layer.

***

I started playing around with various related forms. I like the concept of linking the layers and want to retain it. Here is one graphic inspired by the waterfall pies from above:

[Image: redesigned version of the waterfall pie charts]

 

Continue Reading…

Collapse

Read More

Distilled News

Dialog

Dialog is a domain-specific language for creating works of interactive fiction. It is heavily inspired by Inform 7 (Graham Nelson et al. 2006) and Prolog (Alain Colmerauer et al. 1972). An optimizing compiler, dialogc, translates high-level Dialog code into Z-code, a platform-independent runtime format originally created by Infocom in 1979.


Machine Learning: Dimensionality Reduction via Linear Discriminant Analysis

A machine learning algorithm (such as classification, clustering or regression) uses a training dataset to determine weight factors that can be applied to unseen data for predictive purposes. Before implementing a machine learning algorithm, it is necessary to select only relevant features in the training dataset. The process of transforming a dataset in order to select only relevant features necessary for training is called dimensionality reduction. Dimensionality reduction is important because of three main reasons:
• Prevents Overfitting: A high-dimensional dataset having too many features can sometimes lead to overfitting (model captures both real and random effects).
• Simplicity: An over-complex model having too many features can be hard to interpret especially when features are correlated with each other.
• Computational Efficiency: A model trained on a lower-dimensional dataset is computationally efficient (execution of algorithm requires less computational time).
Dimensionality reduction therefore plays a crucial role in data preprocessing.
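
As a minimal sketch of LDA-based dimensionality reduction in R (my example, using MASS::lda() on the built-in iris data, not code from the article):

library(MASS)

fit <- lda(Species ~ ., data = iris)        # 4 features, 3 classes
projected <- as.data.frame(predict(fit)$x)  # LD1 and LD2: the reduced features
head(projected)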


Natural Language Interface to DataTable

You have to write SQL queries to query data from a relational database. Sometimes, you even have to write complex queries to do that. Won’t it be amazing if you could use a chatbot to retrieve data from a database using simple English? That’s what this tutorial is all about.
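
To make the idea concrete, here is a deliberately tiny sketch (mine, not the tutorial's code) that maps one narrow English pattern onto a dplyr filter; a real natural language interface needs far more than this:

library(tidyverse)

query_table <- function(df, question) {
  # handles questions of the form "show rows where <column> <operator> <number>"
  m <- stringr::str_match(question, "where (\\w+) ([<>]=?|==) ([0-9.]+)")
  stopifnot(!is.na(m[1, 1]))
  dplyr::filter(df, !!rlang::parse_expr(paste(m[1, 2], m[1, 3], m[1, 4])))
}

query_table(mtcars, "show rows where mpg > 30")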


Data Literacy: Using the Socratic Method

How can organizations and individuals promote Data Literacy? Data literacy is all about critical thinking, so the time-tested method of Socratic questioning can stimulate high-level engagement with data.


Designing Tools and Activities for Data Literacy Learners

Data-centric thinking is rapidly becoming vital to the way we work, communicate and understand in the 21st century. This has led to a proliferation of tools for novices that help them operate on data to clean, process, aggregate, and visualize it. Unfortunately, these tools have been designed to support users rather than learners who are trying to develop strong data literacy. This paper outlines a basic definition of data literacy and uses it to analyze the tools in this space. Based on this analysis, we propose a set of pedagogical design principles to guide the development of tools and activities that help learners build data literacy. We outline a rationale for these tools to be strongly focused, well guided, very inviting, and highly expandable. Based on these principles, we offer an example of a tool and accompanying activity that we created. Reviewing the tool as a case study, we outline design decisions that align it with our pedagogy. Discussing the activity that we led in academic classroom settings with undergraduate and graduate students, we show how the sketches students created while using the tool reflect their adeptness with key data literacy skills based on our definition. With these early results in mind, we suggest that to better support the growing number of people learning to read and speak with data, tool designers and educators must design from the start with these strong pedagogical principles in mind.


A Data and Analytics Leader’s Guide to Data Literacy

Imagine an organization where the marketing department speaks French, the product designers speak German, the analytics team speaks Spanish and no one speaks a second language. Even if the organization was designed with digital in mind, communicating business value and why specific technologies matter would be impossible. That’s essentially how a data-driven business functions when there is no data literacy. If no one outside the department understands what is being said, it doesn’t matter if data and analytics offers immense business value and is a required component of digital business.


Being right matters : model-compliant events in predictive processing

While prediction errors (PE) have been established to drive learning through adaptation of internal models, the role of model-compliant events in predictive processing is less clear. Checkpoints (CP) were recently introduced as points in time where expected sensory input resolved ambiguity regarding the validity of the internal model. Conceivably, these events serve as on-line reference points for model evaluation, particularly in uncertain contexts. Evidence from fMRI has shown functional similarities of CP and PE to be independent of event-related surprise, raising the important question of how these event classes relate to one another. Consequently, the aim of the present study was to characterise the functional relationship of checkpoints and prediction errors in a serial pattern detection task using electroencephalography (EEG). Specifically, we first hypothesised a joint P3b component of both event classes to index recourse to the internal model (compared to non-informative standards, STD). Second, we assumed the mismatch signal of PE to be reflected in an N400 component when compared to CP. Event-related findings supported these hypotheses. We suggest that while model adaptation is instigated by prediction errors, checkpoints are similarly used for model evaluation. Intriguingly, behavioural subgroup analyses showed that the exploitation of potentially informative reference points may depend on initial cue learning: Strict reliance on cue-based predictions may result in less attentive processing of these reference points, thus impeding upregulation of response gain that would prompt flexible model adaptation. Overall, present results highlight the role of checkpoints as model-compliant, informative reference points and stimulate important research questions about their processing as a function of learning and uncertainty.


The Divergence Index: A new polarization measure for ordinal categorical variables

The statistical literature offers many indicators for measuring the degree of polarization in ordinal data. Typically, widely used measures of distributional variability are defined as a function of a reference point that, in some sense, can be considered representative of the entire population; the measure then indicates how much all the values differ from this 'typical' point. The variance is a well-known example that uses the mean as its reference point. However, mean-based measures depend on the scale applied to the categories (Allison & Foster, 2004) and are highly sensitive to outliers. An alternative approach is to compare the distribution of an ordinal variable with the distribution of maximum dispersion, the two-point extreme distribution (i.e., a distribution in which half of the population is concentrated in the lowest category and half in the top category). Using this procedure, three measures of variation for ordinal categorical data have been suggested: the Linear Order of Variation – LOV (Berry & Mielke, 1992), the Index of Ordinal Variation – IOV (Leik, 1966) and the COV (Kvalseth, Coefficients of variations for nominal and ordinal catego…). All these indices are based on the cumulative relative frequency distribution (CDF), since this contains all the distributional information of an ordinal variable (Blair & Lacy, 1996). Consequently, none of these measures rely on assumptions about distances between categories.


Disposable Technology: A Concept Whose Time Has Come

Imagine…imagine that you have been challenged to play Steph Curry, the greatest 3-point shooter in the history of the National Basketball Association, in a game of 1×1. Yea, a pretty predictable outcome for 99.9999999% of us. But now imagine that Steph Curry has to wear a suit of knight’s armor as part of that 1×1 game. The added weight, the obstructed vision, and the lack of flexibility, agility and mobility would probably allow even the average basketball player to beat him. Welcome to today’s technology architecture challenge!


Deep Knowledge: Next Step After Deep Learning

Data science has been around since mankind first did experiments and recorded data. It is only since the advent of big and heterogeneous data that the term ‘Data Science’ was coined. With such a long and varied history, the field should benefit from the great diversity of perspectives that are brought by practitioners from different fields. My own path started with signal analysis. I was building high-speed interferometric photon counting systems, where my ‘data science’ was dominated by signal-to-noise and information encoding. The key aspect of this was that data science was applied to extend or modify our knowledge (understanding) of the physical system. Later my data science efforts focused on stochastic dynamical systems. While the techniques and tools employed were different from those used in signal analysis, the objective remained the same: to extend or modify our knowledge of a system.


How to Increase the Impact of Your Machine Learning Model

Typically, industry machine learning projects aren’t based on a fixed, preexisting reference dataset like MNIST. A lot of effort goes into procuring and cleaning training data. As these tasks are highly project-specific and can’t be generalized, they are rarely talked about and receive no media attention. The same is true for the post-modelling steps: How do you bring your model into production? How will the model outputs create actual business value? And by the way, shouldn’t you have been thinking about these questions beforehand? While the model serving workflows are somewhat transferable, monetization strategies are usually specific and not made public. With these considerations, we can paint a more accurate picture.


Five Command Line Tools for Data Science

One of the most frustrating aspects of data science can be the constant switching between different tools whilst working. You can be editing some code in a Jupyter Notebook, having to install a new tool on the command line and maybe editing a function in an IDE, all whilst working on the same task. Sometimes it is nice to find ways of doing more things in the same piece of software. In the following post, I am going to list some of the best tools I have found for doing data science on the command line. It turns out there are many more tasks that can be completed via simple terminal commands than I first thought, and I wanted to share some of those here.


Genetic Artificial Neural Networks

Artificial Neural Networks are inspired by the nature of our brain. Similarly Genetic Algorithms are inspired by the nature of evolution. In this article I propose a new type of neural network to assist in training: Genetic Neural Networks. These neural networks hold properties like fitness, and use a Genetic Algorithm to train randomly generated weights. Genetic optimization occurs prior to any form of backpropagation to give any type of gradient descent a better starting point. This project can be found on my GitHub here, with an explanation in the snippets below.
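
As a rough sketch of the idea (a toy example in R, mine rather than the author's project code): evolve a population of random weight vectors for a tiny 2-2-1 network on XOR, then hand the fittest vector to gradient descent:

set.seed(1)
X <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), ncol = 2, byrow = TRUE)
y <- c(0, 1, 1, 0)

# forward pass of a 2-2-1 network; w is a length-9 weight vector
forward <- function(w, X) {
  W1 <- matrix(w[1:4], nrow = 2); b1 <- w[5:6]
  W2 <- w[7:8];                   b2 <- w[9]
  H  <- tanh(sweep(X %*% W1, 2, b1, "+"))
  plogis(as.vector(H %*% W2) + b2)
}
fitness <- function(w) -mean((forward(w, X) - y)^2)  # higher is better

pop <- matrix(rnorm(60 * 9), nrow = 60)  # random initial population of weight vectors
for (gen in 1:200) {
  fit      <- apply(pop, 1, fitness)
  parents  <- pop[order(fit, decreasing = TRUE)[1:30], ]       # selection
  children <- parents[sample(30, 30, replace = TRUE), ] +
              matrix(rnorm(30 * 9, sd = 0.3), ncol = 9)        # mutation
  pop <- rbind(parents, children)
}
best <- pop[which.max(apply(pop, 1, fitness)), ]
round(forward(best, X), 2)  # a decent starting point before any backpropagation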


Inferring New Relationships using the Probabilistic Soft Logic

The main purpose of data is to deliver useful information that can be used to make important decisions. In the early days of computing, such information could be queried directly from structured stores such as databases. If the information queried for was not available in those data stores, the system could not respond to the user's query. Today, given the massive amount of data on the web and the exponentially growing number of web users, it is hard to predict and store exactly what users will query for. Simply stated, we cannot manually hard-code the responses to each and every expected question. So what is the solution? The best answer is to extract knowledge from data and refine and recompose it into a knowledge graph, which we can use to answer queries. However, knowledge extraction has proven to be a non-trivial problem. Hence, we use a statistical relational learning methodology that considers past experience and learns new relationships or similarities between facts in order to extract such knowledge. Probabilistic Soft Logic (PSL) is one such statistical relational learning framework used for building knowledge graphs.


Probabilistic Soft Logic (PSL)

Probabilistic soft logic (PSL) is a machine learning framework for developing probabilistic models. PSL models are easy to use and fast. You can define models using a straightforward logical syntax and solve them with fast convex optimization. PSL has produced state-of-the-art results in many areas spanning natural language processing, social-network analysis, knowledge graphs, recommender system, and computational biology. The PSL framework is available as an Apache-licensed, open source project on GitHub with an active user group for support.


You Are Having a Relationship With a Chatbot!

Mindful strategies to recognize, and stop that chatbot from getting too close. In the age of chatbot proliferation, we are often wondering who we are talking to? When we chat with customer service representatives online, we expect the first line customer service representatives to be chatbots. We tell them about our problems. Chatbots help us find a solution. The ‘problem and solution’ nature of customer service presents a classic problem best solved by artificial intelligence. Taking this one step further, what if you are a twenty-something career driven person who has literally no time during your busy packed working day to check in with your significant other. You’ve worked hard to earn your significant other’s love. You want to juggle your relationship with your career. But, there’s just no time.


Distributed Deep Learning Pipelines with PySpark and Keras

In this notebook I use PySpark, Keras, and Elephas python libraries to build an end-to-end deep learning pipeline that runs on Spark. Spark is an open-source distributed analytics engine that can process large amounts of data with tremendous speed. PySpark is simply the python API for Spark that allows you to use an easy programming language, like python, and leverage the power of Apache Spark.

Continue Reading…

Collapse

Read More

AnnoNLP conference on data coding for natural language processing

This workshop should be really interesting:

Silviu Paun and Dirk Hovy are co-organizing it. They’re very organized and know this area as well as anyone. I’m on the program committee, but won’t be able to attend.

I really like the problem of crowdsourcing. Especially for machine learning data curation. It’s a fantastic problem that admits of really nice Bayesian hierarchical models (no surprise to this blog’s audience!).

The rest of this note’s a bit more personal, but I’d very much like to see others adopting similar plans for the future for data curation and application.

The past

Crowdsourcing is near and dear to my heart as it’s the first serious Bayesian modeling problem I worked on. Breck Baldwin and I were working on crowdsourcing for applied natural language processing in the mid 2000s. I couldn’t quite figure out a Bayesian model for it by myself, so I asked Andrew if he could help. He invited me to the “playroom” (a salon-like meeting he used to run every week at Columbia), where he and Jennifer Hill helped me formulate a crowdsourcing model.

As Andrew likes to say, every good model was invented decades ago for psychometrics, and this one’s no different. Phil Dawid had formulated exactly the same model (without the hierarchical component) back in 1979, estimating parameters with EM (itself only published in 1977). The key idea is treating the crowdsourced data like any other noisy measurement. Once you do that, it’s just down to details.

Part of my original motivation for developing Stan was to have a robust way to fit these models. Hamiltonian Monte Carlo (HMC) only handles continuous parameters, so like in Dawid’s application of EM, I had to marginalize out the discrete parameters. This marginalization’s the key to getting these models to sample effectively. Sampling discrete parameters that can be marginalized is a mug’s game.
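
To make that concrete, here is a small R sketch (mine, not Stan code) of the per-item marginal likelihood in a Dawid-and-Skene-style model, summing the latent true category out on the log scale:

# prior: length-K vector of category prevalences
# theta: J x K x K array, theta[j, k, l] = P(annotator j answers l | true category is k)
# y:     length-J vector of the labels the J annotators actually gave for this item
log_marginal_item <- function(prior, theta, y) {
  K <- length(prior)
  J <- length(y)
  lp <- sapply(1:K, function(k) {
    log(prior[k]) + sum(log(sapply(1:J, function(j) theta[j, k, y[j]])))
  })
  max(lp) + log(sum(exp(lp - max(lp))))  # log-sum-exp over the latent truth
}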

The present

Coming full circle, I co-authored a paper with Silviu and Dirk recently, Comparing Bayesian models of annotation, that reformulated and evaluated a bunch of these models in Stan.

Editorial Aside: Every field should move to journals like TACL. Free to publish, fully open access, and roughly one month turnaround to first decision. You have to experience journals like this in action to believe it’s possible.

The future

I want to see these general techniques applied to creating probabilistic corpora, to online adaptive training data (aka active learning), to joint corpus inference and model training (a la Raykar et al.’s models), and to evaluation.

P.S. Cultural consensus theory

I’m not the only one who recreated Dawid and Skene’s model. It’s everywhere these days.

Recently, I just discovered an entire literature dating back decades on cultural consensus theory, which uses very similar models (I’m pretty sure either Lauren Kennedy or Duco Veen pointed out the literature). The authors go more into the philosophical underpinnings of the notion of consensus driving these models (basically the underlying truth of which you are taking noisy measurements). One neat innovation in the cultural consensus theory literature is a mixture model of truth—you can assume multiple subcultures are coding the data with different standards. I’d thought of mixture models of coders (say experts, Mechanical turkers, and undergrads), but not of the truth.

In yet another small-world phenomenon, right after I discovered cultural consensus theory, I saw a cello concert organized through Groupmuse by a social scientist at NYU I’d originally met through a mutual friend of Andrew’s. He introduced the cellist, Iona Batchelder, and added as an aside that she was the daughter of well-known social scientists. Not just any social scientists, but the developers of cultural consensus theory!

Continue Reading…

Collapse

Read More

New Resource for Learning Choroplethr

(This article was first published on R – AriLamstein.com, and kindly contributed to R-bloggers)

Last week I had the honor of giving a 1 hour talk about Choroplethr at a private company.

When this company reached out to me about speaking there, I originally planned to give the same talk I gave at CDC two years ago.

But as I reviewed the CDC talk, I realized two things:

  1. My ability to teach and explain Choroplethr has improved a lot since then.
  2. Choroplethr has changed in the last few years.

Because of this, I decided to rewrite the talk from scratch.

I wanted to share this new and improved resource with a wider audience, so I just recorded myself giving this new talk. You can view the talk below.

I hope this helps you get the most out of Choroplethr!

Interested in having me give this talk at your company? Contact me and let me know!

The post New Resource for Learning Choroplethr appeared first on AriLamstein.com.

To leave a comment for the author, please follow the link and comment on their blog: R – AriLamstein.com.


Continue Reading…

Collapse

Read More

Thanks for reading!