The fourth annual TensorFlow Dev Summit was held on March 11, 2020. The event was entirely live-streamed, which means I could watch all the talks from the comfort of my home (though I didn’t tune in to the livestream but watched the recordings instead because, you know, timezones). In this post I’ll highlight some of the developments of TensorFlow that I personally find the most exciting. You can also find a complete recap of the event here.
Bias can creep in at every step of the ML pipeline, so it’s important to evaluate for human bias at every step. Errors caused by this bias can impact some groups of users more than others. With this in mind, we need to evaluate over various individual slices, or groups of users, instead of relying solely on the overall metrics.
The opening of the talk gives an easy-to-understand example of why evaluating over various individual slices matters. It uses the QuickDraw dataset, built from doodles contributed by over 15 million players of the game Quick, Draw!: the model fails to recognize shoes with heels as shoes, because users keep drawing shoes that don’t have heels, like sneakers.
That’s just one example, and perhaps one with minor repercussions. But what if it happens in a case such as a toxic comment classification system, where a sentence like “I am a woman who is deaf” is somehow perceived as a toxic comment?
Fairness Indicators is a suite of tools that helps developers evaluate fairness metrics, like False Positive Rate (FPR) and False Negative Rate (FNR), for binary and multi-class classification. It can answer questions such as “what groups are underperforming or important to look at?” by enabling developers to deep dive into individual slices, explore the root causes of the disparities, and show opportunities for improvement. The Fairness Indicators repository is available on GitHub.
The talk shows a couple of use cases. First, how we can use Fairness Indicators to evaluate model-based remediation through constraint-based optimization implemented with TFCO. Say you have a model trained to detect whether a person is smiling, built as an unconstrained `tf.keras.Sequential` model. You evaluate your model’s performance by slicing the data into “Young” and “Not Young” age groups. The FPR for the “Not Young” group is higher (0.109) than for “Young” (0.039), as you can see from the visuals provided by Fairness Indicators:
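The per-slice computation behind numbers like these can be sketched in plain Python. This is a toy illustration with made-up labels and predictions, not the Fairness Indicators API:

```python
# Toy illustration of per-slice evaluation (made-up data, not the
# Fairness Indicators API): the same model can have very different
# false positive rates for different groups of users.

def false_positive_rate(labels, preds):
    """FPR = FP / (FP + TN) over binary labels and predictions."""
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0

# Each example: (slice, true label, predicted label)
examples = [
    ("Young", 0, 0), ("Young", 0, 0), ("Young", 0, 1), ("Young", 1, 1),
    ("Not Young", 0, 1), ("Not Young", 0, 1), ("Not Young", 0, 0), ("Not Young", 1, 1),
]

fpr_by_slice = {}
for group in ("Young", "Not Young"):
    labels = [y for g, y, _ in examples if g == group]
    preds = [p for g, _, p in examples if g == group]
    fpr_by_slice[group] = false_positive_rate(labels, preds)
```

On this toy data the overall metrics would look reasonable, but slicing exposes that one group’s FPR is twice the other’s, which is exactly the kind of gap Fairness Indicators surfaces visually.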
With TFCO, you can set a constraint on the subset you define (in this case, the age group). It chooses a model that lowers the group’s FPR while the overall FPR stays the same, so we know the model has genuinely improved and we’re not just moving the decision thresholds. From the visuals provided by Fairness Indicators you can see that the FPR for the “Not Young” group is reduced:
You can also easily compare models across multiple thresholds, with predefined slices: in this case “Not Young” vs. “Young”, with FPR as the metric and thresholds of 0.5 and 0.3:
You can quickly see that CNN outperforms SVM across all thresholds.
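Under the hood, this kind of comparison boils down to a grid of metrics over (model, threshold) pairs. Here is a toy sketch with made-up scores for two hypothetical models:

```python
# Sketch of comparing models across decision thresholds per slice, the kind
# of table Fairness Indicators renders for you. Scores and model names are
# hypothetical, not real benchmark results.

def fpr_at_threshold(labels, scores, threshold):
    """False positive rate when score >= threshold counts as positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0

labels = [0, 0, 0, 0, 1, 1]
models = {
    "svm": [0.45, 0.60, 0.20, 0.35, 0.80, 0.70],
    "cnn": [0.10, 0.40, 0.05, 0.20, 0.90, 0.85],
}

# One FPR value per (model, threshold) cell, as in the comparison view.
results = {
    (name, t): fpr_at_threshold(labels, scores, t)
    for name, scores in models.items()
    for t in (0.3, 0.5)
}
```

In practice you would compute this per slice as well, which is precisely the bookkeeping that Fairness Indicators automates.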
This is exciting because during evaluation I do a lot of slicing across various groups, multiple thresholds, and multiple models. So far I’ve always done it manually, and it can be a real headache. Fairness Indicators makes the process much simpler, so I can’t wait to try it out on my own projects.
You can watch the talk here (up to minute 12:41).
Data is key to machine learning, so it’s no surprise that we need to answer questions about privacy when building ML systems. TensorFlow is currently working on a couple of projects related to privacy. One of them is TensorFlow Privacy, a library that contains TensorFlow optimizers for training ML models with differential privacy. However, there are some tradeoffs, which are emphasized in the talk. Training with privacy might reduce accuracy and increase training time. If a model is already biased, differential privacy might make it even more biased. The talk doesn’t go too in-depth about these concerns, but the points are reiterated at the end: ML and privacy is a relatively new field, research is ongoing, and people are still learning. Keeping user data private is important, but the tradeoffs between performance and privacy are real, so we all need to find the right balance.
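The core idea behind TensorFlow Privacy’s optimizers is DP-SGD: clip each per-example gradient to a fixed L2 norm, then add Gaussian noise before averaging. Here is a minimal plain-Python sketch of that step; all names and scale values are illustrative, not the `tensorflow_privacy` API:

```python
import math
import random

def dp_sgd_step(per_example_grads, l2_norm_clip=1.0, noise_multiplier=1.1, seed=0):
    """Sketch of one DP-SGD gradient step: clip per-example gradients, add noise, average."""
    rng = random.Random(seed)
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    # Sum the clipped gradients, add Gaussian noise scaled to the clip norm,
    # then average over the batch.
    dim = len(per_example_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * l2_norm_clip) for s in summed]
    return [x / len(per_example_grads) for x in noisy]

# First gradient has norm 5, so it gets clipped down to norm 1.
update = dp_sgd_step([[3.0, 4.0], [0.3, 0.4]])
```

The clipping bounds any single example’s influence on the update, and the noise is what makes the result differentially private; both are also why accuracy can drop and training can slow down, as the talk warns.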
The talk also mentions TensorFlow Federated, which was released last year (and already mentioned at last year’s TF Dev Summit). TensorFlow Federated allows users to train a model without collecting the raw data, in contrast to traditional ML, which requires us to centralize data (e.g. in the cloud) and can create privacy issues (on top of resource issues!). How it works is that a shared global model is trained across many participating clients that keep their training data locally. An example use case is next-word prediction without having to upload users’ sensitive data.
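The mechanic described above, local training plus server-side averaging, is federated averaging. A minimal sketch of the idea, using a toy one-parameter linear model rather than the TFF API (all names and data here are made up):

```python
# Federated averaging sketch: each client updates the model on its own data
# locally; only the updated weights (never the raw data) reach the server,
# which averages them into the shared global model.

def local_update(w, client_data, lr=0.1):
    """One gradient-descent step on a client's local data for y = w * x."""
    grad = sum(2 * (w * x - y) * x for x, y in client_data) / len(client_data)
    return w - lr * grad

def federated_average(w, all_client_data):
    """Server round: collect locally updated weights and average them."""
    updates = [local_update(w, data) for data in all_client_data]
    return sum(updates) / len(updates)

# Two clients with private data drawn from y = 2x; the data stays local.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(50):
    w = federated_average(w, clients)
```

After a few dozen rounds the shared weight converges to the true slope 2.0, even though the server never saw a single (x, y) pair.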
You can watch the talk here (from minute 12:41).
Performance Profiling in TensorFlow 2.x
Machine learning models are resource-hungry, so it’s important to ensure that you’re running an optimized version of your model. At the summit, Google announced TensorFlow Performance Profiling, which allows you to profile your models systematically. The profiler consists of several tools:
- Overview Page: shows a summary of your model performance, a step-time graph, and recommendations for next steps
- Input pipeline analyzer: specific to input-bound issues; it walks you through detailed steps to resolve performance bottlenecks in the input pipeline
- TensorFlow stats: displays the performance of every TensorFlow op during the profiling session
- Trace Viewer: shows a timeline of events that occurred on the CPU and the GPU during the profiling session
- GPU kernel stats: shows performance statistics for every GPU-accelerated kernel
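Before any of these views exist, you need to capture a profile during training. One way to do that (the `log_dir` path and batch range here are just examples) is the Keras TensorBoard callback:

```python
import tensorflow as tf

# Capture a profile of batches 10-20 during training; the trace then shows
# up under the "Profile" tab when you point TensorBoard at log_dir.
# "logs" is an example path, not a required location.
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs", profile_batch="10,20")

# Pass it to fit() as usual:
# model.fit(train_ds, epochs=1, callbacks=[tb_callback])
```

Profiling a window of batches mid-training (rather than batch 0) avoids capturing one-off startup costs like graph compilation.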
I tried it out myself to see what the experience is like by running a model trained on the MNIST dataset on a GPU. Using TensorBoard, you can navigate to the tab called “Profile”:
You can navigate between the tools using the dropdown on the left:
There are a couple of elements on the Overview Page. First up is the Performance Summary, which shows the Average Step Time broken down into more detailed components such as Compilation Time and Input Time. For easier viewing, you can check out the chart as well.
You’ll immediately spot that your program is highly input-bound, meaning that it spends most of its time in the data input pipeline. Thanks to this chart you can prioritize your debugging steps—it makes sense to tackle your input pipeline before moving on to other aspects of your program.
What’s exciting is that it also provides an automated performance guide that recommends the next steps you should take:
The guide will show you a list of tools that are relevant to your performance issue. If you click the link to the Input Pipeline Analyzer, you will see detailed steps that you can take:
I find these specific recommendations very helpful. You can immediately try to define `num_parallel_calls` if you’re using `Dataset.map()` during the data preprocessing step and see if your program is still input bound.
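That fix looks something like the following sketch, using a trivial dataset in place of real preprocessing; `AUTOTUNE` lets tf.data pick the degree of parallelism for you:

```python
import tensorflow as tf

# Parallelize the map (preprocessing) stage of an input pipeline and
# prefetch upcoming elements so the accelerator isn't left waiting on input.
ds = tf.data.Dataset.range(5)
ds = ds.map(lambda x: x * 2, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds = ds.prefetch(tf.data.experimental.AUTOTUNE)

doubled = [int(x) for x in ds.as_numpy_iterator()]
```

After a change like this, re-running the profiler tells you whether the program is still input bound or whether the bottleneck has moved elsewhere.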
Overall, the performance profiler will help you debug your model’s performance in a more systematic manner. This way you can focus on implementing the changes that you need to do instead of spending too much time analyzing and determining what’s the next step. You can watch the talk here or try to implement the profiler yourself.
TensorFlow Runtime (TFRT)
TensorFlow Runtime (codenamed TFRT) is a new runtime, built on top of MLIR, that is going to replace TensorFlow’s existing runtime. Why build a new runtime? The reason is largely driven by feedback and observations from the current ML/TF ecosystem. For example, we’re starting to see faster and bigger models, which require more performant eager and graph execution. Also, we would like to have ML models deployed everywhere, across all platforms, as we’ve seen from the exciting developments of TensorFlow.js and TF Lite, which bring ML models to the web and to mobile/IoT devices respectively. Given these observations, TFRT’s vision is to be a unified, extensible runtime that provides best-in-class performance across a wide variety of domain-specific hardware.
Yes, TFRT is not something we get to “use” the way we use TensorBoard or the performance profiling tools, but it’s one of the things I’m most excited about because it will impact a lot of important use cases. We’ll get better performance and error reporting during training, making it easier to debug the training process. In the deployment-to-production step, we will also benefit from improved performance and reduced CPU usage. TFRT also enables deployment across diverse platforms.
The team has run a benchmark, and the results show that TFRT is 28% faster than the current TF runtime (with more optimizations to come), which is super exciting! It will be available as an opt-in soon, giving the team time to fix issues and fine-tune its performance. Eventually, TFRT will become the default runtime for TensorFlow. Open-sourcing TFRT is also on the roadmap.
You can watch the talk from the summit here.
The reason I got into tech was web development, so I have a soft spot for everything that involves both the web and ML. TensorFlow.js has definitely come a long way since it was first released three years ago as deeplearn.js. The TensorFlow.js talk walks us through exciting developments in TensorFlow.js from 2019 and 2020 so far.
In 2019, TensorFlow.js was adopted by companies like Airbnb and Uber. Google also released Teachable Machine 2.0, which allows you to train a model to recognize your own images, sounds, and poses right on the website. I’ve used Teachable Machine 1.0 before, but this newer version definitely has an improved UX. Here’s me making my own version of Don’t Touch Your Face. You can try it out directly on Teachable Machine too!
What else? TensorFlow.js has released models such as Facemesh, Handtrack, and MobileBERT. I’m personally most excited about MobileBERT. BERT is a pre-trained language model developed by Google that has shown top results in tasks such as question answering. BERT itself is pretty big, so it’s nice to see a compressed version of it that runs 4x faster with a 4x smaller model size. MobileBERT takes a passage and a question as input, then returns the segment of the passage that most likely answers the question. You can get the model from TensorFlow Hub.
Another update: besides WebGL, TensorFlow.js now has a WebAssembly (WASM) backend! Why is this exciting? Although WebGL still outperforms WASM in many cases, WASM can outperform WebGL for ultra-lite models. WASM is also more accessible on lower-end devices: it caters to 90% of devices, compared to 53% for WebGL. TensorFlow.js is also working on a WebGPU backend, which is expected to be better than WebGL.
I think all the talks share a common thread: making ML available on many more platforms for many more use cases. You can see it in the various use cases of TF.js, in a new runtime that makes ML models more performant no matter where they run, in a profiler that makes it easier to optimize ML models, and, most importantly, in how we can scale ML to various use cases (which means a larger number of people impacted) without enabling and amplifying bias.
ML is getting more performant and making more impact, but at the same time it’s undeniably getting more complex, not just from the tech perspective but also on the human side. It’s important for tools and platforms to evolve with it and keep ML accessible despite these complexities, and I think that’s exactly what we’re seeing from TensorFlow.
Huge props to the TensorFlow team for live-streaming the entire TensorFlow Dev Summit 2020 online. I’m looking forward to all the releases and developments still to come in 2020!