I'm getting an error when calling tokenize with tensorflow-text==2.2.0rc2 that I can only reproduce on Macs.
(The same error occurs on rc1, and possibly on earlier versions.)
Steps to reproduce:
- Setup:
python3 -m venv .test_venv
source .test_venv/bin/activate
pip install --upgrade pip
pip install tensorflow==2.2.0rc3
pip install tensorflow-text==2.2.0rc2
- Download vocab.txt into the directory where you plan to run the test:
aws s3 cp s3://models.huggingface.co/bert/bert-base-uncased-vocab.txt ./vocab.txt
- Then run these 5 lines in Python:
import tensorflow as tf
from tensorflow_text import BertTokenizer
tokenizer = BertTokenizer('./vocab.txt')
test2 = tf.convert_to_tensor(
'Hello', dtype=tf.string
)
tokenizer.tokenize(test2)
This works on Linux (returns <tf.RaggedTensor [[[100]]]>).
On Mac, it throws an error. I've run it on two separate Macs (one with entirely fresh installs):
2020-04-16 13:18:07.892934: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at wordpiece_kernel.cc:204 : Invalid argument: Trying to access resource using the wrong type. Expected N10tensorflow6lookup15LookupInterfaceE got N10tensorflow6lookup15LookupInterfaceE
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/sylvia/Desktop/workspace/.tfvenv/lib/python3.7/site-packages/tensorflow_text/python/ops/bert_tokenizer.py", line 222, in tokenize
return self._wordpiece_tokenizer.tokenize(tokens)
File "/Users/sylvia/Desktop/workspace/.tfvenv/lib/python3.7/site-packages/tensorflow_text/python/ops/wordpiece_tokenizer.py", line 100, in tokenize
subword, _, _ = self.tokenize_with_offsets(input)
File "/Users/sylvia/Desktop/workspace/.tfvenv/lib/python3.7/site-packages/tensorflow_text/python/ops/wordpiece_tokenizer.py", line 156, in tokenize_with_offsets
tokens.flat_values)
File "/Users/sylvia/Desktop/workspace/.tfvenv/lib/python3.7/site-packages/tensorflow_text/python/ops/wordpiece_tokenizer.py", line 182, in tokenize_with_offsets
**kwargs))
File "<string>", line 141, in wordpiece_tokenize_with_offsets
File "/Users/sylvia/Desktop/workspace/.tfvenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Trying to access resource using the wrong type. Expected N10tensorflow6lookup15LookupInterfaceE got N10tensorflow6lookup15LookupInterfaceE [Op:WordpieceTokenizeWithOffsets]
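The "Expected ... got ..." strings in the error are C++ type names in Itanium-ABI mangled form. As a reading aid, here is a minimal sketch that decodes simple nested names of the form N<len><name>...E. The helper name demangle_nested_name is made up for illustration and is not part of TensorFlow; it handles only this simple case, not the full mangling grammar.

```python
import re


def demangle_nested_name(mangled):
    """Decode a simple Itanium-ABI nested name, e.g. N3foo3barE -> foo::bar.

    Hypothetical helper for illustration only: it supports plain
    length-prefixed identifiers and nothing else (no templates, no
    qualifiers, no substitutions).
    """
    if not (mangled.startswith("N") and mangled.endswith("E")):
        raise ValueError("not a simple nested name: %r" % mangled)
    body = mangled[1:-1]
    parts = []
    pos = 0
    while pos < len(body):
        # Each component is a decimal length followed by that many characters.
        match = re.match(r"\d+", body[pos:])
        if match is None:
            raise ValueError("expected a length prefix at offset %d" % pos)
        length = int(match.group())
        start = pos + match.end()
        if start + length > len(body):
            raise ValueError("component runs past the end of the name")
        parts.append(body[start:start + length])
        pos = start + length
    return "::".join(parts)


# The two strings copied from the error message above.
expected = "N10tensorflow6lookup15LookupInterfaceE"
got = "N10tensorflow6lookup15LookupInterfaceE"

print(demangle_nested_name(expected))
print("names identical:", expected == got)
```

Both strings decode to tensorflow::lookup::LookupInterface, so the printed names alone do not show what differs between the expected and the actual type.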
Running on Python 3.7.6.
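Since the failure appears to be platform-specific, it may help to compare the environment on the Linux and Mac machines side by side. The following is a rough sketch, not an official diagnostic tool: collect_environment is a hypothetical helper, and it assumes each package exposes a __version__ attribute (it falls back to "unknown" if not).

```python
import importlib
import platform


def collect_environment(packages=("tensorflow", "tensorflow_text")):
    """Gather OS, Python, and package version details for a bug report."""
    info = {
        "os": platform.system(),        # "Darwin" on macOS, "Linux" on Linux
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Assumption: the package exposes __version__; otherwise report "unknown".
            info[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:
            # Record the failure instead of aborting, so the report stays complete.
            info[name] = "import failed: %s" % type(exc).__name__
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print("%s: %s" % (key, value))
```

Running this inside the activated virtual environment on each machine gives output that can be pasted into the report for comparison.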