Unsupervised learning from narrated instruction videos, CVPR, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01171193
Joint discovery of object states and manipulation actions, ICCV, vol.2, p.5, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01676084
DIFFRAC: A discriminative and flexible framework for clustering, NIPS, vol.2, p.7, 2007. ,
Unsupervised learning by predicting noise, ICML, 2017. ,
Weakly supervised action labeling in videos under ordering constraints, ECCV, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01053967
Weakly-supervised alignment of video with text, ICCV, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01154523
Deep Clustering for Unsupervised Learning of Visual Features, ECCV, 2018. ,
Quo vadis, action recognition? a new model and the kinetics dataset, CVPR, 2017. ,
Scaling egocentric vision: The EPIC-KITCHENS dataset, ECCV, 2018. ,
You-do, i-learn: Discovering task relevant objects and their modes of interaction from multi-user egocentric video, BMVC, 2014. ,
Demo2vec: Reasoning object affordances from online videos, CVPR, 2018. ,
Describing objects by their attributes, CVPR, vol.2, p.3, 2009. ,
Learning visual attributes, NIPS, 2007. ,
From lifestyle vlogs to everyday interactions, CVPR, 2018. ,
Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition, ICCV, 2013. ,
Deep residual learning for image recognition, CVPR, 2016. ,
Cnn architectures for large-scale audio classification, ICASSP, 2017. ,
Connectionist temporal modeling for weakly supervised action labeling, ECCV, vol.1, 2016. ,
Unsupervised visual-linguistic reference resolution in instructional videos, CVPR, 2017. ,
Finding "it": Weakly-supervised reference-aware visual grounding in instructional video, CVPR, 2018. ,
Adam: A method for stochastic optimization, 2014. ,
Weakly supervised learning of actions from transcripts, CVIU, vol.1, 2017. ,
Recognizing human actions by attributes, CVPR, 2011. ,
What's cookin'? Interpreting cooking videos using text, speech and vision, NAACL, vol.2, p.5, 2015. ,
From Red Wine to Red Tomato: Composition with Context, CVPR, 2017. ,
Weakly supervised action learning with rnn based fine-to-coarse modeling, CVPR, 2017. ,
Action sets: Weakly supervised action segmentation without ordering constraints, CVPR, 2018. ,
Unsupervised learning and segmentation of complex activities from video, CVPR, 2018. ,
Unsupervised semantic parsing of video collections, ICCV, vol.1, p.5, 2015. ,
Two-stream convolutional networks for action recognition in videos, NIPS, vol.1, 2014. ,
Action recognition with improved trajectories, ICCV, vol.1, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00873267
Maximum margin clustering, NIPS, 2004. ,
Human action recognition by learning bases of action attributes and parts, ICCV, 2011. ,
Commonly uncommon: Semantic sparsity in situation recognition, CVPR, 2017. ,
Towards automatic learning of procedures from web instructional videos, AAAI, vol.2, p.5, 2018. ,
Outputs of the classifier are shown in blue. Correctly localized steps are shown in green. False detections are shown in red.
Learning from narrated instruction videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01580630
Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, vol.26, pp.3111-3119, 2013. ,
Enriching word vectors with subword information, arXiv, 2017. ,
Visualizing data using t-sne, Journal of machine learning research, 2008. ,