
# How to quickly reproduce your Computer Vision Models

Nowadays, Machine Learning is extensively used to solve complex Computer Vision problems such as Image Classification, Object Detection, and Object Segmentation, and it has achieved state-of-the-art results in recent years. Typically, an image-based training workflow comprises two main processes: **dataset creation** and **training**.

Suppose a team of data scientists is building a face recognition model. A dataset creation process accepts a set of input files and produces a set of output files: for example, it might take selfies as input and extract tightly-cropped faces from them. Training processes take no raw input files; instead, they consume the outputs of dataset creation processes. For example, the team might train several models on the tightly-cropped selfie dataset. The number of images needed for both processes can be massive, and it can vary over time as the requirements of the experiments change. Making every experiment reproducible therefore raises several technical challenges.
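The difference between the two kinds of process can be made concrete with a small sketch. Assuming files are held in memory as a mapping from names to bytes, and using a hypothetical `crop_face` function as a stand-in for real face extraction, a dataset creation step maps input files to output files and records a fingerprint of each side, while a training step records only the fingerprint of the dataset it consumed. The function names (`fingerprint`, `run_dataset_creation`, `run_training`) are illustrative, not taken from any tool.

```python
import hashlib
from typing import Callable, Dict, Tuple

# A file set maps relative file names to raw bytes (an in-memory stand-in for files on disk).
FileSet = Dict[str, bytes]


def fingerprint(files: FileSet) -> str:
    """Order-independent SHA-256 over every file name and its content hash."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


def run_dataset_creation(
    inputs: FileSet, transform: Callable[[bytes], bytes]
) -> Tuple[FileSet, Dict[str, str]]:
    """Dataset creation: input files in, output files out, plus a lineage record."""
    outputs = {name: transform(data) for name, data in inputs.items()}
    lineage = {"inputs": fingerprint(inputs), "dataset": fingerprint(outputs)}
    return outputs, lineage


def run_training(dataset: FileSet, model_name: str) -> Dict[str, str]:
    """Training: no raw inputs, only a dataset; record exactly which version was used."""
    # Model fitting would happen here; this sketch only records provenance.
    return {"model": model_name, "trained_on": fingerprint(dataset)}


if __name__ == "__main__":
    selfies = {"alice.jpg": b"alice-selfie-bytes", "bob.jpg": b"bob-selfie-bytes"}

    def crop_face(data: bytes) -> bytes:
        # Hypothetical stand-in for real face cropping: keep the first 8 bytes.
        return data[:8]

    faces, lineage = run_dataset_creation(selfies, crop_face)
    run = run_training(faces, "face-recognizer-v1")
    print(run["trained_on"] == lineage["dataset"])  # True
```

Because the fingerprint depends only on names and contents, re-running the same creation step on the same selfies yields the same dataset fingerprint, which is what lets a training record point unambiguously at its data.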

We need to design a reproducible workflow that allows data scientists to:

* Create new datasets and explore an index of existing datasets.
* Explore data.
* Track the data a model was trained with.
* Quickly create new datasets from either original data or other datasets.
* Explore the contents of a dataset and remove specific items from it.
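One way to picture these requirements together is a small index in which a dataset is only a mapping from item names to content hashes, while the bytes themselves live once in a shared store. The sketch below is a hypothetical, in-memory illustration: the `DatasetIndex` class and its method names are our own, not part of DVC or any other tool. Each method corresponds roughly to one of the bullets above.

```python
import hashlib
from typing import Dict, Iterable, List, Optional


class DatasetIndex:
    """Hypothetical index: datasets are maps of content hashes into one shared blob store."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}  # content hash -> bytes, each stored once
        self.datasets: Dict[str, Dict[str, str]] = {}  # dataset -> {item name: content hash}
        self.parents: Dict[str, Optional[str]] = {}  # dataset -> source dataset (None = original data)

    def _put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical bytes are never stored twice
        return digest

    def create(self, name: str, items: Dict[str, bytes]) -> None:
        """Create a new dataset from original data."""
        self.datasets[name] = {item: self._put(data) for item, data in items.items()}
        self.parents[name] = None

    def derive(self, name: str, source: str, keep: Iterable[str]) -> None:
        """Create a dataset from another one; only references are copied, not bytes."""
        src = self.datasets[source]
        self.datasets[name] = {item: src[item] for item in keep}
        self.parents[name] = source

    def list_datasets(self) -> List[str]:
        """The index of existing datasets."""
        return sorted(self.datasets)

    def contents(self, name: str) -> Dict[str, str]:
        """Explore a dataset: item names and the content hashes they point to."""
        return dict(self.datasets[name])

    def remove_item(self, name: str, item: str) -> None:
        """Drop an item from one dataset; the blob stays available to other datasets."""
        del self.datasets[name][item]

    def version(self, name: str) -> str:
        """Fingerprint of a dataset, e.g. to record the data a model was trained with."""
        h = hashlib.sha256()
        for item, digest in sorted(self.datasets[name].items()):
            h.update(f"{item}:{digest}\n".encode("utf-8"))
        return h.hexdigest()
```

The design choice worth noting is that deriving or pruning a dataset never touches the media itself: it only edits a small mapping, so datasets stay cheap to create even when the underlying images are large.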

Inspired by this well-written [tutorial](https://realpython.com/python-data-version-control/) on Data Versioning With Python and DVC, we will see how to address all of these requirements with different tools, while reducing costs by not needlessly duplicating original media or other large data.
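The cost saving comes from storing each piece of content once under its hash and materializing it wherever it is needed. The following sketch shows that idea with a hypothetical on-disk cache; `add_to_cache` and `checkout` are illustrative names, and the layout (first two hex characters of the hash as a subdirectory) is an assumption that is similar in spirit to how DVC caches files by hash, not its actual implementation.

```python
import hashlib
import os
import shutil
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def add_to_cache(path: Path, cache_dir: Path) -> str:
    """Store a file once under its content hash; repeated content is not copied again."""
    digest = file_hash(path)
    target = cache_dir / digest[:2] / digest[2:]
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return digest


def checkout(digest: str, cache_dir: Path, dest: Path) -> None:
    """Materialize a cached file in a workspace, preferring a hard link over a copy."""
    source = cache_dir / digest[:2] / digest[2:]
    dest.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.link(source, dest)  # no extra disk space when cache and workspace share a file system
    except OSError:
        shutil.copy2(source, dest)  # fall back, e.g. across file systems
```

Hard links make a checkout almost free in disk space, but they share bytes with the cache, so editing a checked-out file in place would silently change the cached copy as well; a real tool has to guard against that.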
