% New Commands for variables
\newcommand{\MET}{$E_{\text{T}}^{\text{miss}} $\xspace}
\newcommand{\MTTWO}{$M_{\text{T2}} $\xspace}
\newcommand{\HT}{$H_{\text{T}} $\xspace}
\newcommand{\NJETS}{$N_{\text{jets}} $\xspace}
\newcommand{\NTOPS}{$N_{\text{tops}} $\xspace}
\newcommand{\NBJETS}{$N_{\text{b-jets}} $\xspace}
\newcommand{\ETA}{$\eta $\xspace}
\newcommand{\DELTAR}{$\Delta R $\xspace}
\newcommand{\PT}{$p_{\text{T}} $\xspace}
\newcommand{\DELTAPHI}{$\Delta\phi $\xspace}
\newcommand{\PHI}{$\phi $\xspace}
\newcommand{\GEV}{$\text{GeV} $\xspace}
\newcommand{\TTBAR}{$t\bar{t} $\xspace}
%Search
\chapter{Analysis Description}
\label{ch:analysis}
In searches for supersymmetric particles, the stop (top squark), the SUSY partner of the top quark, is of particular interest. For SUSY to provide a natural solution to the hierarchy problem, the stop should be relatively light: with a large stop mass, naturalness would be lost, since yet another round of careful adjustment, or fine-tuning, of parameters would be needed to cancel the quantum corrections to the Higgs boson mass, a standing problem of the standard model. For this analysis, we assume that the stop is sufficiently heavy to decay into a top quark and the weakly interacting supersymmetric particle $\tilde{\chi}_{1}^{0}$. We target processes that produce a pair of stops in the chain $pp \rightarrow \tilde{t}\tilde{t}^{*} \rightarrow t\bar{t}\tilde{\chi}_{1}^{0}\tilde{\chi}_{1}^{0}$; since the analysis is inclusive, however, other channels can also contribute. The signal models of interest are shown in Fig.~\ref{fig:signalmodels}, where we require that the top quarks decay hadronically.\\ \\ A central feature of the analysis is the top quark tagging algorithm described in Section~\ref{sec:toptagging}.
Since a top quark decays hadronically more than two thirds of the time, the decay products are reconstructed as collimated streams of particles known as ``jets'', while the neutral, weakly interacting neutralino $\tilde{\chi}_{1}^{0}$ escapes detection. The typical event signature thus consists of many jets and a large amount of missing energy \MET . The analysis is based on exclusive search regions defined by combinations of the number of top quark candidates, b-tagged jets~\cite{Chatrchyan:2012jua}, missing transverse energy (\MET), the scalar sum of the transverse momenta of all jets (\HT), and the transverse mass variable (\MTTWO). A detailed explanation of the variable \MTTWO is presented in Section~\ref{sec:preselection}. Backgrounds are estimated using well-established data-driven methods. \\ \\ The diagrams corresponding to the Simplified Model Spectrum (SMS) used to interpret the results are shown in Fig.~\ref{fig:signalmodels}. The left diagram is referred to as T2tt(x,y) and represents direct stop-antistop pair production, with ``x'' and ``y'' the top squark and $\tilde{\chi}^{0}_{1}$ masses, respectively. If the gluino is within the reach of the LHC energy in Run 2, gluino-induced processes such as the one on the right of Fig.~\ref{fig:signalmodels} would become relevant to the analysis. The right diagram of Fig.~\ref{fig:signalmodels} is called T1tttt(x,y), where ``x'' and ``y'' are the gluino and $\tilde{\chi}^{0}_{1}$ masses, respectively.\\ \begin{figure}[!tbp] \centering \begin{minipage}[b]{0.4\textwidth} \includegraphics[width=\textwidth]{figure/T2tt_feynman.pdf} \end{minipage} \hfill \begin{minipage}[b]{0.4\textwidth} \includegraphics[width=\textwidth]{figure/T1tttt_feynman.pdf} \end{minipage} \caption{Signal models of interest in this search: (left) stop pair production, with each stop decaying into a top quark and a neutralino, and (right) gluino pair production, where each gluino decays into an on-shell stop and a top quark and the stop decays into a top quark and a neutralino.
The SUSY simplified model topology shown on the left is referred to as T2tt, and that on the right as T1tttt. } \label{fig:signalmodels} \end{figure} \section{Trigger} \label{sec:trigger} In 13 TeV proton-proton collisions at the LHC, a wide range of physics processes is possible. We use triggers to accept the events that are of interest to us: data are collected with triggers that accept events when thresholds on certain observables are met. These trigger thresholds are set below the offline selection requirements to allow further refinement during offline processing. \\\\ \begin{figure}[!tbp] \begin{center} \includegraphics[width=0.85\linewidth]{figure/trigger/trigeff_hiht}\\ \includegraphics[width=0.85\linewidth]{figure/trigger/trigeff_loht}\\ \caption{The trigger efficiency, denoted by the black points, as a function of the offline \MET for (top) \HT $>$ 1000 \GEV and (bottom) 300 $<$ \HT $<$ 1000 \GEV . The vertical error bars indicate the statistical uncertainty of the trigger efficiency and the horizontal bars indicate the bin width. } \label{fig:trigger} \end{center} \end{figure} In SUSY searches, a large \MET criterion is imposed to suppress the backgrounds. This requirement is also motivated by the fact that SUSY events are expected to contain heavy, weakly interacting particles that escape detection in CMS. All-hadronic searches additionally require a minimum amount of \HT , as explained in Section~\ref{sec:preselection}. In this analysis we use triggers based on combinations of \MET and \HT . Two conditions must be met for a trigger to be useful: it must be highly efficient in selecting events, and it must be possible to make an unbiased measurement of its efficiency. \\\\ The probability for the triggers to accept events (the trigger efficiency) is measured in a sample of events collected by a single-electron trigger.
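The efficiency measurement described above amounts to counting, in each offline \MET bin of the reference (single-electron) sample, the fraction of events that also fired the search trigger. A minimal illustrative sketch, not the analysis code; the event representation and bin edges are hypothetical:

```python
# Illustrative sketch (not the analysis code): measuring a MET+HT search
# trigger's efficiency in a reference sample collected by an independent
# (single-electron) trigger.  Event contents and bin edges are hypothetical.

def trigger_efficiency(events, met_bins):
    """For each offline MET bin, return the fraction of reference-trigger
    events that also fired the search trigger."""
    counts = [[0, 0] for _ in range(len(met_bins) - 1)]  # [passed, total]
    for ev in events:
        for i in range(len(met_bins) - 1):
            if met_bins[i] <= ev["met"] < met_bins[i + 1]:
                counts[i][1] += 1
                if ev["fired_search_trigger"]:
                    counts[i][0] += 1
                break
    return [p / t if t else None for p, t in counts]

# Toy usage: the efficiency turns on as offline MET crosses the threshold.
events = (
    [{"met": 150, "fired_search_trigger": False}] * 8
    + [{"met": 150, "fired_search_trigger": True}] * 2
    + [{"met": 300, "fired_search_trigger": True}] * 10
)
print(trigger_efficiency(events, [100, 200, 400]))  # -> [0.2, 1.0]
```

Because the reference trigger is independent of \MET and \HT , this ratio is an unbiased estimate of the search trigger efficiency.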
To measure the search trigger efficiency, additional cuts that mimic the pre-selection defined in Section~\ref{sec:preselection} are applied. The efficiency of the search triggers is measured as a function of the offline \MET . As seen in Fig.~\ref{fig:trigger}, the \MET trigger efficiency has a non-trivial dependence on the offline \HT . The search trigger efficiencies are therefore measured separately in the low-\HT (300 $<$ \HT $<$ 1000 \GEV) and high-\HT (\HT $>$ 1000 \GEV) regions.\\ \section{Preselection} \label{sec:preselection} The search targets multijet events with b-jets from top quark decays, large \MET , and no leptons. Initially, a loose baseline selection is applied in \MET , \HT , the number of jets (\NJETS), and \NBJETS . This baseline selection preserves 2--20\% of the signal events. \begin{itemize} \item Filters: As part of the essential data-cleaning process at the analysis level, all events must pass all filters that remove detector- and beam-related noise. \item \NJETS $\geq 4$: The stops are produced in pairs and each stop decays to a top quark and the lightest supersymmetric particle (LSP). Each hadronically decaying top quark yields three jets, giving six jets altogether; since not all jets pass the selection cuts, we require at least four jets. Jets are reconstructed with the Particle Flow (PF)~\cite{Beaudette:2014cea} technique and clustered with the anti-$k_\mathrm{T}$ algorithm~\cite{antikt} with a resolution parameter $R = 0.4$. Every jet is required to have \PT $>$ 30 \GEV and $|\eta|<$ 2.4. In addition, jets must pass the loose jet ID criteria for PF jets. The leading two jets are required to have \PT $>$ 50 \GEV . The high-momentum and central-$\eta$ requirements are motivated by optimization studies carried out with simulated signal and background samples; SUSY signal studies predict centrally produced jets with very high momentum.
\item \MET $\ge$ 250 \GEV : We use the particle-flow \MET with jet energy corrections applied; the cut threshold is constrained by the trigger efficiency requirements. As shown in Fig.~\ref{fig:trigger}, our trigger is very efficient when \MET exceeds 250 \GEV . \item \MTTWO $\ge$ 200 \GEV : This cut is mainly used to reduce SM background events with a low value of \MTTWO . It works especially well for \TTBAR events, where the \MTTWO distribution shows a kinematic edge around the top quark mass. The \MTTWO variable, described in~\cite{Lester:1999tx}, is an extension of the transverse mass variable and is sensitive to pair production of heavy particles, each of which decays into an invisible particle such as a neutralino. From the selected jets in an event, the four-momenta of a three-jet system ($p^{3\text{-jets}}$) and a remnant system ($p^{Rsys}$), along with \MET , are used to calculate \MTTWO , assuming the invisible particles are massless. As an example, consider the T2tt channel $pp \rightarrow \tilde{t}\tilde{t}^{*} \rightarrow t\bar{t}\tilde{\chi}_{1}^{0}\tilde{\chi}_{1}^{0}$. Here \MTTWO is defined as, \begin{equation} \begin{array}{l} \displaystyle {M_{T2} \equiv \min_{\vec{q}_{T}^{\:(1)}+\vec{q}_{T}^{\:(2)} = \vec{p}_{T}} [\max\{m_{T}^2(\vec{p}_{T}^{\:t^{(1)}}, \vec{q}_{T}^{\:(1)}; m_{\chi_1^0}), m_{T}^2(\vec{p}_{T}^{\:t^{(2)}}, \vec{q}_{T}^{\:(2)}; m_{\chi_1^0})\}] } \end{array}, \label{eq:MT2} \end{equation} where $m_{T}^2$ is the transverse mass, \begin{equation} \label{eq:MTdef} \begin{array}{l} \displaystyle{ m_{T}^2(\vec{p}_{T}^{\:t^{(1)}}, \vec{q}_{T}^{\:(1)}; m_{\chi_1^0}) \equiv m_{t^{(1)}}^{2} + m_{\chi_1^0}^2 + 2(E_{T}^{t^{(1)}}E_{T}^{(1)} - \vec{p}_{T}^{\:t^{(1)}} \cdot \vec{q}_{T}^{\:(1)}) } \end{array}. \end{equation} From Eq.~\ref{eq:MT2}, it is clear that \MTTWO is the minimization of the larger of the two transverse masses, subject to the constraint that the sum of the transverse momenta of both $\tilde{\chi}_{1}^{0}$'s equals the missing transverse momentum of the event.
In the T2tt model mentioned above, if there is only one fully reconstructed top quark in an event, we replace the quantities represented by superscripts $(1)$ and $(2)$ with those of the fully reconstructed top quark ($p^{3\text{-jets}}$) and the partially reconstructed top quark ($p^{Rsys}$), respectively. In cases with two or more fully reconstructed top quarks, the quantities labeled $(2)$ are also taken from a fully reconstructed top quark. In Eq.~\ref{eq:MT2}, \MET corresponds to $\vec{p}_{T}$, as we treat the $\tilde{\chi}_{1}^{0}$'s as massless relative to the stop mass scale. \\ \\ In summary, the calculation of \MTTWO starts from the assumption that there is at least one good hadronic top quark in the event. If there are more than two top candidates, we iterate over the different combinations of top quarks to obtain the smallest possible value of \MTTWO . If only one top quark is reconstructed by the algorithm, we reconstruct the other top quark from the remnant of the event, using the b-tagged jet (or the highest-\PT jet if no b-tagged jet is found) as a seed and adding the remnant-system jet closest to the seed jet, provided the pair has an invariant mass between 50 \GEV and the top quark mass of 175 \GEV . If no combination satisfies the invariant mass requirement, we use the seed jet as the only remnant of the other top quark, and \MTTWO is calculated from the reconstructed top candidate, the remnant, and the \MET . \item \HT $\ge$ 300 \GEV , with $H_{T} = \sum_{\mathrm{jets}}p_{T}$, where \PT is the magnitude of the jet transverse momentum. This requirement is also motivated by the fact that SUSY signal studies predict jets with very high momentum. All jets entering the \HT calculation must meet the jet selection criteria defined above. \item \NBJETS $\ge$ 1, with b-jets identified using the Combined Secondary Vertex (CSV)~\cite{Chatrchyan:2012jua} b-tagging algorithm at the medium working point (CSVM). \item Muon veto: Events with muon candidates satisfying \PT $>$ 10 \GEV and $|\eta| <$ 2.4 are removed from the selection.
\item Electron veto: Electron candidates are selected using the Electron Physics Object Group (POG)-recommended ``Cut Based VETO'' selection, and events containing such candidates are removed from the selection. Different cut criteria are applied in the barrel and endcap electromagnetic calorimeter regions. The candidates are required to have \PT $>$ 10 \GEV and $|\eta|<$ 2.5. \item Angular cut: In order to suppress the background from QCD multijet production, which tends to produce back-to-back topologies, we use the angular quantity \DELTAPHI , defined as the azimuthal angle between the \MET vector and each of the three leading jets in \PT . We require \DELTAPHI (\MET , $j_{1,2,3}$) $>$ 0.5, 0.5, and 0.3 for the three leading jets, respectively. In QCD processes, the missing energy arises mainly from an under-measurement of a jet energy, so the missing energy vector tends to point close to a jet direction. Requiring a large angular separation between the jet vectors and the missing energy vector therefore suppresses QCD events. \item Isolated track veto: After applying the cuts above, the remaining background comes from \TTBAR and $W$+jets events with one $W\rightarrow l\nu$ decay, where $l$ can be an electron, a muon, or a hadronically decaying tau. To further suppress these backgrounds, we reject events that have one or more isolated tracks. The track isolation is calculated from the charged PF candidates consistent with the reconstructed primary vertex ($|dz(\mathrm{PV})|<0.1~\mathrm{cm}$). The requirements differ for muon, electron, and charged hadron tracks. For electron and muon tracks, the isolated-track requirements are \PT $>$ 5 \GEV , $|\eta|<$ 2.5, and relative isolation less than 0.2. For charged hadron tracks, the \PT requirement is raised to 10 \GEV and the relative isolation requirement is tightened to less than 0.1.
To retain more signal, and thus improve the signal-to-background discrimination, events with one isolated track, as defined above, are only rejected if the track satisfies \begin{equation} m_T(\mathrm{tk},E_{T}^{\mathrm{miss}}) = \sqrt{2p_{T}^{\mathrm{tk}}E_{T}^{\mathrm{miss}}(1-\cos\Delta\phi)} < 100\;\mathrm{GeV}, \label{eq:mt_isotk} \end{equation} where $p_{T}^{\mathrm{tk}}$ is the transverse momentum of the track and $\Delta\phi$ is the azimuthal separation between the track and the \MET vector. \end{itemize} \begin{figure}[!tbp] \begin{center} \includegraphics[width=0.6\linewidth]{figure/datamc/DataMC_MET_model_NTops_baseline.pdf} \includegraphics[width=0.6\linewidth]{figure/datamc/DataMC_MET_model_NBJEts_baseline.pdf}\\ \caption{ Comparisons of various signal models with the total SM background for the search variables \NTOPS (top) and \NBJETS (bottom). All simulated signal and background samples are scaled to data. The scale factor is included in the legend for the signal points. } \label{fig:compSBvars1} \end{center} \end{figure} The combined SM background samples are compared with the signal, scaled to the total data yield after the baseline selection, in bins of all search variables, as shown in Figs.~\ref{fig:compSBvars1} and~\ref{fig:compSBvars2}. The backgrounds are represented by the color-filled histograms stacked on one another, the black dots represent the data, and the colored solid lines represent scaled signal points. Figures~\ref{fig:compSBvars1} and~\ref{fig:compSBvars2} clearly show that, in the bins with a relatively large number of entries, the signals are well differentiated from the backgrounds in all search variables. \\ \begin{figure}[!tbp] \begin{center} \includegraphics[width=0.6\linewidth]{figure/datamc/DataMC_MET_model_MT2_baseline.pdf} \includegraphics[width=0.6\linewidth]{figure/datamc/DataMC_MET_model_met_baseline.pdf}\\ \caption{ Comparisons of various signal models with the total SM background for the search variables \MTTWO (top) and \MET (bottom).
All simulated signal and background samples are scaled to data. The scale factor is included in the legend for the signal points. } \label{fig:compSBvars2} \end{center} \end{figure} The events passing the baseline selection are classified into search regions defined in terms of \NTOPS , \NBJETS , \MET , \HT , and \MTTWO . All search variables are combined to make 84 exclusive search bins. Bins with \NTOPS $\geq 3$ and \NBJETS $\geq 3$ are more sensitive to the T1tttt channel than to the T2tt channel. Because \MTTWO cannot be defined cleanly when there are more than two top quarks, \MTTWO was replaced by \HT in the construction of those bins. A series of significance scans in the dimensions of \MTTWO , \MET , and \HT was performed to optimize the search bins. All search bins are shown in Fig.~\ref{fig:SB84}.\\ \begin{figure}[!tbp] \begin{center} \includegraphics[width=0.30\linewidth]{figure/searchbins/poly_MT2_vs_met_0.pdf} \includegraphics[width=0.30\linewidth]{figure/searchbins/poly_MT2_vs_met_1.pdf} \includegraphics[width=0.30\linewidth]{figure/searchbins/poly_MT2_vs_met_2.pdf} \\ \includegraphics[width=0.30\linewidth]{figure/searchbins/poly_MT2_vs_met_3.pdf} \includegraphics[width=0.30\linewidth]{figure/searchbins/poly_MT2_vs_met_4.pdf} \includegraphics[width=0.30\linewidth]{figure/searchbins/poly_MT2_vs_met_5.pdf} \\ \includegraphics[width=0.30\linewidth]{figure/searchbins/poly_MT2_vs_met_6.pdf} \includegraphics[width=0.30\linewidth]{figure/searchbins/poly_MT2_vs_met_7.pdf} \includegraphics[width=0.30\linewidth]{figure/searchbins/poly_MT2_vs_met_8.pdf} \\ \caption{All 84 non-overlapping search bins after the baseline selection. In regions with \NBJETS $\leq$ 2 and \NTOPS $\leq$ 2, we use \NBJETS , \NTOPS , \MET , and \MTTWO as the binned search variables. In regions with \NBJETS $\geq$ 3 and \NTOPS $\geq$ 3, we use \NBJETS , \NTOPS , \MET , and \HT .
The reason \HT is used for these latter regions, and not \MTTWO , is that in events with many jets, the jets from the decay of a particular heavy object may not always be correctly associated with that object, causing the distribution of \MTTWO to be broad and relatively flat. We find that \HT provides better discrimination between signal and background for \NBJETS $\geq$ 3 and \NTOPS $\geq$ 3. The boundaries between the regions were determined through sensitivity studies. } \label{fig:SB84} \end{center} \end{figure} \section{Top Quark Reconstruction and Identification} \label{sec:toptagging} Depending upon the signal model, two or more top quarks are produced in the final state. These top quarks can be produced with a wide range of transverse momentum \PT . If the mass of the neutralino is small, the top quark receives a significant boost, while a high-mass neutralino produces top quarks with low boost. Moreover, a large number of particles sharing the initial energy results in a wide \PT spectrum. A good top tagging algorithm must therefore be efficient in identifying top quarks over a large range of \PT and capable of tagging more than one top quark in a single event. \\\\ If the top quark is highly boosted, all of its decay products end up in a localized area of the detector and are reconstructed as ``fat jets'' or ``ak8jets'' with a relatively large cone parameter $(\Delta R \sim 0.8)$. This happens when the top quark has \PT $>$ 400 \GEV . In this scenario, the top quarks are reconstructed with a very high efficiency and a low fake rate.\\\\ When the top quark is only lightly boosted, its decay products are identified as individual jets (ak4jets, with $\Delta R \sim 0.4$). In this scenario, we have to build top quark candidates from individual jets. Reconstruction here is more complicated, as many jets from other processes can mimic the top quark decay.
Reducing these backgrounds while keeping the fake rate low and maintaining a high efficiency requires robust techniques such as multivariate analysis. To create a top quark tagging algorithm able to function over the full range of top quark \PT , an algorithm combining both the highly boosted and lightly boosted scenarios is desired.\\ \subsection{Resolved Top Tagging} \label{ssec:resolvett} Most existing top tagging algorithms in high energy physics follow a cut-based approach, in which various kinematic restrictions are placed on variables and only candidates that pass these restrictions are kept for further study. For example, when identifying a $Z$ boson in an event, one limits the invariant mass of potential $Z$ boson candidates to a window of 50 to 120 \GEV and removes all candidates that fall outside of it. In our case of top quark reconstruction, such a method is capable of producing signal-enriched regions but comes with an undesirably high fake rate. One way to reduce the fake rate is to cut on many variables, but in the cut-based approach this strategy significantly reduces the efficiency of the analysis. An elegant solution is to replace the traditional cut-based approach with a Multivariate Analysis (MVA). \\\\ \textbf{Decision Tree:} The decision tree is a classic and natural model for learning from data. The goal is to create a model that predicts the value of a target variable based on several input variables. Suppose one is trying to classify events into signal and background and has data, or as in our case simulated samples, for each. The first step towards classification is to split the data into two parts: the training sample, used to train the model, and the test sample, used to check how accurately the model works on completely new data.
Assuming there are several particle identification (PID) variables that help to distinguish signal from background, the tree starts with a very basic task: given the value of a certain variable, it asks whether a candidate is more likely to be signal or background, and picks the boundary value that gives the best separation between the two. It then turns to another variable and repeats the process until all variables are used. A sample decision tree is shown in Fig.~\ref{fig:demodt}.\\ \begin{figure}[!ht] \begin{center} \includegraphics[width=0.8\linewidth]{figure/toptag/ResolvedMVA/dt.png} \end{center} \caption{Schematic of a decision tree considering only three variables. S stands for signal, B for background. Terminal nodes (called leaves) are shown in boxes. If signal events are dominant in a leaf, it is a signal leaf; otherwise it is a background leaf~\cite{decision_tree}.} \label{fig:demodt} \end{figure} \textbf{Random Forest Classifier:} Among the myriad decision tree algorithms available, the most robust against overtraining without loss of discrimination power are ensemble algorithms, which use a large number of decision trees to create a single discriminator. A number of different algorithms were tested, including gradient boost, AdaBoost, random forest, extreme gradient boost, and basic decision trees. In preliminary studies, the gradient boost and the random forest proved to be the strongest options, with equivalent performance. The random forest algorithm~\cite{Ali_randomforests} was selected because it allows an ensemble of trees that can separate signal from background effectively without being overtrained on the training data. A random forest is an ensemble of decision trees that are each independently trained on a different subset of the available training data.
Additionally, at each branch point in the individual decision trees, the algorithm only considers a subset of the total number of sample features (input variables) when deciding which cut is best. The output of the random forest is the mode of the individual trees for a classifier, or the mean of the individual trees for a regression. \\\\ In the top tagging algorithm, the jets in the event are first clustered into top candidates by forming all combinations of three AK4 jets that have \PT $>$ 30 \GEV and $|\eta| < 5$. The jets are further required to fall within a cone of \DELTAR $<$ 1.5 and to have a combined mass between 110 and 250 \GEV , encompassing the top quark mass window. A top candidate is allowed to have no more than one b-tagged jet among its three jets; this explicit requirement reduces the fake rate in high b-jet multiplicity events at only a small cost in efficiency. To prevent two top candidates from sharing the same AK4 jet, the candidate with the larger MVA discriminator value keeps the shared jet. \\ \begin{figure}[!ht] \begin{center} \includegraphics[width=0.8\linewidth]{figure/toptag/ResolvedMVA/rocdepth.png} \end{center} \caption{Event-wise ROC curves for different training depths in the random forest classifier. The efficiency is measured in bins of generator-level top quark \PT in the \TTBAR sample, and the fake rate is measured in $Z\rightarrow\nu\nu$ samples. The efficiency and fake rate are defined in Eq.~\ref{eq:ROCEQ}.} \label{fig:rocMVAevent} \end{figure} The Python-based package Scikit-learn~\cite{scikit-learn}, which implements a wide variety of machine learning algorithms, was used for preliminary training of the model. For better integration with our CMSSW workflow, however, the C++-based package OpenCV 3.1.0~\cite{opencv_library} was adopted; OpenCV also provides a Python binding that allows cross-checks against the Python-based preliminary work.
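The trijet candidate-building step described above can be sketched as follows. This is an illustrative Python sketch, not the analysis code; the jet representation is hypothetical, and interpreting the \DELTAR $<$ 1.5 requirement as applying to each jet pair is an assumption.

```python
# Illustrative sketch (not the analysis code) of clustering AK4 jets into
# trijet top candidates.  Jets are hypothetical (pt, eta, phi, mass) tuples;
# applying the dR < 1.5 cut to every jet pair is an assumption.
import math
from itertools import combinations

def four_vector(pt, eta, phi, mass):
    px, py = pt * math.cos(phi), pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    e = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return px, py, pz, e

def delta_r(j1, j2):
    dphi = math.remainder(j1[2] - j2[2], 2 * math.pi)  # wrap to [-pi, pi]
    return math.hypot(j1[1] - j2[1], dphi)

def trijet_candidates(jets, btags):
    """Return (indices, invariant mass) for each trijet passing the cuts."""
    cands = []
    good = [i for i, j in enumerate(jets) if j[0] > 30 and abs(j[1]) < 5]
    for trio in combinations(good, 3):
        # all three jets must lie within dR < 1.5 of each other
        if any(delta_r(jets[a], jets[b]) > 1.5
               for a, b in combinations(trio, 2)):
            continue
        if sum(btags[i] for i in trio) > 1:   # at most one b-tagged jet
            continue
        px, py, pz, e = map(sum, zip(*(four_vector(*jets[i]) for i in trio)))
        m = math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))
        if 110.0 < m < 250.0:                 # top quark mass window
            cands.append((trio, m))
    return cands
```

A candidate is kept only if its combined mass falls in the 110--250 \GEV window; resolving candidates that share a jet by the MVA discriminator value happens in a later step.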
\\ \begin{figure}[!ht] \begin{center} \includegraphics[width=0.8\linewidth]{figure/toptag/ResolvedMVA/D10v14ROC.pdf} \end{center} \caption{Object-wise ROC curves for different training depths in the random forest classifier. The true positive rate (TPR) and false positive rate (FPR) are defined in Eq.~\ref{eq:ROCEQ}. } \label{fig:rocMVAobject} \end{figure} Training data were prepared with 100,000 events from single-lepton \TTBAR samples and 70,000 events from $Z\rightarrow \nu\nu$ samples. The top candidate criteria mentioned above were slightly loosened for the training samples. A top quark candidate is considered signal if each of its three jets is matched to a generator-level quark originating from a single top quark within a cone of \DELTAR $<$ 0.4. Additionally, the total Lorentz vector of the candidate must be matched to a generator-level top quark within a cone of \DELTAR $<$ 0.6. These criteria ensure that the candidate is an accurate reconstruction of the generator-level top quark. Candidates that fail to satisfy the above criteria are marked as background. If multiple candidates pass the signal or background tests (which often happens), they are all included in the training. \\ \\ Four quantities are defined to gauge the performance of the tagger, and the Receiver Operating Characteristic (ROC) curve is built in two ways: the event-wise ROC curve plots the efficiency against the fake rate, whereas the object-wise ROC curve plots the true positive rate (TPR) against the false positive rate (FPR).
These quantities are defined as follows:\\ \begin{equation} \label{eq:ROCEQ} \begin{array}{l} \displaystyle{ \text{Efficiency} = \dfrac{\text{Number of generator-level top quarks matched to a tagged top}}{\text{Number of generator-level top quarks in an event}}}, \\\\ \displaystyle{ \text{Fake rate} = \dfrac{\text{Number of events having at least one tagged top}}{\text{Total number of events}}}, \\\\ \displaystyle{ \text{TPR} = \dfrac{\text{Number of tagged top candidates matched to a generator-level top quark}}{\text{Number of top candidates matched to a generator-level top quark}}},\\\\ \displaystyle{ \text{FPR} = \dfrac{\text{Number of tagged top candidates not matched to a generator-level top quark}}{\text{Number of top candidates not matched to a generator-level top quark}}}.\\ \\ \end{array} \end{equation} \begin{figure}[!ht] \begin{center} \includegraphics[width=0.6\linewidth]{figure/toptag/ResolvedMVA/Jet23Mass.png}\\ \includegraphics[width=0.6\linewidth]{figure/toptag/ResolvedMVA/Jet12dTheta.png} \end{center} \caption{Distributions of two input variables for the training signal and background. Red (blue) curves indicate signal (background). The top panel shows the mass of the dijet formed from the second and third leading jets in momentum $p$, where $p$ is measured in the rest frame of the trijet. The bottom panel shows the angular separation between the leading and second-leading jets in momentum. } \label{fig:disckeyvar} \end{figure} Here, ``tagged top'' indicates a trijet combination (top candidate) with an MVA discriminator value above a given threshold. The fake rate is measured from simulated $Z\rightarrow \nu\nu + \text{jets}$ events. \\\\ The tree depth defines the maximum number of branch nodes in any branch of a particular tree in the random forest. An extensive study was carried out to tune the tree depth: it was varied smoothly to find the balance between optimal performance and computational load.
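The four performance measures in Eq.~\ref{eq:ROCEQ} reduce to simple ratios of counts. A minimal sketch, not the analysis code, assuming hypothetical per-event count fields:

```python
# Illustrative sketch (not the analysis code) of the four tagger metrics.
# Each event is a dict with hypothetical fields:
#   "gen_tops":       number of generator-level hadronic top quarks
#   "gen_matched":    number of those matched to a tagged top
#   "cand_matched":   (tagged, total) counts for candidates matched to a gen top
#   "cand_unmatched": (tagged, total) counts for unmatched candidates

def tagger_metrics(events):
    gen_tot = sum(ev["gen_tops"] for ev in events)
    gen_match = sum(ev["gen_matched"] for ev in events)
    evt_tagged = sum(1 for ev in events
                     if ev["cand_matched"][0] + ev["cand_unmatched"][0] > 0)
    tp = sum(ev["cand_matched"][0] for ev in events)
    p = sum(ev["cand_matched"][1] for ev in events)
    fp = sum(ev["cand_unmatched"][0] for ev in events)
    n = sum(ev["cand_unmatched"][1] for ev in events)
    return {
        "efficiency": gen_match / gen_tot,      # gen tops matched to a tag
        "fake_rate": evt_tagged / len(events),  # events with >= 1 tagged top
        "tpr": tp / p,                          # tagged / matched candidates
        "fpr": fp / n,                          # tagged / unmatched candidates
    }
```

In the analysis, the efficiency and TPR are evaluated in the \TTBAR sample and the fake rate and FPR in the $Z\rightarrow \nu\nu$ sample; the sketch simply computes all four over whatever event list it is given.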
A depth of 14 was chosen for training the nominal working points to maximize the signal-versus-background separation. If the depth is chosen too small, the signal distribution of the discriminator is highly compressed against one end of the spectrum, while if the depth is chosen too large, the risk of overtraining increases. The ROC curves are shown in Figs.~\ref{fig:rocMVAevent} and~\ref{fig:rocMVAobject}. \\ Different sets of variables were used to check their discriminating power. The most powerful variables were top candidate properties such as mass, \PT , and \DELTAR ; constituent jet properties such as \PT , mass, $\eta$ , and $\phi$ ; and angular variables such as \DELTAR , \DELTAPHI , and $\Delta\eta$ . The discrimination provided by a few key variables is shown in Fig.~\ref{fig:disckeyvar}.\\ \begin{figure}[!ht] \begin{center} \includegraphics[width=0.8\linewidth]{figure/toptag/ResolvedMVA/MVAdiscWPtight.png} \end{center} \caption{The discriminator output of the final trained random forest for signal-like and background-like candidates. A discriminator cut of 0.85 is used for the final analysis.} \label{fig:discriminator} \end{figure} The efficiency of the tagger is estimated using the full semi-leptonic \TTBAR MC sample. The fake rate is measured using the $Z\rightarrow\nu\nu$ sample, in which there are no hadronic decays of top quarks; any tagged top in this sample is therefore counted as a fake. Both the efficiency and the fake rate were compared to those of the previously adopted cut-based tagger~\cite{Khachatryan:2015wza}. The efficiency of the MVA-based resolved tagging algorithm is somewhat lower than that of the cut-based tagger; however, the fake rate is much reduced.
Comparisons between the MVA-based resolved tagger and the cut-based tagger, for both the efficiency and the fake rate, are shown in Fig.~\ref{fig:efffkrate}.\\ \begin{figure}[!ht] \begin{center} \includegraphics[width=0.45\linewidth]{figure/toptag/ResolvedMVA/baseline_eff_pt_ttbar1l.png} \includegraphics[width=0.45\linewidth]{figure/toptag/ResolvedMVA/baseline_fakerate_met.png} \end{center} \caption{Overall efficiency measured in bins of generator-level top quark \PT in the \TTBAR sample for both the MVA-based tagger and the previous cut-based tagger (left), and fake rate measured in bins of \MET in the $Z\rightarrow\nu\nu$ sample (right). } \label{fig:efffkrate} \end{figure} \subsection{Merged Top Tagging} \label{ssec:mergedtt} As the top quark \PT increases, the decay products get closer together. Thus, for high-\PT hadronically decaying top quarks, one can no longer resolve three separate jets; instead, the decay products are reconstructed as two jets (medium boost) or one jet (high boost). For ak4jets, the full top quark decay would be contained within a single jet only at a top quark \PT of around 700 \GEV , which is quite high and well beyond the point where the resolved top tagger becomes less effective. Therefore, it has become standard practice to use ak8jets to reconstruct boosted top quarks: top quarks with \PT above 400 \GEV have their decay products contained within an ak8jet. In addition, we also consider the case where only the decay products of the $W$ boson are contained within one ak8jet, which happens for $W$ bosons with \PT $>$ 200 \GEV . \\\\ Following the recommendations of the JetMET working group within CMS, we apply three major requirements for boosted top quark tagging: Puppi (pileup-suppressed) ak8jets with \PT $>$ 400 \GEV to select boosted jets, a soft drop mass~\cite{Larkoski:2014wba} between 105 and 210 \GEV to capture the top quark mass window, and $N$-subjettiness\footnote{$N$-subjettiness is a jet shape designed to identify boosted hadronic objects such as top quarks.
Given $N$ subjet axes within a jet, $N$-subjettiness sums the angular distances of jet constituents to their nearest subjet axis.} $\tau_{32}$ $<$ 0.65~\cite{Thaler:2010tr} to constrain the number of subjets associated with each ak8jet. For the boosted $W$ boson, we require \PT $>$ 200 \GEV to select a moderately boosted $W$ boson, a soft drop mass between 65 \GEV and 100 \GEV to select the $W$ boson mass window, and $\tau_{21}$ $<$ 0.60 to ensure that two subjets are associated with the $W$ boson decay. \\\\ Since we use two jet collections in the same event, we must ensure that no two jets share the same particle-flow candidate; such sharing could lead to overcounting top quarks and a faulty \MTTWO calculation. The basic approach taken here is to remove ak4jets that correspond to an ak8jet (one tagged as a merged $W$ boson or top quark) from the list of jets used as input to the MVA-based resolved top tagger. The ak4jet removal can be carried out in two ways: \DELTAR matching between the ak4jets and the ak8jets, or \DELTAR matching between the ak4jets and the soft-drop subjets of the ak8jets. In either method, if \DELTAR is less than a certain threshold, the ak4jet is removed. All tagged $W$ bosons and top quarks that can be matched to a generator-level hadronically decaying $W$ boson or top quark within a cone of \DELTAR = 0.4 were used. Rigorous cross-validations were performed with different signal models and mass points.\\\\ To check the performance of the subjet matching scheme, we count the number of ak4jets that are matched (within a cone of \DELTAR = 0.4) to the subjets of a given tagged top quark.\\ \subsection{Combined Algorithm} \label{ssec:combinedtt} The individual taggers have a variety of adjustable parameters that can be tuned to manipulate their performance. The most important parameters of the merged algorithms are the jet \PT and the top quark and $W$ boson mass windows.
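The \DELTAR-based overlap removal between the two jet collections can be sketched as follows. This is a simplified illustration, not the analysis code: jets are reduced to $(\eta, \phi)$ tuples, and the 0.4 threshold follows the matching cone quoted above.

```python
# Minimal sketch of the overlap-removal step: ak4jets that match
# (dR < threshold) any subjet of a tagged merged W/top ak8jet are
# dropped before running the resolved tagger. Jets are simplified
# to (eta, phi) tuples for illustration.
import math

def delta_r(eta1, phi1, eta2, phi2):
    # Wrap the phi difference into [-pi, pi) before combining with d(eta).
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def remove_overlaps(ak4_jets, tagged_subjets, threshold=0.4):
    """Keep only ak4jets with dR >= threshold to every tagged subjet."""
    return [(eta, phi) for (eta, phi) in ak4_jets
            if all(delta_r(eta, phi, seta, sphi) >= threshold
                   for (seta, sphi) in tagged_subjets)]

ak4 = [(0.1, 0.2), (1.5, -2.0), (0.15, 0.25)]
subjets = [(0.12, 0.22)]               # subjets of one tagged merged top
print(remove_overlaps(ak4, subjets))   # only the (1.5, -2.0) jet survives
```

The same helper applies to the alternative scheme of matching ak4jets directly to the ak8jet axes; only the list passed as the second argument changes.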
In the resolved tagger, we can adjust the discriminator threshold to tune the efficiency and fake rate. The working point presented in this analysis is the ``tight'' working point, which balances efficiency against fake rate.\\\\ The performance of the combined tagger is very good overall. The efficiency was calculated using the semileptonic \TTBAR sample and the fake rate using the $Z\rightarrow \nu\nu$ sample. The overall efficiency compared with the cut-based tagger is shown in Fig.~\ref{fig:overalleff}, and the fake rate in bins of the search variables is shown in Fig.~\ref{fig:overallfkrate}. The efficiency is similar overall, but the fake rate is reduced drastically. \\ \begin{figure}[!ht] \begin{center} \includegraphics[width=0.8\linewidth]{figure/toptag/WorkingPoints/baseline_eff_pt_ttbar1l_tight.pdf} \caption{Overall efficiency of the combined tagger compared with the cut-based tagger. The red lines show the result immediately before the combined algorithm was developed. There is no significant loss of efficiency in the low top quark \PT region. There is a slight drop in efficiency in the high \PT region, but the signal there is expected to be very small. } \label{fig:overalleff} \end{center} \end{figure} \begin{figure}[!ht] \begin{center} \includegraphics[width=0.85\linewidth]{figure/toptag/WorkingPoints/baseline_fakerate_met.png} \caption{Fake rate as a function of the search variable \MET . } \label{fig:overallfkrate} \end{center} \end{figure} With a large phase space of signals to cover, the SUSY signals used in the analysis are simulated with the CMS fast simulation (fastsim) package. The CMS fast simulation provides an alternative to the GEANT-based approach (fullsim) and requires much less CPU time. Previous studies were carried out with the fullsim samples. For the SUSY analysis, the performance of the top tagging therefore needs to be validated in the fastsim samples as well.
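The efficiency versus fake-rate trade-off behind the choice of working point can be illustrated with a simple threshold scan. The discriminator shapes below are toy distributions, not analysis results; they only demonstrate that raising the cut lowers both the efficiency and the fake rate, which is the tuning described above.

```python
# Hedged sketch of the working-point trade-off: scan discriminator
# thresholds over toy distributions for real and fake top candidates.
# The beta-distribution shapes are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(1)
disc_real = rng.beta(8, 2, 10000)   # real tops: discriminator peaks near 1
disc_fake = rng.beta(2, 8, 10000)   # fakes: discriminator peaks near 0

for cut in (0.5, 0.7, 0.85, 0.95):
    eff = np.mean(disc_real > cut)    # fraction of real tops tagged
    fake = np.mean(disc_fake > cut)   # fraction of fakes tagged
    print(f"cut={cut:.2f}  eff={eff:.3f}  fake rate={fake:.3f}")
```

A "tight" working point corresponds to a higher cut, trading some efficiency for a substantially lower fake rate.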
As seen in Fig.~\ref{fig:fastfullsim}, the top tagging efficiency is generally in good agreement between the fullsim and fastsim samples. The efficiency in data was calculated from collision data recorded with a single-muon trigger requiring \PT greater than 45 \GEV and $|\eta| < $ 2.1, corresponding to an integrated luminosity of 24.8 $\text{fb}^{-1}$, using the relation \begin{equation} \epsilon = \frac{\textrm{best top candidate}}{\textrm{best top candidate + best jet combination}} . \label{eq:dataTagSF} \end{equation} \begin{figure}[!ht] \begin{center} \includegraphics[width=0.8\linewidth]{figure/toptag/Systematics/T2tt_650_350_Tagger0.pdf} \caption{Fastsim versus fullsim comparison and scale factor for one of the signal points, T2tt(stop mass 650 \GEV, LSP mass 350 \GEV). The error bars (mostly smaller than the data symbols) depict the statistical uncertainty. The ratio of the efficiencies is taken as the scale factor for the fastsim signal samples.} \label{fig:fastfullsim} \end{center} \end{figure} \section{Monte Carlo Samples for Background and Signal Studies} The analysis uses a set of Monte Carlo samples for the background estimation methods and for the interpretation of the results in terms of the simplified models used in the analysis. All background samples are generated with the Geant4-based CMS simulation application, while all signal samples used to set limits are generated with the Fast Simulation application. Monte Carlo samples of SM processes reconstructed with CMSSW release 8.0 (Spring16) are used throughout this analysis.
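The efficiency and fastsim/fullsim scale-factor bookkeeping described above amounts to a ratio of tagging efficiencies with binomial uncertainties. The sketch below uses made-up pass/total counts purely for illustration; only the ratio construction reflects the procedure in the text.

```python
# Sketch of the efficiency and fastsim/fullsim scale-factor calculation.
# Counts are hypothetical placeholders; the error is the usual binomial
# uncertainty sqrt(eff * (1 - eff) / N).
import math

def efficiency(n_pass, n_total):
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err

# Hypothetical tag counts in one pT bin for fullsim and fastsim
eff_full, err_full = efficiency(420, 1000)
eff_fast, err_fast = efficiency(450, 1000)

# The fastsim signal samples are corrected by the ratio of efficiencies.
sf = eff_full / eff_fast
print(f"fullsim eff = {eff_full:.3f} +- {err_full:.3f}, SF = {sf:.3f}")
```

In practice this ratio is evaluated per top quark \PT bin, as in Fig.~\ref{fig:fastfullsim}, rather than from a single pair of counts.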